CS 61C Spring 1995 Midterm 2 solutions 1. Stack implementation. void push(int* &stack, int x) { *(stack++) = x; } int pop(int* &stack) { return *(--stack); } (a) Which best describes the stack management: ii. The stack is growing up in memory, and the stack pointer points to the next empty array element. Because the code for push() says stack++ and not ++stack, we know that the stack pointer is incremented *after* it is dereferenced. In other words, the new value x is stored in the word that stack points to, and then stack is incremented to point to the next free element. And of course the stack grows upward, not downward, because it's push() that increments the pointer and pop() that decrements it! Scoring: 2 points if correct. 1 point for a half-correct answer (i or iv). (b) Write push() in MAL: push: lw \$8, 0(\$4) # \$8 is the actual stack pointer sw \$5, 0(\$8) # *stack = x addi \$8, \$8, 4 # stack++ sw \$8, 0(\$4) # update stack pointer in memory jr \$31 Probably the subtlest part of this problem is the first instruction. The variable stack is a *reference* parameter, which means that what is actually passed in \$4 is a *pointer to* stack (which is itself a pointer to an integer). That's why, in the equivalent C version, it's shown as a pointer to a pointer. So there are two levels of dereferencing: First we dereference \$4 to get the actual stack pointer (into \$8), then we dereference \$8 to get to the desired stack element. Scoring for typical errors: 4 OK 3 preincrement, i.e., *++stack instead of *stack++ increment by 1 instead of by 4 no jr instruction {**stack = x; stack++;} (increment \$4 instead of \$8) 2 (**stack)++; (increment the stack element instead of the pointer) args not in \$4 and \$5 1 only one SW instruction 0 even worse 2. LRU features (a) What hardware supports LRU? The Reference bit (or Use bit) in the TLB or page table entry. (b) How can an OS support LRU without it? Periodically turn off all the Valid bits, and use the resulting page faults to control a software-maintained reference bit. Alternative answer to (b), less general but accepted: On the MIPS, where there is no page table hardware, the OS could assume that any entry that is about to be flushed from the TLB is fairly recently used, and turn on a software reference bit in the software-maintained page table. (In other words, pretend that *every* valid TLB entry has its reference bit set.) Scoring: 2 points each. Part credit for answers that showed an understanding of the key fact that we mustn't slow down every memory reference! No part credit for answers without this understanding, e.g., "have the OS record a timestamp for every memory reference." (The *hardware* could possibly record a timestamp, but that doesn't answer the question about what the OS can do without hardware help.) No credit for an answer that just restates the question, e.g., "The OS must find a way to tell which pages have been used recently." 3. Cache performance. Cache A: Two blocks, one word each Cache B: One block of two words (a) a[0]++; a[1]++; a[2]++; a[3]++; Cache A: a[0] misses, into first block; a[1] misses, into second block; a[2] misses, into first block; a[3] misses, into second block. Cache B: a[0] misses, loads a[0] and a[1]; a[1] hits; a[2] misses, loads a[2] and a[3]; a[3] hits. So cache B has a better hit rate (2 hits) than cache A (0 hits). (b) a[1]++; a[3]++; a[1]++; a[3]++; Cache A: a[1] misses, into second block; a[3] misses, into second block; a[1] misses, into second block; a[3] misses, into second block. Cache B: a[1] misses, loads a[0] and a[1]; a[3] misses, loads a[2] and a[3]; a[1] misses, loads a[0] and a[1]; a[3] misses, loads a[2] and a[3]. So the two caches have the same hit rate (0 hits). (c) a[1]++; a[2]++; a[1]++; a[2]++; Cache A: a[1] misses, into second block; a[2] misses, into first block; a[1] hits; a[2] hits. Cache B: a[1] misses, loads a[0] and a[1]; a[2] misses, loads a[2] and a[3]; a[1] misses, loads a[0] and a[1]; a[2] misses, loads a[2] and a[3]; So cache A has a better hit rate (2 hits) than cache B (0 hits). So the two answers are C and A. Scoring: 2 points each. 4. Finding an address. A page is 2^12 bytes, so the virtual address is divided into a 20-bit virtual page number and a 12-bit offset. So the address 0x40fa25d4 is VPN: 40fa2 offset: 5d4 We look up the VPN in the TLB and find it in slot 6: VPN 40fa2, physical page 076 Therefore the desired physical address is 0x0765d4. Now we are done with the TLB and need to find this address in the cache. The cache width is 2^5 bytes, so there are five bits of cache offset. The total cache size is 2^9 bytes, so there are 2^4 = 16 cache blocks. The cache is 2-way set associative, so these 16 blocks are organized as 8 (= 2^3) sets of 2 blocks each. So there are three bits of set number. Physical address = 0x765d4 = 0111 0110 0101 1101 0100 binary = 0111 0110 0101 110 10100 --- ----- set # offset so we are looking for an offset of 0x14 in a block of set # 6. Set #6 has two blocks, whose keys are a1c5 and 0765. The second of these matches the leftmost bits of the physical address, so we use that block (block # 0xd), and at offset 0x14 we find the data, 0x46. Scoring: 4 found data 0x46 3 found cache block 0xd, wrong offset 2 computed physical address 0x765d4 1 found TLB slot 6 0 none of the above 5. Translate C/C++ to MIPS: int modemod4(int a[], int len) { int counts[4]; zero(counts); while (--len >= 0) { counts[a[len] & 03]++; } return biggest(counts); } modemod4: addi \$sp, \$sp, -28 # allocate frame, including counts array sw \$31, 16(\$sp) # save stuff sw \$4, 20(\$sp) sw \$5, 24(\$sp) add \$4, \$sp, \$0 # \$4 gets &counts[0] jal zero lw \$4, 20(\$sp) # restore args after procedure call lw \$5, 24(\$sp) loop: addi \$5, \$5, -1 # --len, so decrement before test bltz \$5, fini sll \$8, \$5, 2 # 4 * len add \$8, \$8, \$4 # &a[len] lw \$9, 0(\$8) # **** a[len] **** andi \$9, \$9, 3 # a[len] & 03 sll \$8, \$9, 2 # 4 * (a[len] & 03) add \$8, \$8, \$sp # &counts[a[len] & 03] lw \$9, 0(\$8) # **** counts[a[len] & 03] **** addi \$9, \$9, 1 sw \$9, 0(\$8) # **** counts[a[len] & 03]++ **** b loop fini: add \$4, \$sp, \$0 # \$4 gets &counts[0] jal biggest lw \$31, 16(\$sp) # unwind everything addi \$sp, \$sp, 28 jr \$31 Scoring: We considered that the core of this problem is the three instructions marked with **** above: two loads and a store. So... 6 correct 5 minor errors 4 load, load, store (to the correct memory addresses) with perhaps one address calculated incorrectly 3 load, load, store but two or more errors in that part 2 no load, load, store, but something looking like a while loop inside a procedure 1 something weirder than that 0 even worse Many people were confused about the allocation of the arrays. The array COUNTS is a local variable in this procedure, so it is allocated on the stack, all 16 bytes of it -- not just a pointer. The array A, however, is an argument to the procedure, and arrays in C/C++ are passed by reference, so what is in \$4 is a pointer to the array A (which is allocated by some other part of the program). Similarly, when we pass COUNTS as an argument to the procedures ZERO and BIGGEST, we pass it by reference, i.e., we put its address in \$4, which is what the add \$4, \$sp, \$0 does, because the array is allocated at the bottom of our stack frame, i.e., at 0(\$sp). So that add instruction is equivalent to la \$4, 0(\$sp) Some people were confused by the foo&03 business. It's equivalent to foo%4, i.e., the remainder on division by 4. (This works only because 4 is a power of two! You couldn't get foo%5, for example, with foo&4.) Anding is faster than remaindering, which is why we wrote it this way. A common mistake was to translate len-- instead of --len, i.e., to test len against zero before decrementing it.