CS 61C Fall 1994 Final exam solutions

1. Representation formats

A. Binary to hex.

Chop the binary number into groups of four bits, starting at the right.  That gives

     01 1100 1010 0101 1100 1010

Then convert each group of four bits to a hex digit, giving

    0x1    c    a    5    c    a

so the correct answer is (C).

B. Signed integers.

We can see from the sign bits that all four numbers are negative, so let's negate each of them.

(a) Twos complement 11110111: First invert all the bits (00001000), then add 1 to the result (00001001).  So this is -9.

(b) Ones complement 11110110: Just invert all the bits (00001001), also -9.

(c) Sign-magnitude 10001000: Just turn off the sign bit (00001000), so this one is -8, the different one.

(d) Bias-127 01110110: Subtract it from 127 (01111111), giving 00001001, so this one is -9 too.

So again the correct answer is (C).

C. States to bits.

There are 47 graphic keys times 3 possible modifier states (neither, SHIFT, or CONTROL).  47 * 3 = 141 possible characters.  To represent that we need the ceiling of lg 141, which is 8 bits.  (7 bits would give us only 128 possibilities, not enough; 8 bits gives 256 possibilities, more than enough.)  Answer: 141, 8.

P.S.  This isn't quite the true story about the ASCII keyboard, because not every graphic key can be used with CONTROL, and also the keyboard includes the spacebar and the delete key, each of which generates a special code but isn't shiftable.  The other formatting keys (return, tab, etc.) duplicate codes that are available with CONTROL-something.  The real story:

    47  unshifted graphics
    47  SHIFT-graphics
    32  CONTROL-graphics
     1  space
     1  delete

which adds up to exactly 128, which fits into 7 bits.

P.P.S.  You're thinking "but I can type control-space!"  Actually, when you do that, your keyboard is generating the code control-atsign (\0).

P.P.P.S.  Of course if you have a META key then you can apply that to any of the 128 ordinary ASCII codes, filling up the 8-bit byte.

D. MAL to hex.

ADDIU is an I-format instruction:

    +------+-----+-----+----------------+
    +opcode+ rs  + rt  +   immediate    +
    +------+-----+-----+----------------+

We look up addiu and find that its opcode is 001001 binary.  In MAL, when only one register is given in an I-format instruction, it's both the source register and the target register; $29 is 11101 binary.  The immediate value is -12, represented in twos complement:

     12 = 0000...01100   (8+4)
    !12 = 1111...10011   (invert the bits)
    -12 = 1111...10100   (add 1)

So putting these together we have

    +------+-----+-----+----------------+
    +001001+11101+11101+1111111111110100+
    +------+-----+-----+----------------+

To convert to hex, split this into groups of four bits:

    0010 0111 1011 1101 1111 1111 1111 0100
    0x 2    7    b    d    f    f    f    4

Answer: 0x27bdfff4
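To make the field packing concrete, here is a small C sketch (an illustration, not part of the original solutions; the variable names are arbitrary) that assembles the fields computed above and prints the machine word:

    #include <stdio.h>

    int main(void) {
        unsigned opcode = 0x09;                  /* addiu = 001001 binary */
        unsigned rs = 29, rt = 29;               /* $29 is both source and target */
        unsigned imm = (unsigned)(-12) & 0xffff; /* twos-complement immediate, low 16 bits */
        unsigned word = (opcode << 26) | (rs << 21) | (rt << 16) | imm;

        printf("0x%08x\n", word);                /* prints 0x27bdfff4 */
        return 0;
    }

Shifting the opcode left 26 places, rs left 21, and rt left 16 mirrors the field boundaries in the diagram above.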
Scoring: One point each, all or nothing.

2. Design criteria.

Many people lost points on this question for saying things like "the RISC philosophy is to make things as simple as possible and this isn't simple."  Well, yes, but you could say that about almost anything!  The point of the question was to see if you could focus on the *specific* bottlenecks in particular proposed instructions.

A. Block Transfer.

The main answer we were expecting here was "it would delay the pipeline to have multiple memory references in a single instruction cycle," and indeed many people said that.

Some people, copying from an answer in the sample exam, said "it would require either multiple bus cycles, which would delay the pipeline, or widening the bus."  Come on!  The question from which you copied that answer was about a proposed 64-bit memory transfer instruction.  This one is a proposed any-number-of-bits transfer!  There is no way we could widen the bus to accommodate, for example, a megabyte memory-to-memory copy.  People who did this got some credit for mentioning the pipeline, but didn't leave us with a strong feeling of confidence that you understand what a bus is, or a pipeline, or a memory, or a transfer, or a computer.

Several people suggested that this instruction would be too wide to fit in a word, because it would require three pointer-length operands.  But this objection is easily overcome: the three operands could be registers containing the pointers and the length.  This answer got half credit.

Some people mentioned that an instruction that might take a very long time would leave the machine uninterruptable for that long time, which could lead to I/O data loss.  This is a very good answer, showing some real thinking -- connecting the instruction semantics with the not-obviously-related question of interrupt handling.  Of course this issue came up in the PDP-10 also, and here's how it was solved: if an interrupt is requested during a BLT instruction, the processor updates the pointers in the register operands to reflect how far the copying has gone, then remembers the address of the BLT itself, rather than the address of the next instruction, as the exception PC.  After the interrupt is handled, the BLT continues.  (That solution would run into difficulties in the MIPS architecture if the BLT happened to occupy the delay slot of a branch.)

A few people suggested that block transfer is an unnecessary idea because arrays are assigned by reference (copying a pointer) rather than by value.  That isn't a very good answer.  For one thing, it concerns the design of the C language, not of the MIPS architecture.  (Pascal passes arrays by value.)  Second, even in C, structs are passed by value, so block copies are still useful.

B. Jump to Subroutine.

Here are the two main answers we were expecting:

1. A procedure called by JSR can't be called recursively.

2. Allowing write access to code pages would prevent the use of shared code.

We gave half credit for the response "this would block the pipeline because two memory references would be required."  In fact only one memory reference would be required; a branch or jump instruction does *not* fetch the next instruction, but merely puts a new address into the processor's PC register.  So a JSR would combine a jump (no memory access required) with a store.  On the other hand, the proposed jump-indirect instruction for returning from a procedure *would* block the pipe, not because of doing two loads (it only does one) but because the load would have to happen too early in the processing of the instruction.  (An ordinary load actually doesn't complete until after the next instruction!)  We gave full credit for saying that jump-indirect would block the pipe, even if you didn't fully understand why.

One student mentioned that storing the return address near the procedure's code would require setting the dirty bit for that page when calling the procedure, which would increase the paging load on the system.  Good answer.

Several students suggested that using memory to save the return address instead of a register would require additional memory allocation.  Pfui.
First of all, the amount of memory required to hold return addresses is a tiny fraction of a program's memory requirement; second, in most cases the return address ends up in memory anyway, because $31 is saved on the stack.

In the apple-pie-motherhood category was the answer "a RISC instruction shouldn't have two purposes."  Consider that the MIPS architecture *does* include the instructions BLTZAL and BGEZAL, which combine test-and-branch with procedure calling.

A couple of people said "this would violate stack discipline."  Like the answer about passing arrays by reference earlier, this confuses a language design issue with an architecture issue.  The MIPS JAL instruction doesn't have anything to do with using a stack or not using a stack, and neither does the PDP-10 JSR, really; the difference is between storing the old PC in a register and storing the old PC in memory.  And in any case, "stack discipline" doesn't mean having a stack.  Scheme, which violates stack discipline, is still generally implemented using a stack for most purposes.  Logo is an example of a language that *does* obey stack discipline, but is generally implemented without the use of a hardware stack.  (The environment is instead maintained using linked lists, as in the 61A metacircular evaluator.)

Several people gave arguments that, if true, would also prove that JAL doesn't fit the MIPS architecture!  For example, several people thought it problematic that (in the actual words of the problem) "The instruction stored the PC at that address..." because what has to be stored is the address of the next instruction, not the address of the JSR instruction.  But of course exactly the same argument applies to the PC that JAL stores in $31!  The stored PC is the address following the procedure call, and the same is true for JSR.

A few people argued that the JSR mechanism wouldn't work for exception handling, for various reasons.  This is irrelevant, because the MIPS doesn't use JAL for exception handling either!  Since there is already a separate mechanism for exceptions, changing the ordinary procedure calling mechanism is independent of it.

Actually this objection was particularly ironic because, in the PDP-10, JSR was used precisely for exception handling.  The idea is that an interrupt might occur when no stack is set up, and no registers may be available for use by the exception handler.  The MIPS solves this problem by building a special piece of hardware, the Exception PC register, for the sole purpose of keeping the old PC value.  The PDP-10 solves the problem by saving the old PC in memory, making the assumption that the exception handler won't be reentrant.  (It can be made reentrant by copying the old PC from the place where JSR put it to some newly allocated space.)  On the PDP-10, instead of jumping to a fixed address, an exception executed the instruction at a fixed address.  Typically that instruction would be a JSR, which would save the PC without relying on any assumptions about registers.

(The normal bread-and-butter procedure calling instruction on the PDP-10, by the way, was PUSHJ, Push and Jump.  It took two operands, a register and a jump address.  The register must have been previously set up to point to a stack, like $29.  "Push" means to move the pointer by one word and store something there.  So the stack was allocated a word at a time, rather than a frame at a time.)

Scoring: One point for A, two points for B.  One point for certain half-okay answers as mentioned above.

3. C to MIPS.
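The function being translated applies fun to each element of b and stores the results in a; in C it looks essentially like this (a reconstruction inferred from the solution below, not the exam's exact text, with the parameter types assumed to be int):

    int fun(int);                    /* provided elsewhere by the exam */

    void mapArray(int a[], int b[], int dim) {
        int i;
        for (i = 0; i < dim; i++)
            a[i] = fun(b[i]);        /* the loop the assembly below implements */
    }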
mapArray: addiu $sp, -20      # prologue
          sw $31, 0($sp)
          sw $16, 4($sp)
          sw $17, 8($sp)
          sw $18, 12($sp)
          sw $19, 16($sp)
          move $17, $4        # move args to saved registers
          move $18, $5
          move $19, $6
          move $16, $0        # i = 0
loop:     bge $16, $19, fini  # done if i >= dim
          sll $8, $16, 2      # i * 4
          add $8, $18         # &b[i]
          lw $4, 0($8)        # b[i]
          jal fun             # fun(b[i])
          sll $8, $16, 2      # i * 4
          add $8, $17         # &a[i]
          sw $2, 0($8)        # a[i] = fun(b[i])
          addiu $16, 1        # i++
          j loop
fini:     lw $31, 0($sp)      # epilogue
          lw $16, 4($sp)
          lw $17, 8($sp)
          lw $18, 12($sp)
          lw $19, 16($sp)
          addiu $sp, 20
          jr $31

Of course this isn't the only possible solution.  One common variant is to store the arguments (a, b, dim) on the stack instead of in saved registers.  Another is to use an additional saved register to maintain the value of i*4 instead of multiplying each time.  And of course many people used MUL instead of SLL to do the multiplying, although that's a lot slower.

Scoring:  6    correct
          4-5  has the idea
          2-3  has an idea
          0-1  other

Minimum for "has an idea": No inventing nonexistent addressing modes such as LW $4, $8($18).  Also, there must be an overall loop structure.  (It's hard to list all the ways you could not have an idea, but those are the main ones.)

Minimum for "has the idea":
* attempt to follow register-saving conventions (maybe you miss one).
* fun is given an argument in $4 and returns a value in $2.
* the arrays are in memory (there's a LW before, and a SW after, the call to fun).
* indexing is done more or less right (you don't use a[0] every time).

Typical 4-point solution: elements of A and B are one byte instead of four.
Typical 3-point solution: no LW instruction.

4. Count bits.

numBitsSet: move $2, $0      # this will be the count
loop:       beqz $4, fini    # no bits left to count
            andi $9, $4, 1   # test low bit
            beqz $9, next    # branch if zero
            addi $2, 1       # one, so count it
next:       srl $4, 1        # shift off low bit
            j loop
fini:       jr $31

Variants are to test the high bit (by testing whether $4 is negative!) and shift bits off to the left; to shift the tested bit instead of shifting the argument; to notice that the result of the ANDI above is either 0 or 1 and just add it to the count instead of branching.  By moving the end test to the bottom of the loop you can save an instruction or two:

numBitsSet: move $2, $0
loop:       andi $9, $4, 1
            add $2, $9
            srl $4, 1
            bnez $4, loop
            jr $31

Can't get much more elegant than that.  Many students made this much more complicated than necessary by saving things on the stack.  (Note that question 3's solution would have been much shorter if it hadn't invoked another procedure FUN.)

For your amusement, the following much faster solution, because it doesn't involve a loop, was pointed out to me by one of the students:

    int numBitsSet(unsigned n) {
        n = (n & 0x55555555) + ((n >> 1) & 0x55555555);
        n = (n & 0x33333333) + ((n >> 2) & 0x33333333);
        n = (n & 0x0f0f0f0f) + ((n >> 4) & 0x0f0f0f0f);
        n = (n & 0x00ff00ff) + ((n >> 8) & 0x00ff00ff);
        return (n & 0x0000ffff) + ((n >> 16) & 0x0000ffff);
    }

Why this works is left as an exercise.  :-)

A common subtle mistake was to use division instead of shifting.  The problem with this is that for negative numbers you end up extending the sign bit, so you count more ones than were in the original number.
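In C, the straightforward loop looks like this (a sketch for comparison, not part of the original solutions; the name countBits is arbitrary).  Declaring the argument unsigned makes the right shift logical, which avoids exactly that sign-extension problem:

    int countBits(unsigned n) {
        int count = 0;
        while (n != 0) {
            count += n & 1;   /* add the low bit, as in the shorter MIPS version */
            n >>= 1;          /* logical shift, because n is unsigned */
        }
        return count;
    }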
Scoring:  5    correct
          3-4  has the idea
          1-2  has an idea
          0    other

Minimum for "has the idea" is masking off all but one bit, and looping for all the bits.

5. Memory organization

A. Find 0x7f55a356 in the cache.

Since each cache block is 16 bytes wide, the lowest 4 bits (4 = lg 16) of the address are the offset within a block.  That's the last hex digit, the 6.  So we're looking for the cache block that would contain block number 0x7f55a35.

The cache contains 4K bytes, and one block is 16 bytes, so there are 4K/16 = (2^12)/(2^4) = 2^8 blocks in the cache.  Since the cache is direct mapped, every block is a set of one, and the address uniquely determines which block (i.e., which set) to use.  We find out by looking at the lowest 8 bits (because there are 2^8 blocks) of the block number, which is 0x35.  What's left, 0x7f55a, is the tag.  So (B) is the correct answer.

B. How many sets?

A block is 32 bytes; there are five blocks to a set.  So the size of a set is 32*5 = 160 bytes.  The entire cache size is 80K = 80*(2^10) bytes, so the number of sets is 80K/160 = 2^9 = 512.

C. TLB mapping.

A page is 4KB = 2^12 bytes.  That means that the bottom 12 bits of the address (3 hex digits) are the offset, and the rest is the virtual page number.  For this virtual address, the offset is 0x678 and the virtual page number is 0x12345.  We look up the latter in the TLB and find it in slot number 2, corresponding to physical page number 0x200.  Combining this with the offset, the physical address is 0x200678.

D. What can't be paged?

The answer is (B); if the disk I/O handler were paged out, then it could never be brought back into memory, and neither could anything else.

It's okay to page out the page tables; they are in the operating system's virtual address space, and the same mechanism that pages other stuff in and out can deal with missing page table entries.  (Of course there has to be access to SOME memory, so maybe part of the system's own page table will be in the unmapped portion of memory.)

It's okay to page out the system call handler.  System calls are instructions executed by a user process; if the handler needed for a certain system call isn't available, it could be paged in later, just as anything else that a user process needs can be.

And of course if user programs couldn't be paged out, that would defeat the whole purpose of paging!

Scoring: One point each.

6. Cache probing.

What is the cache size?  All of the times for an array size of 256 bytes are small (about 100 ns), whereas for an array of 512 bytes most of the times suddenly are around 900 ns.  So probably the cache size is 256 bytes.

What is the block size?  For array sizes at or above 512, a stride of 4 gives access times around 450 ns, whereas larger strides take about 900 ns.  So a stride of 8 moves to the next block, whereas a stride of 4 is within the same block half the time.  Therefore the block size is 8 bytes.

What is the allocation policy?  For an array size of 512, strides of 128 and 256 lead to lower access times (around 300 ns).  For an array size of 1024, a stride of 256 leads to a similar low time.  In other words, if the stride is 1/2 or 1/4 of the array size, then not every reference leads to a cache miss, whereas if the stride is 1/8 or less of the array size, every reference misses.  This suggests a 4-way set associative cache.
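Timings like these come from a probe loop along the following lines (a sketch under assumptions, not the exam's actual program; it measures whatever machine it runs on, so a modern machine's much larger caches would give very different numbers).  For each array size and stride, it sweeps the array touching every stride-th byte and reports the average time per access:

    #include <stdio.h>
    #include <time.h>

    static char data[1024];

    /* Average time, in ns, to read every stride-th byte of the first
       "size" bytes of the array, repeated enough times to measure. */
    double probe(int size, int stride) {
        volatile char sink;     /* keeps the reads from being optimized away */
        long accesses = 0;
        int sweep, i;
        clock_t start = clock();
        for (sweep = 0; sweep < 200000; sweep++)
            for (i = 0; i < size; i += stride) {
                sink = data[i];
                accesses++;
            }
        return (double)(clock() - start) / CLOCKS_PER_SEC * 1e9 / accesses;
    }

    int main(void) {
        int size, stride;
        for (size = 256; size <= 1024; size *= 2)
            for (stride = 4; stride <= size / 2; stride *= 2)
                printf("size %4d  stride %4d  %6.1f ns\n",
                       size, stride, probe(size, stride));
        return 0;
    }

The sudden jump in access time once the array outgrows the cache, and the cheaper accesses at small or very large strides, are the patterns the question asks you to interpret.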
Scoring: Two points each.  All or nothing, except that saying "set associative" (NOT fully associative!) with the wrong value of N got one point.

7. Clock interrupts.

intrp:  lui $26, 0xffff
        lw $27, 0x10($26)   # get control register
        andi $27, 1         # just ready bit
        beqz $27, nope      # not ready, do nothing
        .set noat
        move $26, $1        # save $1 in case needed for LW
        .set at
        lw $27, SleepCount
        addiu $27, -1       # ready, decrease count
        sw $27, SleepCount
        .set noat
        move $1, $26
        .set at
        bgtz $27, nope      # counted down to zero?
        lui $26, 0xffff
        sw $0, 0x10($26)    # yes, turn off timer
nope:   mfc0 $26, $14
        rfe
        jr $26

We didn't take off for this, but it's actually important to use ADDIU and not ADDI, because if the count happened to be 0x80000000 then subtracting 1 would overflow, which would cause another exception, which would get us in trouble since this handler is not reentrant.

Most people saved $1 on the stack.  Just to show it can be done, the solution above doesn't assume $sp is set up: since we don't need to save anything but $1, we can use $26 for that.  (The LW and SW to SleepCount are the only instructions that might require the use of $1.)  We didn't take off if you assumed that SleepCount was reachable without using $1, though.  (But we did take off for any other use of $1 without saving it.)

Scoring:  5    OK
          3-4  the idea
          1-2  an idea
          0    other

"Having the idea" requires all of the following: saving registers correctly, using RFE, checking the ready bit, updating the count, and not making any assumptions about the user's registers.  (In particular, it's wrong to assume that the timer will interrupt only from inside the sleep procedure given in the question!!!)

Typical 3-point solution: forgetting to turn off the interrupt enable bit when the count reaches zero.

Typical 0-point solution: looping in the exception handler.  The whole idea of interrupts is that they happen quickly, without waiting for a device to become ready!

8. Exception handling.

A. When an exception occurs, the MIPS processor saves the old PC (d), switches to kernel mode and disables interrupts (c), and jumps to location 0x80000080 (b).  It does *not* read the Cause register (a).  Perhaps the exception handler software will read the Cause register, but the processor doesn't do that on its own.  (The processor will *set* the Cause register to a value indicating the reason for the exception.)

B. The main advantage of interrupts is (a), allowing the processor to do other stuff while waiting for I/O.  The others may be advantages of having an operating system, but not specifically of interrupts.

C. What are the differences between kernel mode and user mode?  There are two differences:

1. Kernel mode allows access to virtual addresses >= 0x80000000.

2. Kernel mode allows the use of the coprocessor instructions.  (Alternatively: allows access to coprocessor registers.)

Scoring: One point for A, one for B, and one for each of the two differences in C.

Note: "I/O registers" counts as the same answer as #1, since the I/O device registers are in kernel virtual memory, so if you said "access to high virtual addresses and access to I/O registers" you only got one point.

A somewhat common wrong answer was "kernel mode can use $26 and $27."  No!  It is a software convention that those registers are reserved for the exception handler, but that has nothing to do with kernel MODE and user MODE in the hardware.

Lots of people explained why having two modes was a good idea, a question asked in an earlier exam.  We didn't take off for that, as long as you also answered *this* question.