CS 61C Spring 1995 Final exam solutions 1. Translate MAL to TAL. a. LW $8, 0xffff000c LUI $1, 0xffff LUI $8, 0xffff LW $1, 0xc($1) or LW $8, 0xc($8) Most common mistake: ORI instead of LW, which would implement the LA instruction instead of LW. b. BLT $16, $4, LABEL SLT $1, $16, $4 SUB $1, $16, $4 BNE $1, $0, LABEL or BLTZ $1, LABEL Acceptable variants: BGTZ instead of BNE, SUBU instead of SUB. Most common mistake: using a register other than $1 as the temporary register. Next most common: using BNEZ, which doesn't exist in TAL. c. ADD $8, 5 ADDI $8, $8, 5 Hardly anyone missed this part. Scoring: one point each. 2. 5Gb disk. The most likely reason for this crash is that the operating system was written to use a data structure in which a disk byte address is kept in a 32-bit word. That would have been adequate until fairly recently, but not beyond 4Gb (2 to the 32nd power). (This is based on the assumption that the operating system would never have been released to users unless at some point it could format some disk! There has to be something special about this particular disk to make it fail.) Scoring: 3: as above. 2: recognizes the importance of 2^32, but implies that *memory* addresses are involved, or that the problem is a hardware limitation. 1: misses the point, but could possibly be true, e.g., the disk being formatted contained system data, swap space, or whatever. 0: implies that no disk could ever be formatted, e.g., not enough room in main memory to hold the contents of the disk. 3. Why no EXCH in MIPS? Two very good answers that several students gave: * A load and store cycle in the same instruction might block the pipeline, since its circuitry was designed for only one memory data reference per instruction. * Since this instruction changes values in two places (a register and a memory location), it would be hard to restart after an exception if half-completed, with one target changed but the other still with its old value. We gave two points to "an additional temporary register would be needed" (to implement the algorithm TEMP <- MEMORY, MEMORY <- REGISTER, REGISTER <- TEMP). This is true, but any processor is full of buffer registers and perhaps one of those is already usable for this purpose. Scoring: 3: as above 2: sort of as above, but vague 1: generic apple-pie-and-motherhood answers like "it's too complicated" 0: even worse By the way, an interesting side note to the hard-to-restart argument is that historically, many architecture designers have taken pains to have at least one uninterruptable ("atomic") load-and-store instruction because that simplifies the problem of allowing a processor to reserve some system resource in a multiprocessor environment. Suppose two processors both want to use the system's widget at the same time. There is a word in memory that will contain, say, -1 if the widget is free, or the processor number of the processor that's using it. So each processor would say LI $8, my-processor-number EXCH $8, widget-address and then if the value in $8 is -1, that processor knows it has permission to use the widget. It is possible to do this sort of resource allocation without an atomic load-and-store instruction, but it's trickier. 4. Which C procedure is correct and why? int a[10]; int *proc1() { int b[10]; ... return b; } int *proc2() { ... return a; } Answer: proc1 is wrong, proc2 is correct. The reason is that proc1 is attempting to return a pointer to a locally stack-allocated array, i.e., a pointer to data that won't exist once the procedure returns. But proc2 returns a pointer to a static-allocated array, which does still exist after the procedure returns. Scoring: 1 point for picking the correct procedure; then 2 more points for a good explanation, 1 for a so-so explanation (e.g., one that implies that it would be an error to return *any* local variable, not just a *pointer*), or 0 for a wrong explanation. 5. Cache/VM geometry. Parts a-c are about the cache. The cache size is 8Kb, which is 2^13. The cache width is 32 words, or 128 bytes, which is 2^7. Therefore there are (2^13)/(2^7) = 2^6 = 64 cache rows. Since the cache is 4-way set associative, these rows are grouped as 16 sets of 4 rows per set. So a memory address is +---------------------+----+-------+ | 21 | 4 | 7 | | tag | set| offset| +---------------------+----+-------+ (a) block offset is 7 bits, because a cache row is 2^7 bytes wide. (b) set index is 4 bits, because there are 16 = 2^4 sets. (c) What's left, 32 - 7 - 4, is the tag, 21 bits. Parts d-f are about virtual memory. The page size is 4K bytes = 2^12. So for VM purposes a 32-bit memory address (both physical and virtual are 32 bits) is +--------------------+------------+ | 20 | 12 | | page number | offset | +--------------------+------------+ (d) page offset is 12 bits, because a page holds 2^12 bytes. (e) and (f) page numbers are the remaining 32-12 = 20 bits. Scoring: We wanted not to penalize people repeatedly for a wrong answer correctly derived from a previous wrong answer. Also, some people interpreted "block offset" as measured in words rather than bytes, with an extra 2-bit field for the byte offset within a word. So we used this complicated formula: (a) one point for 7; one point for 5 provided that (c) is correct. (b) one point for 4. (c) one point for 32-(a+b); if a=5, one point for 25-b. (d) one point for 12. (e) one point for 32-d. (f) one point for 32-d or for 20. 6. Why cache in hardware, VM largely in software? There are many reasons for this, and we accepted a wide range of possible three-point answers. For example: * Virtual memory uses disk, which has a large and unpredictable latency delay, which means that the processor can't wait for a VM miss, but must instead switch to a different process, which requires software support. * Since the miss penalty is not so severe for a cache miss as for a page fault, the cache can use random replacement, which is relatively easy to do in hardware; by contrast, VM generally uses an LRU algorithm, which requires complicated bookkeeping best done with software support. * Since the backup store for VM is the disk, the page fault handler must understand the file system structure, to know which part of the disk to use; this structure is not part of the disk hardware but is imposed by the operating system. (That's why PC diskettes and Mac diskettes are incompatible even though they are physically the same.) * If a cache miss ran some software, the possibility arises that part of that software might not be in the cache, which would create a recursive cache miss. The cache miss handler software would have to be permanently locked into the cache. A typical two-point answer would be one that didn't show a clear understanding of the difference between a hit and a miss, e.g., making an argument that handling each cache hit in software would defeat the purpose of caching. The same is true for VM: TLB hits are handled by the hardware! It's only on a miss that the question of hardware vs. software control is relevant. A typical one-point answer is one that's true but not clearly relevant, e.g., "the cache is smaller." A zero-point answer isn't even true. Since many people gave more than one answer, our policy was to credit your best answer, but to take off one point if any of your answers was downright incorrect. 7. How to turn off the transmitter. If we turned off the IE bit in the system status register, then *all* interrupts would be disabled, including receiver interrupts, so no further characters would be read from the keyboard. Scoring: There wasn't much in the way of partial credit for this question. We gave one point for "there will be continuous interrupts" because we guessed that you meant that the RFE would undo turning off the IE bit. But if you interpret the question that way, the IE bit is *already* off! What we had in mind was leaving IE off on return from the exception handler. 8. Too-small buffers. (a) If the keyboard buffer is too small, some input characters will be lost. (b) If the display buffer is too small, programs will run more slowly because they'll be blocked much of the time, but no data will be lost. Some people responded to (b) saying that because the process will block, the input buffer won't be emptied, and so eventually that will fill up too, and then data *will* be lost. This is a good answer too. Scoring: two points each. 9. An interrupt handler should never: (d) return a value in $2. Since interrupts are asynchronous, there is no "caller" procedure to accept a return value. An interrupt handler must not modify any registers other than $26 and $27. Note: If the question had said "exception handler" rather than "interrupt handler" then there would be one special case, namely a system call exception, which is in effect a procedure call from a user program to the operating system. But a system call is not an interrupt. Scoring: 2 points if correct. 1 point if two answers checked, one of which was correct. (No points if more than two checked.) 10. strncmp in MAL. Since there are no procedure calls within this procedure, we can use temp registers for everything and need not allocate a stack frame. We put c1 in $8, c2 in $9. strncmp: addi $6, $6, -1 # --n bltz $6, zero # while's end test lb $8, 0($4) # c1 = *cs++ addi $4, $4, 1 lb $9, 0($5) # c2 = *ct++ addi $5, $5, 1 bne $8, $9, nonzero # if c1 != c2 ... beqz $8, zero # if c1 == '\0' ... b strncmp # loop zero: add $2, $0, $0 # return 0 jr $31 nonzero: sub $2, $8, $9 # return c1-c2 jr $31 Scoring: One point for each of the following: --n looping correctly *p++ return correct values in $2 everything else The most common errors were: LW instead of LB continuing the loop instead of returning for the RETURNs inside the loop using saved registers, therefore allocating stack space, but only one byte for each saved register because it will be used for a char variable. (You still have to save the entire previous value!) 11. Linker stuff (a) r_vaddr r_type r_extern text or data? 0 R_JMPADDR 0 text 0x14 R_JMPADDR 1 text 0x24 R_JMPADDR 0 text Scoring: 2 points if exactly as shown 1 point: as shown plus at most three spurious entries 1 point: two correct rows, no extras 1 point: three correct columns, no extras 0: anything else Common errors: Entries for the branches. An entry for each label rather than for the instruction that refers to it. Something other than a number in r_vaddr. (b) LOOP: JAL GETCHAR 0c000200 BLTZ $2, SKIPPED 04400003 LI $8, ' ' 20080020 or 34080020 or 24080020 BNE $2, $8, SKIPPED 14480001 or 15020001 J LOOP 0800004b The alternatives for the LI instruction are ADDI, ORI, and ADDIU. The alternate for BNE is because we accepted an answer with the two register fields in the wrong order, since it doesn't matter for BNE. The most common problems were forgetting to divide the jump addresses by four and forgetting that branch addresses are PC-relative. Another one was using LUI as the translation of LI. (If the immediate value were more than 16 bits wide, then the LI would have to be *two* machine instructions, LUI followed by ORI. But certainly LUI alone makes no sense.) Scoring: 3 correct 2 a couple of glitches 1 a few correct halfwords 0 not even that