CS 61C		Spring 1995		Final exam solutions

1.  Translate MAL to TAL.

a.  LW $8, 0xffff000c

	LUI $1, 0xffff			LUI $8, 0xffff
	LW  $1, 0xc($1)		or	LW  $8, 0xc($8)

Most common mistake: ORI instead of LW, which would implement the LA
instruction instead of LW.


b.  BLT $16, $4, LABEL

	SLT  $1, $16, $4		SUB  $1, $16, $4
	BNE  $1, $0, LABEL	or	BLTZ $1, LABEL

Acceptable variants: BGTZ instead of BNE, SUBU instead of SUB.

Most common mistake: using a register other than $1 as the temporary
register.  Next most common: using BNEZ, which doesn't exist in TAL.


c.  ADD $8, 5

	ADDI $8, $8, 5

Hardly anyone missed this part.

Scoring: one point each.


2.  5Gb disk.

The most likely reason for this crash is that the operating system was
written to use a data structure in which a disk byte address is kept in
a 32-bit word.  That would have been adequate until fairly recently, but
not beyond 4Gb (2 to the 32nd power).

(This is based on the assumption that the operating system would never
have been released to users unless at some point it could format some
disk!  There has to be something special about this particular disk to
make it fail.)

Scoring:

3: as above.

2: recognizes the importance of 2^32, but implies that *memory* addresses
   are involved, or that the problem is a hardware limitation.

1: misses the point, but could possibly be true, e.g., the disk being
   formatted contained system data, swap space, or whatever.

0: implies that no disk could ever be formatted, e.g., not enough room
   in main memory to hold the contents of the disk.


3.  Why no EXCH in MIPS?

Two very good answers that several students gave:

	* A load and store cycle in the same instruction might block
	  the pipeline, since its circuitry was designed for only one
	  memory data reference per instruction.

	* Since this instruction changes values in two places (a register
	  and a memory location), it would be hard to restart after an
	  exception if half-completed, with one target changed but the
	  other still with its old value.

We gave two points to "an additional temporary register would be needed"
(to implement the algorithm TEMP <- MEMORY, MEMORY <- REGISTER,
REGISTER <- TEMP).  This is true, but any processor is full of buffer
registers and perhaps one of those is already usable for this purpose.

Scoring:

3: as above
2: sort of as above, but vague
1: generic apple-pie-and-motherhood answers like "it's too complicated"
0: even worse

By the way, an interesting side note to the hard-to-restart argument is
that historically, many architecture designers have taken pains to have
at least one uninterruptable ("atomic") load-and-store instruction
because that simplifies the problem of allowing a processor to reserve
some system resource in a multiprocessor environment.  Suppose two
processors both want to use the system's widget at the same time.  There
is a word in memory that will contain, say, -1 if the widget is free,
or the processor number of the processor that's using it.  So each
processor would say

	LI $8, my-processor-number
	EXCH $8, widget-address

and then if the value in $8 is -1, that processor knows it has permission
to use the widget.  It is possible to do this sort of resource allocation
without an atomic load-and-store instruction, but it's trickier.

4.  Which C procedure is correct and why?

int a[10];

int *proc1() {
    int b[10];
    ...
    return b;
}

int *proc2() {
    ...
    return a;
}

	Answer: proc1 is wrong, proc2 is correct.  The reason is that
	proc1 is attempting to return a pointer to a locally stack-allocated
	array, i.e., a pointer to data that won't exist once the procedure
	returns.  But proc2 returns a pointer to a static-allocated array,
	which does still exist after the procedure returns.

Scoring: 1 point for picking the correct procedure; then 2 more points for a
good explanation, 1 for a so-so explanation (e.g., one that implies that
it would be an error to return *any* local variable, not just a *pointer*),
or 0 for a wrong explanation.


5.  Cache/VM geometry.

Parts a-c are about the cache.  The cache size is 8Kb, which is 2^13.
The cache width is 32 words, or 128 bytes, which is 2^7.  Therefore
there are (2^13)/(2^7) = 2^6 = 64 cache rows.  Since the cache is
4-way set associative, these rows are grouped as 16 sets of 4 rows
per set.  So a memory address is

+---------------------+----+-------+
|         21          |  4 |   7   |
|        tag          | set| offset|
+---------------------+----+-------+

(a) block offset is 7 bits, because a cache row is 2^7 bytes wide.

(b) set index is 4 bits, because there are 16 = 2^4 sets.

(c) What's left, 32 - 7 - 4, is the tag, 21 bits.


Parts d-f are about virtual memory.  The page size is 4K bytes = 2^12.
So for VM purposes a 32-bit memory address (both physical and virtual
are 32 bits) is

+--------------------+------------+
|        20          |      12    |
|    page number     |    offset  |
+--------------------+------------+

(d) page offset is 12 bits, because a page holds 2^12 bytes.

(e) and (f) page numbers are the remaining 32-12 = 20 bits.

Scoring:

We wanted not to penalize people repeatedly for a wrong answer
correctly derived from a previous wrong answer.  Also, some people
interpreted "block offset" as measured in words rather than bytes,
with an extra 2-bit field for the byte offset within a word.  So
we used this complicated formula:

(a) one point for 7; one point for 5 provided that (c) is correct.
(b) one point for 4.
(c) one point for 32-(a+b); if a=5, one point for 25-b.
(d) one point for 12.
(e) one point for 32-d.
(f) one point for 32-d or for 20.


6.  Why cache in hardware, VM largely in software?

There are many reasons for this, and we accepted a wide range of possible
three-point answers.  For example:

	* Virtual memory uses disk, which has a large and unpredictable
	  latency delay, which means that the processor can't wait for a
	  VM miss, but must instead switch to a different process, which
	  requires software support.

	* Since the miss penalty is not so severe for a cache miss as
	  for a page fault, the cache can use random replacement, which
	  is relatively easy to do in hardware; by contrast, VM generally
	  uses an LRU algorithm, which requires complicated bookkeeping
	  best done with software support.

	* Since the backup store for VM is the disk, the page fault
	  handler must understand the file system structure, to know
	  which part of the disk to use; this structure is not part
	  of the disk hardware but is imposed by the operating system.
	  (That's why PC diskettes and Mac diskettes are incompatible
	  even though they are physically the same.)

	* If a cache miss ran some software, the possibility arises that
	  part of that software might not be in the cache, which would
	  create a recursive cache miss.  The cache miss handler software
	  would have to be permanently locked into the cache.

A typical two-point answer would be one that didn't show a clear
understanding of the difference between a hit and a miss, e.g., making
an argument that handling each cache hit in software would defeat the
purpose of caching.  The same is true for VM: TLB hits are handled by
the hardware!  It's only on a miss that the question of hardware vs.
software control is relevant.

A typical one-point answer is one that's true but not clearly relevant,
e.g., "the cache is smaller."  A zero-point answer isn't even true.

Since many people gave more than one answer, our policy was to credit
your best answer, but to take off one point if any of your answers was
downright incorrect.


7.  How to turn off the transmitter.

If we turned off the IE bit in the system status register, then *all*
interrupts would be disabled, including receiver interrupts, so no
further characters would be read from the keyboard.

Scoring: There wasn't much in the way of partial credit for this
question.  We gave one point for "there will be continuous interrupts"
because we guessed that you meant that the RFE would undo turning off
the IE bit.  But if you interpret the question that way, the IE bit is
*already* off!  What we had in mind was leaving IE off on return from
the exception handler.


8.  Too-small buffers.

(a) If the keyboard buffer is too small, some input characters will
be lost.

(b) If the display buffer is too small, programs will run more slowly
because they'll be blocked much of the time, but no data will be lost.

Some people responded to (b) saying that because the process will block,
the input buffer won't be emptied, and so eventually that will fill up
too, and then data *will* be lost.  This is a good answer too.

Scoring: two points each.


9.  An interrupt handler should never:

(d) return a value in $2.

Since interrupts are asynchronous, there is no "caller" procedure to
accept a return value.  An interrupt handler must not modify any
registers other than $26 and $27.

Note:  If the question had said "exception handler" rather than
"interrupt handler" then there would be one special case, namely a
system call exception, which is in effect a procedure call from a
user program to the operating system.  But a system call is not
an interrupt.

Scoring:  2 points if correct.  1 point if two answers checked, one of
which was correct.  (No points if more than two checked.)


10.  strncmp in MAL.

Since there are no procedure calls within this procedure, we can use
temp registers for everything and need not allocate a stack frame.
We put c1 in $8, c2 in $9.

strncmp:
	addi $6, $6, -1		# --n
	bltz $6, zero		# while's end test
	lb   $8, 0($4)		# c1 = *cs++
	addi $4, $4, 1
	lb   $9, 0($5)		# c2 = *ct++
	addi $5, $5, 1
	bne  $8, $9, nonzero	# if c1 != c2 ...
	beqz $8, zero		# if c1 == '\0' ...
	b strncmp		# loop

zero:	add  $2, $0, $0		# return 0
	jr   $31

nonzero:
	sub $2, $8, $9		# return c1-c2
	jr $31


Scoring:  One point for each of the following:

	--n
	looping correctly
	*p++
	return correct values in $2
	everything else

The most common errors were:

	LW instead of LB

	continuing the loop instead of returning for the RETURNs
	inside the loop

	using saved registers, therefore allocating stack space,
	but only one byte for each saved register because it will
	be used for a char variable.  (You still have to save the
	entire previous value!)


11.  Linker stuff

(a)

r_vaddr		r_type		r_extern	text or data?

0		R_JMPADDR	0		text
0x14		R_JMPADDR	1		text
0x24		R_JMPADDR	0		text

Scoring:

2 points if exactly as shown

1 point: as shown plus at most three spurious entries
1 point: two correct rows, no extras
1 point: three correct columns, no extras

0: anything else

Common errors:

Entries for the branches.
An entry for each label rather than for the instruction that refers to it.
Something other than a number in r_vaddr.


(b)

LOOP:	JAL  GETCHAR		0c000200
	BLTZ $2, SKIPPED	04400003
	LI   $8, ' '		20080020 or 34080020 or 24080020
	BNE  $2, $8, SKIPPED	14480001 or 15020001
	J    LOOP		0800004b

The alternatives for the LI instruction are ADDI, ORI, and ADDIU.
The alternate for BNE is because we accepted an answer with the
two register fields in the wrong order, since it doesn't matter
for BNE.

The most common problems were forgetting to divide the jump addresses
by four and forgetting that branch addresses are PC-relative.  Another
one was using LUI as the translation of LI.  (If the immediate value
were more than 16 bits wide, then the LI would have to be *two* machine
instructions, LUI followed by ORI.  But certainly LUI alone makes no
sense.)

Scoring:

3  correct
2  a couple of glitches
1  a few correct halfwords
0  not even that