CS 61C		Spring 1995		Midterm 2 solutions

1.  Stack implementation.

void push(int* &stack, int x) {
    *(stack++) = x;
}

int pop(int* &stack) {
    return *(--stack);
}

(a) Which best describes the stack management:

	ii.  The stack is growing up in memory, and the stack pointer points
	to the next empty array element.

Because the code for push() says stack++ and not ++stack, we know that
the stack pointer is incremented *after* it is dereferenced.  In other
words, the new value x is stored in the word that stack points to, and
then stack is incremented to point to the next free element.

And of course the stack grows upward, not downward, because it's push()
that increments the pointer and pop() that decrements it!

Scoring:  2 points if correct.  1 point for a half-correct answer (i or iv).


(b) Write push() in MAL:

push:	lw   $8, 0($4)		# $8 is the actual stack pointer
	sw   $5, 0($8)		# *stack = x
	addi $8, $8, 4		# stack++
	sw   $8, 0($4)		# update stack pointer in memory
	jr   $31

Probably the subtlest part of this problem is the first instruction.
The variable stack is a *reference* parameter, which means that what
is actually passed in $4 is a *pointer to* stack (which is itself a
pointer to an integer).  That's why, in the equivalent C version, it's
shown as a pointer to a pointer.  So there are two levels of dereferencing:
First we dereference $4 to get the actual stack pointer (into $8), then we
dereference $8 to get to the desired stack element.

Scoring for typical errors:

4  OK

3  preincrement, i.e., *++stack instead of *stack++
   increment by 1 instead of by 4
   no jr instruction
   {**stack = x; stack++;} (increment $4 instead of $8)

2  (**stack)++;  (increment the stack element instead of the pointer)
   args not in $4 and $5

1  only one SW instruction

0  even worse


2.  LRU features

(a) What hardware supports LRU?

	The Reference bit (or Use bit) in the TLB or page table entry.

(b) How can an OS support LRU without it?

	Periodically turn off all the Valid bits, and use the resulting
	page faults to control a software-maintained reference bit.

Alternative answer to (b), less general but accepted:

	On the MIPS, where there is no page table hardware, the OS could
	assume that any entry that is about to be flushed from the TLB is
	fairly recently used, and turn on a software reference bit in the
	software-maintained page table.  (In other words, pretend that
	*every* valid TLB entry has its reference bit set.)

Scoring: 2 points each.  Part credit for answers that showed an understanding
of the key fact that we mustn't slow down every memory reference!  No part
credit for answers without this understanding, e.g., "have the OS record a
timestamp for every memory reference."  (The *hardware* could possibly
record a timestamp, but that doesn't answer the question about what the OS
can do without hardware help.)

No credit for an answer that just restates the question, e.g., "The OS must
find a way to tell which pages have been used recently."


3.  Cache performance.

Cache A:  Two blocks, one word each
Cache B:  One block of two words

(a) a[0]++; a[1]++; a[2]++; a[3]++;

	Cache A: a[0] misses, into first block;
		 a[1] misses, into second block;
		 a[2] misses, into first block;
		 a[3] misses, into second block.

	Cache B: a[0] misses, loads a[0] and a[1];
		 a[1] hits;
		 a[2] misses, loads a[2] and a[3];
		 a[3] hits.

	So cache B has a better hit rate (2 hits) than cache A (0 hits).

(b) a[1]++; a[3]++; a[1]++; a[3]++;

	Cache A: a[1] misses, into second block;
		 a[3] misses, into second block;
		 a[1] misses, into second block;
		 a[3] misses, into second block.

	Cache B: a[1] misses, loads a[0] and a[1];
		 a[3] misses, loads a[2] and a[3];
		 a[1] misses, loads a[0] and a[1];
		 a[3] misses, loads a[2] and a[3].

	So the two caches have the same hit rate (0 hits).

(c) a[1]++; a[2]++; a[1]++; a[2]++;

	Cache A: a[1] misses, into second block;
		 a[2] misses, into first block;
		 a[1] hits;
		 a[2] hits.

	Cache B: a[1] misses, loads a[0] and a[1];
		 a[2] misses, loads a[2] and a[3];
		 a[1] misses, loads a[0] and a[1];
		 a[2] misses, loads a[2] and a[3];

	So cache A has a better hit rate (2 hits) than cache B (0 hits).

So the two answers are C and A.

Scoring: 2 points each.


4.  Finding an address.

A page is 2^12 bytes, so the virtual address is divided into a 20-bit
virtual page number and a 12-bit offset.  So the address 0x40fa25d4 is

	VPN:	40fa2
	offset:	5d4

We look up the VPN in the TLB and find it in slot 6:

	VPN 40fa2, physical page 076

Therefore the desired physical address is 0x0765d4.

Now we are done with the TLB and need to find this address in the cache.
The cache width is 2^5 bytes, so there are five bits of cache offset.
The total cache size is 2^9 bytes, so there are 2^4 = 16 cache blocks.
The cache is 2-way set associative, so these 16 blocks are organized as
8 (= 2^3) sets of 2 blocks each.  So there are three bits of set number.

Physical address = 0x765d4 = 0111 0110 0101 1101 0100 binary
			   = 0111 0110 0101 110 10100
                                            --- -----
                                         set #    offset

so we are looking for an offset of 0x14 in a block of set # 6.

Set #6 has two blocks, whose keys are a1c5 and 0765.  The second of
these matches the leftmost bits of the physical address, so we use
that block (block # 0xd), and at offset 0x14 we find the data, 0x46.

Scoring:

4  found data 0x46
3  found cache block 0xd, wrong offset
2  computed physical address 0x765d4
1  found TLB slot 6
0  none of the above


5.  Translate C/C++ to MIPS:

int modemod4(int a[], int len) {
    int counts[4];

    zero(counts);
    while (--len >= 0) {
	counts[a[len] & 03]++;
    }
    return biggest(counts);
}


modemod4:
	addi  $sp, $sp, -28	# allocate frame, including counts array
	sw    $31, 16($sp)	# save stuff
	sw    $4, 20($sp)
	sw    $5, 24($sp)

	add   $4, $sp, $0	# $4 gets &counts[0]
	jal   zero

	lw    $4, 20($sp)	# restore args after procedure call
	lw    $5, 24($sp)

loop:	addi  $5, $5, -1	# --len, so decrement before test
	bltz  $5, fini

	sll   $8, $5, 2		# 4 * len
	add   $8, $8, $4	# &a[len]
	lw    $9, 0($8)		# **** a[len] ****
	andi  $9, $9, 3		# a[len] & 03
	sll   $8, $9, 2		# 4 * (a[len] & 03)
	add   $8, $8, $sp	# &counts[a[len] & 03]
	lw    $9, 0($8)		# **** counts[a[len] & 03] ****
	addi  $9, $9, 1
	sw    $9, 0($8)		# **** counts[a[len] & 03]++ ****

	b     loop

fini:	add   $4, $sp, $0	# $4 gets &counts[0]
	jal   biggest

	lw    $31, 16($sp)	# unwind everything
	addi  $sp, $sp, 28
	jr    $31


Scoring:  We considered that the core of this problem is the three
instructions marked with **** above: two loads and a store.  So...

6  correct
5  minor errors
4  load, load, store (to the correct memory addresses) with perhaps
	one address calculated incorrectly
3  load, load, store but two or more errors in that part
2  no load, load, store, but something looking like a while loop inside
	a procedure
1  something weirder than that
0  even worse


Many people were confused about the allocation of the arrays.
The array COUNTS is a local variable in this procedure, so it is
allocated on the stack, all 16 bytes of it -- not just a pointer.

The array A, however, is an argument to the procedure, and arrays
in C/C++ are passed by reference, so what is in $4 is a pointer to
the array A (which is allocated by some other part of the program).

Similarly, when we pass COUNTS as an argument to the procedures ZERO
and BIGGEST, we pass it by reference, i.e., we put its address in $4,
which is what the
	add $4, $sp, $0
does, because the array is allocated at the bottom of our stack frame,
i.e., at 0($sp).  So that add instruction is equivalent to
	la  $4, 0($sp)

Some people were confused by the foo&03 business.  It's equivalent to
foo%4, i.e., the remainder on division by 4.  (This works only because
4 is a power of two!  You couldn't get foo%5, for example, with foo&4.)
Anding is faster than remaindering, which is why we wrote it this way.

A common mistake was to translate len-- instead of --len, i.e., to
test len against zero before decrementing it.