Midterm 2, November 2, 1998 - solutions

1. Explain thrashing. Define the Working Set paging algorithm and explain how one of the features of the Working Set paging algorithm is to avoid thrashing. (11)

Thrashing is a situation in which the processor spends most of its time either processing page faults or sitting idle waiting for page faults to complete. This occurs either because the degree of multiprogramming is too high, or because one or more processes are badly behaved and use too much memory. In either case, the cause is that demand for memory exceeds supply.

The working set paging algorithm keeps in memory, for each process, exactly those pages that have been referenced in the preceding T virtual time units. Or: the working set paging algorithm keeps in memory those pages that a process needs in order to run efficiently. The working set paging algorithm avoids thrashing because it will not allow a process to run (i.e., put it in the "in memory" queue) unless there is enough memory available for its entire working set.

2. For the following page reference string, give the number of page faults for memories of sizes 3 and 4, and for working set, with a working set parameter value of 4.5. No partial credit, but show your work. (21)

Reference string: 2 3 4 5 2 3 1 2 3 4 5 1 2 3

                      size 3   size 4
    LRU                 12       10
    FIFO                11       12
    OPT                  8        6
    WS (tau = 4.5)      10 (window-based; independent of memory size)

(Note that FIFO faults more often with the larger memory here; this is an instance of Belady's anomaly. A simulation sketch for checking these counts appears after question 4.)

3. What is the function of a page table? What is the function of the memory map (also called the "core map")? What is the relation of one to the other? Why do you need both?

A page table contains entries that map virtual page numbers to physical page frame numbers. A memory map maps physical page frames back to virtual page numbers. The information we really need from this reverse mapping is not the page numbers themselves, but the ability to locate the page table entries (PTEs) that point to a particular physical page frame. We need to locate these PTEs when a page is to be replaced: the PTEs contain the referenced and modified bits that help the OS make its page replacement decision, and, most importantly, the PTE for the page being swapped out must be modified to reflect the fact that the page is no longer in memory. (A sketch of the two structures also appears after question 4.)

You received 4 points for knowing what a page table is. The remaining 8 points were given for how well you knew what a memory map is and WHY we need it to do paging.

4. Suppose that you have a region of memory shared between two processes. Suppose that within the shared region are memory addresses (e.g. there is a linked list). Why is that a problem? How would or could you solve the problem?

A user process can only reference memory using virtual addresses, so any address stored in the shared region is a virtual address. The translation of a virtual address depends on the context, i.e. on which process is doing the reference. Thus a virtual address stored in the shared region can map to different physical locations when referenced by different processes, and it may not even be a valid address in all of them.

One solution is to make sure that the shared region resides at the same virtual address in every process; for example, we can reserve the top portion of the virtual address space for a shared segment. Another solution is to interpret all address values stored in the region as relative offsets (for example, relative to the location of the field holding the offset). The C code below illustrates this concept (the next field holds a byte offset rather than a pointer):

    typedef struct LinkedList {
        Data d;
        long next;   /* byte offset of the next element, relative to
                        the address of this field; 0 marks the end */
    } LinkedList;

    LinkedList *getNextElement(LinkedList *element) {
        if (element->next == 0)
            return NULL;
        return (LinkedList *)((char *)&element->next + element->next);
    }

5-6 points each for identifying the problem and proposing ONE valid solution. Prohibiting sharing is not a valid solution! You received 2 points if you mentioned the need for mutual exclusion; what you said is not wrong, but it does not address the more fundamental problem.
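For question 2, here is a minimal simulation sketch, not part of the original solutions, that reproduces the LRU and working set counts in the table above; the array and function names (refs, lru_faults, ws_faults) are ours. FIFO and OPT can be checked the same way by changing the eviction rule.

    /* Count page faults for the reference string of question 2 under
       LRU (for a given number of frames) and under working set (for a
       given window of virtual time units). */
    #include <stdio.h>

    #define NREFS   14
    #define MAXPAGE  6                       /* pages are numbered 1..5 */
    static const int refs[NREFS] = {2,3,4,5,2,3,1,2,3,4,5,1,2,3};

    static int lru_faults(int frames) {
        int last_use[MAXPAGE], resident[MAXPAGE] = {0};
        int nresident = 0, faults = 0;
        for (int p = 0; p < MAXPAGE; p++) last_use[p] = -1;

        for (int t = 0; t < NREFS; t++) {
            int p = refs[t];
            if (!resident[p]) {
                faults++;
                if (nresident == frames) {   /* evict least recently used */
                    int v = -1;
                    for (int q = 0; q < MAXPAGE; q++)
                        if (resident[q] && (v < 0 || last_use[q] < last_use[v]))
                            v = q;
                    resident[v] = 0;
                    nresident--;
                }
                resident[p] = 1;
                nresident++;
            }
            last_use[p] = t;
        }
        return faults;
    }

    /* Working set: a reference faults iff the page was not referenced
       within the preceding `window` virtual time units. */
    static int ws_faults(double window) {
        int last_use[MAXPAGE], faults = 0;
        for (int p = 0; p < MAXPAGE; p++) last_use[p] = -1;
        for (int t = 0; t < NREFS; t++) {
            int p = refs[t];
            if (last_use[p] < 0 || t - last_use[p] > window)
                faults++;
            last_use[p] = t;
        }
        return faults;
    }

    int main(void) {
        printf("LRU, 3 frames: %d\n", lru_faults(3));   /* 12 */
        printf("LRU, 4 frames: %d\n", lru_faults(4));   /* 10 */
        printf("WS, tau = 4.5: %d\n", ws_faults(4.5));  /* 10 */
        return 0;
    }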
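For question 3, here is a hypothetical sketch of the two structures; every type and field name is invented for illustration, not taken from any particular system.

    /* One page table entry per VIRTUAL page of a process. */
    typedef struct pte {
        unsigned frame      : 20;   /* physical frame holding the page */
        unsigned valid      : 1;    /* is the page in memory?          */
        unsigned referenced : 1;    /* set by hardware on each access  */
        unsigned modified   : 1;    /* set by hardware on each write   */
    } pte;

    /* One core map entry per PHYSICAL frame. */
    typedef struct core_map_entry {
        pte *mapped_by;             /* the PTE that points at this frame */
        int  owner_pid;             /* process the frame belongs to      */
    } core_map_entry;

When frame f is chosen for replacement, the core map hands us the PTE to update directly (core_map[f].mapped_by->valid = 0). With page tables alone we would have to search every page table of every process to find which PTE maps frame f.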
5. What are the tradeoffs between contiguous file allocation and linked file allocation? Explain the use of extents for contiguous allocation and explain how they improve things.

Contiguous file allocation means allocating each file in consecutive blocks on the disk.

Contiguous file allocation:
  + fast sequential access
  + fast random access
  + low disk space overhead
  - external fragmentation
  - hard to grow a file

Linked file allocation:
  + no external fragmentation
  + easy to grow a file
  - very slow random access (must search through the whole list)
  - potentially slow sequential access (if the list's blocks are scattered over the disk surface)
  - some disk space overhead to store the pointer to the next block

The use of extents is a hybrid of the above two layout methods. The idea is that the entire file does not have to be contiguous; instead we try to allocate a file in as few contiguous, variable-sized runs of blocks (extents) as possible. There are two ways to keep track of the extents allocated to a file. We can keep a linked list of extents, which has all the advantages of linked list allocation plus faster random access, since fewer reads are required to locate a block. Alternatively, if we place a small upper limit on the number of extents, say 16, we can keep an index of pointers to them (see the sketch after this question's grading note). In this case random access is just as fast as with contiguous allocation, since the index can be stored in memory, and sequential access does not suffer much if the number of extents is small (and hence each extent is large). But there is a potential for external fragmentation, and it may be difficult to grow files.

You received 8 points for listing the tradeoffs between contiguous and linked list allocation, and 4 points for describing and characterizing the use of extents. You did not need a comprehensive analysis of the tradeoffs to receive credit.
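Here is a sketch of the second extent scheme described above (names invented for illustration): a small fixed-size index of extents per file, kept in memory, so that locating any block is an in-core search rather than extra disk reads.

    #define MAX_EXTENTS 16

    struct extent {
        long start;    /* first disk block of this contiguous run */
        long length;   /* number of blocks in the run             */
    };

    struct inode {
        int nextents;
        struct extent ext[MAX_EXTENTS];
    };

    /* Map a logical block number within the file to a disk block.
       Returns -1 if the block is past the end of the file. */
    long logical_to_disk(const struct inode *ip, long lblock) {
        for (int i = 0; i < ip->nextents; i++) {
            if (lblock < ip->ext[i].length)
                return ip->ext[i].start + lblock;
            lblock -= ip->ext[i].length;
        }
        return -1;
    }

This is why random access stays close to contiguous allocation: the whole index fits in memory, and at most one disk read fetches the block itself.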
6. I've said that CPU run times for processes are highly skewed. What does that mean? What does that imply for scheduling algorithms? Suppose that the distribution of CPU run times had a low variance, i.e. was centralized. In that case, what would be the characteristics of good (realizable) scheduling algorithms? In the latter case, please rank the performance (in terms of lowest mean flow time being best) of FIFO, RR and SET. (14)

(2 points) CPU run times being highly skewed means that there is a large variation in run times across jobs. In practice, there are many short jobs and significantly fewer very long jobs.

(3 points) We don't want short jobs to have to wait for long jobs to complete. The less time a job has run, the less time it is likely to need before it completes. Thus, we want to give priority to jobs that have not run for very long, since those jobs are the most likely to finish soon.

(3 points) With a centralized distribution of CPU run times the opposite holds: the longer a job has run, the more likely it is to finish soon. Good scheduling algorithms would therefore run jobs to completion once they were started; replacing a long-running job with a short-running job on the CPU will only increase flow times.

(6 points) FIFO, RR, SET. FIFO is best because it runs jobs to completion without taking them off the CPU. RR is worse because it time-slices equally among all jobs, so earlier jobs have to give up the CPU to later jobs, which causes them to finish later. SET is worst because it gives preference to the newest jobs, preventing the long-running jobs from completing; with SET, all jobs complete at around the same time. (A small simulation sketch at the end of these solutions illustrates this ranking.)

For this question, the order FIFO, SET, RR received 3 points of partial credit. Any other order received 0 points.

Note: Many people used the amount of overhead as their sole criterion for evaluation. While this may be valid (depending on the amount of overhead), it misses the point that with a centralized distribution of CPU times it is better to keep the longest-running jobs on the CPU. I gave partial credit for overhead-only answers; overhead is a major consideration only if it is quite large.

7. What is a stack algorithm? (Your explanation should mention and define the inclusion property.) For each of the following paging algorithms (LRU, MIN, OPT, FIFO, Clock, Working Set, Page Fault Frequency), say whether it is a stack algorithm or not. (HINT: in my opinion, the ones that are stack algorithms are obviously so; the rest are not.) For those that are stack algorithms, explain why (i.e. give an informal proof or a convincing argument). (For those that are not stack algorithms, you do not have to prove it or explain why not.)

(3 points) A stack algorithm is one that satisfies the inclusion property. The inclusion property states that, at every time t, the contents of a memory of size k pages are a subset of the contents of a memory of size k+1 pages. In other words, running the same algorithm with more memory will never increase the number of page faults.

(2 points for each correct identification)

    LRU                    stack
    MIN                    stack
    OPT                    stack
    FIFO                   not stack
    Clock                  not stack
    Working Set            stack
    Page Fault Frequency   not stack

(0.5 points for each justification) In LRU, the k most recently accessed (unique) pages are always a subset of the k+1 most recently accessed (unique) pages. MIN (= OPT) works like LRU, only looking forward in time instead of backward. In particular, we can prove this by induction: assume that the inclusion property holds at time t; then any page evicted from a memory of size k+1 will also be evicted from a memory of size k (if it is resident there), so the inclusion property still holds at time t+1. Working set keeps the set of pages that were accessed in the last tau time units; this is obviously a subset of the set of pages accessed in the last tau+1 time units. (Question 2's string witnesses that FIFO is not a stack algorithm: 11 faults with 3 frames but 12 with 4.)

Note: This problem asked you to say, for each algorithm, whether it was a stack algorithm or not. A good number of people lost points because they did not explicitly state this for some algorithms.
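For question 7, here is a sketch (ours, not part of the original solutions) that turns the inclusion property into a runnable check: simulate LRU memories of sizes k and k+1 side by side over a reference string and verify after every reference that the smaller memory's contents remain a subset of the larger's. Substituting FIFO eviction and trying strings like the one in question 2 would produce violations.

    #include <stdio.h>

    #define MAXPAGE 16

    typedef struct {
        int frames;
        int last_use[MAXPAGE];    /* -1 = never referenced        */
        int resident[MAXPAGE];    /* 1 if page is in this memory  */
        int n;                    /* pages currently resident     */
    } Mem;

    /* Reference page p at time t under LRU. */
    static void lru_ref(Mem *m, int p, int t) {
        if (!m->resident[p]) {
            if (m->n == m->frames) {          /* evict LRU page */
                int v = -1;
                for (int q = 0; q < MAXPAGE; q++)
                    if (m->resident[q] &&
                        (v < 0 || m->last_use[q] < m->last_use[v]))
                        v = q;
                m->resident[v] = 0;
                m->n--;
            }
            m->resident[p] = 1;
            m->n++;
        }
        m->last_use[p] = t;
    }

    int main(void) {
        int refs[] = {2,3,4,5,2,3,1,2,3,4,5,1,2,3};
        Mem small = { .frames = 3 }, big = { .frames = 4 };
        for (int p = 0; p < MAXPAGE; p++)
            small.last_use[p] = big.last_use[p] = -1;

        for (int t = 0; t < 14; t++) {
            lru_ref(&small, refs[t], t);
            lru_ref(&big, refs[t], t);
            for (int p = 0; p < MAXPAGE; p++)   /* inclusion check */
                if (small.resident[p] && !big.resident[p])
                    printf("violated at t=%d (page %d)\n", t, p);
        }
        printf("done (no lines above means inclusion held)\n");
        return 0;
    }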
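Returning to question 6's ranking: a small simulation sketch (ours, with made-up parameters) of a zero-variance workload, in which every job needs the same amount of CPU and jobs arrive at regular intervals. RR is approximated by a one-tick quantum served in cyclic order, and SET always runs the job with the least elapsed time. With these parameters the mean flow times should come out in the order FIFO < RR < SET, as argued above.

    #include <stdio.h>

    #define NJOBS 8
    #define LEN   20       /* identical demand: zero variance */
    #define GAP   5        /* job j arrives at time j*GAP     */

    static void simulate(const char *name, int policy) { /* 0=FIFO 1=RR 2=SET */
        int left[NJOBS], rr = 0;
        long flow = 0;
        for (int j = 0; j < NJOBS; j++) left[j] = LEN;

        for (int t = 0, finished = 0; finished < NJOBS; t++) {
            int pick = -1;
            if (policy == 1) {          /* RR: next runnable job, cyclically */
                for (int k = 0; k < NJOBS && pick < 0; k++) {
                    int j = (rr + k) % NJOBS;
                    if (j * GAP <= t && left[j] > 0) pick = j;
                }
                rr = (pick + 1) % NJOBS;
            } else {
                for (int j = 0; j < NJOBS; j++) {
                    if (j * GAP > t || left[j] == 0) continue;
                    if (pick < 0) pick = j;            /* FIFO: earliest arrival  */
                    else if (policy == 2 && left[j] > left[pick])
                        pick = j;                      /* SET: least elapsed time */
                }
            }
            if (pick < 0) continue;     /* nothing has arrived yet */
            if (--left[pick] == 0) {
                finished++;
                flow += (t + 1) - (long)pick * GAP;    /* completion - arrival */
            }
        }
        printf("%-4s mean flow time: %.1f\n", name, (double)flow / NJOBS);
    }

    int main(void) {
        simulate("FIFO", 0);
        simulate("RR",   1);
        simulate("SET",  2);
        return 0;
    }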