UNIVERSITY OF CALIFORNIA College of Engineering Department of Electrical Engineering and Computer Sciences Computer Science Division CS 162 Alan Jay Smith Fall, 1998 Midterm 1, October 5, 1998 Part I You have until the time announced for this exam. The exam is closed book. All answers should be written on the exam paper. Anything that we can't read or understand won't get credit. Any question for which you give no answer at all will receive 25% par- tial credit. Please answer in standard English; illiterate or il- legible answers to essay questions will lose credit. Please watch the front board for corrections and other information. This exam has 7 questions on 6 pages and is in two parts. Name (last, first, middle):______________________________________ Student ID #_________________________ Class Account:____________________________________ 1. What is the difference between an open and a closed (queueing) system? In studying scheduling algorithms, why does it matter which one we use? Does the scheduling algorithm affect the throughput in an open system? Explain. (12) ---------------------------------- An open system is one in which the arrival rate is not related to the number of customers in the system. world -----> system -----> done A closed system is one in which the total number of customers in the "system"+"the world" is constant. Thus, typically, the arrival rate drops as the number of customers queued or in service increas- es. <-----world--------< | | | | >-----system------>^ The reason to use the open system is that it is easier to analyze and to say definite things about. The reason to use a closed sys- tem is that it is more realistic - almost any real system is closed. The behavior of the system under different queueing disci- plines may not be the same (at least in certain respects), so in some cases it is important to use a realistic model. In an open system, the throughput is invariant with the scheduling algorithm (as long as rho (=arrival rate/service rate) is <1. In such a case, all arriving jobs get processed and leave, so the throughput is exactly equal to the arrival rate. Since in (almost) any real system the throughput does vary with the scheduling algo- rithm, we can see that it does matter whether we use an open or closed system. ===================================================================== 2. Assume that a cafeteria has a person making sandwiches. In most cafeterias, FIFO scheduling is used. I.e. the person at the head of the line gives his/her order, and the sandwich maker makes the sandwich, gives it to that person, and then takes the next order. Please compare and explain the relative desirability of using SET, FIFO and RR scheduling for the sandwich making process. (12) --------------------------------- The point of this question was to realize that sandwich scheduling is NOT the same as CPU scheduling. A lot of people just recited what they learned about CPU scheduling (in some cases by making the unreasonable assumption that sandwich making times are highly skewed). The relative desirability of scheduling algorithms de- pends on the job processing time distribution (highly skewed for the CPU, very small skew for sandwich maker in a cafeteria) and the overhead of task switching (low for CPU, high for sandwiches). In a cafeteria (not a take-out sandwich shop in the business dis- trict), almost all customers order one sandwich, and most sandwich- es take a similar amount of time to make. I.e. the sandwhich- making-time distribution is centralized, not skewed. Thus FIFO should give the lowest flow time. RR will have a much higher flow time (remember the example I gave in class?), and also will incur high overhead, as the sandwich maker switches between partially finished sanwiches. SET involves always working on the least fin- ished sanwich. For a centralized distribution (i.e. the expected time to completion is decreasing), this is the worst algorithm - i.e. gives the highest flow time, and it also has high task switch overhead. Name (last, first, middle):______________________________________ 3. In class, four methods were given for minimizing the amount of space allocated to the page table. Please list and explain each. Indicate any advantages and disadvantages. (20) Method 1 - reduce VA space We can reduce the size of the virtual address space available to a process. For example, in a 32-bit system, the OS might choose to allocate page tables only for the first 128MB of the VA. If the page size is 4KB, this means we will only have 128MB/4KB = 32K en- tries instead of 4GB/4KB = 1M entries in the page table. Under this scheme, it would be illegal for a process to reference a virtual address above 128MB. Advantages: No hardware support is needed and easy to implement. If the page fault handler sees a reference above 128MB, just say "il- legal reference" and kill the process. Disadvantages: Some processes may need more than 128MB of VA space. A likely problem is that the process doesn't need more than 128MB of real memory, but it uses non-contiguous blocks in the 32-bit ad- dress space. For example, it might want to allocate its stack starting from the top of the VA space. Method 2 - multilevel page tables. Let's take a step back and think about why a single level page table can get so large. Say we have a 32-bit address space, 4KB page size. page number page offset + --- 20 bits --- + --- 12 bits --- + Suppose a process needs to use only 16MB of memory, why do we say 4GB/4KB = 1M page table entries are required? In this single level scheme, we are using the first 20 bits of VA as an offset in REAL memory to locate a PTE. Even though most of the 1M pages will not be used by the process, we need to allocate space to hold 1M PTE's just so that those PTE's the process does use can be found. In contrast, a two-level paging scheme might look like this. page directory page number page offset + --- 10 bits --- + --- 10 bits --- + --- 12 bits --- + The idea here is to add a level of indirection in our search for PTE's. We recognize that many blocks of PTE's will never be used, so let's group these PTE's into 1024 groups with 1024 PTE's each. We use the first 10 bits of VA to index into a directory that points to information about each of the 1024 PTE groups. If a pro- cess never requested portions of its VA corresponding to a PTE group, the OS doesn't need to allocate memory for that group. Thus, we use the first 10 bits to lookup a page directory, which returns a pointer to a 1024 entry page table (PTE group), and we use the next 10 bits to lookup that page table. Assuming no inter- nal fragmentation (both within pages and PTE groups), only 1024 + 16MB / 4KB = 5K entries are required for this process. Advantages: Full address space available to processes. Can extend to arbitrary levels of indirection to handle even larger address spaces. Disadvantages: We can suffer from fragmentation in PTE groups. What happens if a process needs exactly one page in every PTE group? Even though only 1024 pages are used, 1024 + 1M PTE's would be al- located. Also, this method requires specific hardware support (to avoid trapping to OS at EVERY memory reference) and may incur a performance penalty due to indirection. Method 3 - Put User Page Tables in OS Virtual Memory We can use virtual memory for the operating system as well as for the user. We can allocate single level user page tables in the op- erating system's virtual memory. Of course, those page tables are huge, but if a given user process is only using a small number of pages, only those pages of the user's page table which contain ac- tive PTEs actually have to be in memory. The remainder are allo- cated in the OS virtual memory, but don't have to be physically resident in main memory. Of course, the problem now is that the operating system uses virtu- al memory, and we can't put the OS page tables in the OS's own vir- tual memory and page them - we get into a loop. So we put the OS page tables in real memory. Note that we thus have 2 level ad- dressing again - the OS page table and the user page table. Advantages: This is pretty much the same as scheme 2. Disadvantages: Pretty much the same as scheme 2, plus the problem of finding enough real memory to contiguously allocate the OS page table. Method 4 - Inverted Page Tables Page tables map every virtual address to a physical address. Page tables can become wastefully large because some virtual addresses never get used. So why don't we use a hash table? The Hash table only needs O(# of page frames in real memory) size - e.g. perhaps twice as many entries as there are page frames in real memory. The obvious problem here is that lookups now involve doing a hash table lookup in hardware. (It's easy in software; it isn't so easy in hardware.) We can share one table among all processes (and index it by VA and PID). Advantages: Page table size is proportional to the amount of real memory. Only one table is needed for all processes. Disadvantages: Slow lookup time. Sharing pages among processes is difficult, since we need to figure out how to map different VA+PID pairs to the same physical address. Complex implementation. Please refer to Chapter 8 of Silberschatz and Galvin for detailed explanations of methods 2 and 4. I accepted segmentation + paging in addition to multilevel paging. If you said increase page size, you received some credit as well. 4. The set in the TLB is usually selected using the low order vir- tual address bits. Why? (10) The question should have been phrased, "the set in the TLB is usu- ally selected using the low order PAGE NUMBER bits. Why?" Everyone made this interpretation anyway. The short answer is: locality of reference, in both space and time. If page x was referenced, we expect nearby pages to be referenced, and also expect page x to be referenced again in the near future. To maximize hit rate, we would like to keep the most recently used entries in the TLB. How can we do that without full associativity? Using low order bits to index the set means during a cache miss, we evict an entry with the same low order bits but different high or- der bits, in other words, an entry that is spatially distant. Spa- tial distance implies temporally distance. Thus, the best way to mimick LRU eviction is to index sets with low order bits. Another way to say the same thing is as follows. Locality of refer- ence means the low order bits in a page number vary much more fre- quently than the high order bits. If high order bits are used for set selection, more temporally adjacent references would map into the same set, resulting in more cache misses. Another view: most processes allocate a few contiguous areas of memory. If we used high order bits, those would all map into a small number of TLB entries. 5. Why is the text_and_set instruction preferred for synchroniza- tion to compare_and_swap? (10) Compare_and_swap involves reading two arbitrary values and writing two arbitrary values. Test_and_set reads one arbitrary value, writes one arbitrary value (returning a value involves writing to a register!), and writes a fixed value of 1. This is simpler to im- plement efficiently in hardware because it involves one less read, and we do not need additional datapath to carry data to the fixed value write. For a 32-bit machine, we need to use a 64-bit datapath to perform a swap in one cycle, but only 32-bit datapath to do test_and_set. I accepted most answers that addressed the additional complexity of compare_and_swap. Note however, that these instructions are imple- mented in hardware, and reasoning about how it would be implemented in software (how many instructions it takes, or the use of tempo- rary registers) is not exactly correct. 6. For each of FIFO, SRPT, and RR (Q=.25), and for the following set of arrival and service times, please show a time line for which process is executing, and compute the mean flow time. Show your computations. (We might given partial credit, if you made a simple and obvious error; we're not going to try to decode your calcula- tions if they aren't obvious.) (20) arrival service A 0 1.75 B .4 .9 C 1.4 1.1 --------------------------------- FIFO: (6 points) ------------------------------------------------------------ | A | B | C | ------------------------------------------------------------ 0 1.75 2.65 3.75 (1.75 - 0) + (2.65 - .4) + (3.75 - 1.4) mean flow = --------------------------------------- = 2.12 3 SRPT: (7 points) ------------------------------------------------------------ | A | B | A | C | A | ------------------------------------------------------------ 0 0.4 1.3 1.4 2.5 3.75 (3.75 - 0) + (1.3 - .4) + (2.5 - 1.4) mean flow = ------------------------------------- = 1.92 3 RR: (7 points) ------------------------------------------------------------- | A | A | B | A | B | A | B | C | A | B | ------------------------------------------------------------- 0 .25 .5 .75 1.0 1.25 1.5 1.75 2.0 2.25 2.4 ------------------------------------- | C | A | C | A | C | C | ------------------------------------- 2.4 2.65 2.90 3.15 3.4 3.65 3.75 (3.4 - 0) + (2.4 - .4) + (3.75 - 1.4) mean flow = ------------------------------------- = 2.58 3 Also accepted for RR was a variation where the new process goes on the front of the queue, not the back. The mean flow time in that case is 2.67. 7. For the following two cases, please either show a complete safe sequence or show that there isn't one. (16) Process has-X has-Y max needs-X max needs-Y A 30 40 45 330 B 20 90 80 120 C 50 30 90 70 D 70 100 130 250 a. available: X: 70 Y: 70 --------------------------------- (8 points) There are three possible solutions: B, C, D, A; B, D, C, A; and C, B, D, A. The available resources after each process is run is as follows. B, C, D, A: X | Y -----+----- 90 | 160 140 | 190 210 | 290 240 | 330 B, D, C, A: X | Y -----+----- 90 | 160 160 | 260 210 | 290 240 | 330 C, B, D, A: X | Y -----+----- 120 | 100 140 | 190 210 | 290 240 | 330 b. available: X: 70 Y: 65 --------------------------------- (8 points) There is no safe sequence because there is only 325 Y in the system and process A needs 330 Y to run.