Terminologies

Disk Page
The unit of transfer between the disk and memory. Typically set as a config parameter for the DBMS. Typical value are 4K, 8K, 16K and 32K.
Frame
A unit of memory in the buffer pool. Typically the same size as the disk page.
Buffer Pool
A collection of frames used by the DBMS to temporarily keep data for use by the query processor (CPU). This link has a great summary of what a buffer pool should do. Since I don't have the permission to copy it, please head to that link and read it yourselves.
Buffer Control Block
Metadata about buffers and their contents. Used to keep track of the mapping between pages in buffer and pages on disk, page request count, modified or not, and more. An example, frame#, pageid, pin_count, dirty
Pin Count
The number a page is being requested.
Dirty Page
A page that has been modified in the buffer, but not yet flushed to disk.
Prefetch
Transfer data from disk to buffer in readiness for later use.
Prefetcher
A DBMS component (a thread/agent) who does prefetch.
Cleaner
A DBMS component (a thread/agent) who cleans buffer pool and flushes dirty pages onto disk. Consider it as the dual of prefetcher.
Temporal Locality
If at one point in time a particular memory location is referenced, then it is likely that the same location will be referenced again in the near future. See [wiki].

Disks

From file system perspective, disk is addressed as a one dimension array of logical sectors. Disk controller maps logical sector to physical sector identified by surface #, track #, and sector #.

Default mapping is sequential. Logical sector 0 is the first sector of the first (outermost) track of the first surface. Logical sector address incremented within track, then tracks within cylinder, then across cylinders, from outermost to innermost.

  • Sequential access latency
    • Seek to the right track
    • Rotate to the right sector
    • Transfer
  • Random access latency
    • Seek to the right track
    • Rotate to the right sector
    • Transfer
    • Repeat

A very good chapter on disks, here at http://pages.cs.wisc.edu/~remzi/OSTEP/file-disks.pdf

Pages

Typically, the smallest unit of I/O a DBMS can handle is a page. Some DBMS allow rows to span pages, some don’t. Pages have internal structures to organise its content, like data pages, index pages, catalog pages, etc. Also, different DBMS have different implementations of page structures.

Replacement Policy

Optimal Replacement

The most efficient caching algorithm would be to always discard the information that will not be needed for the longest time in the future. This optimal result is referred to as Bélády’s optimal algorithm or the clairvoyant algorithm. Since it is generally impossible to predict how far in the future information will be needed, this is generally not implementable in practice.

LRU

The least recently used page (LRU) replacement algorithm works on the idea that pages that have been most heavily used in the past few instructions are most likely to be used heavily in the next few instructions too. While LRU can provide near-optimal performance in theory, it is rather expensive to implement in practice.

  1. For each page in buffer pool, keep track of time last unpinned.
  2. Replace the frame that has the oldest (earliest) time.

LRU works well because it tends to remove cold pages from the buffer to make space for the faulted page. If the faulting page is cold, however, then LRU may displace a warmer page to make space for the cold one. Furthermore, the cold page will reside in the buffer for a considerable amount of time. [ref]

FIFO

Fhe simplest page-replacement algorithm is a FIFO algorithm. The first-in, first-out (FIFO) page replacement algorithm is a low-overhead algorithm that requires little book-keeping on buffer pool. The idea is obvious from the name – the buffer keeps track of all the pages in memory in a queue, with the most recent arrival at the back, and the oldest arrival in front. When a page needs to be replaced, the page at the front of the queue (the oldest page) is selected (note that we actually mean the first unpinned page from the front). While FIFO is cheap and intuitive, it performs poorly in practical application.

Clock

Arrange frames into a circular list, with each page in the frame having an extra reference bit. The reference bit will be set to 1 when the pin count decreased to zero. When looking for replacement, it starts from the current clock hand position, looking for pages with zero pin count.

  • If it is set, it means someone just unpinned it to zero, so it is recently used. The algorithm will clear reference bit, and go to the next unpinned page in the queue. This gives the page a “second chance” to survive.
  • If it is not set, then it means we previously cleared it, so it has not been visited for a whole round. Select this page as a victim, replace it with newly loaded page.

MRU

Discards, in contrast to LRU, the Most Recently Used (MRU) items first. When a file is being repeatedly scanned in a Looping Sequential reference pattern, MRU is the best replacement algorithm.

Simplified 2Q

On the first reference to a page, 2Q place it in a special buffer, the A1 queue, which is managed as a FIFO queue. If the page is referenced during its A1 residency, then it is probably a hot page. So, if a page in A1 is referenced, the page is moved to the Am queue, which is managed as an LRU queue. If a page is not referenced while on A1, it is probably a cold page, 2Q removes it from the buffer. Full 2Q algorithm can be found in its original paper here.

Exercise

Replacement

Suppose we have a buffer pool of 4 frames, initially empty. We have the following access pattern, and assume every access takes almost no time (or, unpin immediatly after pin). Please list buffer status for every moment using these six policy mentioned above. The optimal replacement scenario is given below.

Time T1 T2 T3 T4 T5 T6 T7 T8 T9 T10
Block Accessed A B C D E F B D C A

Example (Optimal Replacement)

Time F1 F2 F3 F4 Comments
T1 A        
T2 A B      
T3 A B C    
T4 A B C D  
T5 E B C D A will not be used until T10
T6 F B C D E will never be used again
T7 F B C D Hit
T8 F B C D Hit
T9 F B C D Hit
T10 A B C D Neither will be ever used again. Just remove the first.

Hit ratio = 3/10 = 0.3

Solution

LRU

Time F1 F2 F3 F4 Comments
T1 A        
T2 A B      
T3 A B C    
T4 A B C D  
T5 E B C D A comes at T1, is the current oldest.
T6 E F C D B comes at T2, is the current oldest.
T7 E F B D C comes at T3, is the current oldest.
T8 E F B D Hit
T9 C F B D E comes at T5, is the current oldest. Note that D is just refreshed at T8.
T10 C A B D F comes at T6, is the current oldest. Note that B is also refreshed at T7.

Hit ratio = 1/10 = 0.1

FIFO

Time F1 F2 F3 F4 Comments
T1 A        
T2 A B      
T3 A B C    
T4 A B C D  
T5 E B C D  
T6 E F C D  
T7 E F B D  
T8 E F B D Hit
T9 E F B C D comes the earliest.
T10 A F B C  

Hit ratio = 1/10 = 0.1

Clock

Time F1 F2 F3 F4 Comments
T1 A,1 hand     Clock hand moves to F2, A.ref = 1
T2 A,1 B,1 hand    
T3 A,1 B,1 C,1 hand  
T4 A,1 B,1 C,1 D,1  
T5 E,1 B,0 C,0 D,0 Set A.ref = 0 then B.ref = 0, C.ref = 0, D.ref = 0, then goes back to F1 and replace A. Move to F2.
T6 E,1 F,1 C,0 D,0 Since B.ref = 0, replace B, move to F3.
T7 E,1 F,1 B,1 D,0 Since C.ref = 0, replace C, move to F4.
T8 E,1 F,1 B,1 D,1 Hit. Algorithm not invoked at all. But generates a page fault. The handler does D.ref = 1. [ref]
T9 E,0 F,0 B,0 C,1  
T10 A,1 F,0 B,0 C,1  

Hit ratio = 1/10 = 0.1

MRU

Time F1 F2 F3 F4 Comments
T1 A        
T2 A B      
T3 A B C    
T4 A B C D  
T5 A B C E D is youngest.
T6 A B C F E is youngest.
T7 A B C F Hit
T8 A D C F B is youngest.
T9 A D C F Hit
T10 A D C F Hit

Hit ratio = 3/10 = 0.3

S2Q

Time F1 F2 F3 F4 Comments
T1 A, A1        
T2 A, A1 B, A1      
T3 A, A1 B, A1 C, A1    
T4 A, A1 B, A1 C, A1 D, A1  
T5 E, A1 B, A1 C, A1 D, A1 A1 is FIFO, A comes in first.
T6 E, A1 F, A1 C, A1 D, A1 B comes first.
T7 E, A1 F, A1 B, A1 D, A1 C comes first.
T8 E, A1 F, A1 B, A1 D, Am Hit
T9 C, A1 F, A1 B, A1 D, Am E comes first.
T10 C, A1 A, A1 B, A1 D, Am F comes first.

Hit ratio = 1/10 = 0.1