Overview

Instead of SimpleDB, we are actually using PostgreSQL. This is a real world database system that is super popular in the industry. Having experience hacking a real thing should be much more fun. However, we require you use Linux based OS and C/C++ programming language. We have tested it on Cloud9 IDE. The makefile presented should also work on Ubuntu. You are going to modify two files to let your PostgreSQL use LRU cache replacement algorithm instead of its pre-shipped Clock.

Assignment 1: Buffer Manager

Due Date after spring break. Mar 20th, 11:59pm

In this project you will add functionality to the backend server of PostgreSQL. You need to dive into the actual server code. The actual coding for this project is minimal, but you will need to figure out what code to add and where, which is non-trivial!

The major parts of this project are:

  • Part 1: Understanding different memory management strategies
  • Part 2: Examining, understanding and modifying existing code
  • Part 3: Measuring and analysing the effects of your code in a real system

Deliverables and Grading

  • You will be writing your code in policy folder, as described in the Coding section below. Templates are given. I may update the templates later, but it should be very easy for you to update your code.
  • I will provide testing scripts to generate buffer diagram, statstics and so on.
  • I will later provide scripts for you to package everything you need to submit. You then just submit that package using your login name to gsubmit.
  • I’m considering providing you with a correct buffer diagram output for LRU, so that you can grade yourself before submission.
  • For Part 3, you will be filling some forms. See next section. Scripts for getting these information is comming.

Comparision

Experiment 1: Buffer replacement policy without indexing

We are only interested in the buffer hit rate for shared blocks. Run for different number of buffers and fill in the entries of the following table. Details about how to run them is in following sections. (Comming soon).

Num Buffers 16 32 64 80 90
LRU          
MRU          
CLOCK          

Experiment 2: Buffer replacement policy with indexing:

Repeat the same steps as above but with indexing enabled as described here.

Num Buffers 16 32 64 80 90
LRU          
MRU          
CLOCK          

Setup

  1. Get an account to Cloud9 IDE, create a new workspace with C/C++ option, and click on Start Editing. You may need to use Chrome, and refresh after you created the workspace.

  2. Run the following command in the Cloud9 IDE terminal.

    bash git clone https://github.com/steinwaywhw/cs460-spring2015-pa1.git cd cs460-spring2015-pa1 make setup make testclock

  3. Then refresh the TreeView (or refresh the whole page), you will see the output folder.

Now you should have a compiled version of PostgreSQL, and you can actually see its Clock stats upon a real dataset in folder output.

Keep Up-to-Date with Latest Scripts

Since I may update the code (as I’ve already done), you may need to issue the following command for a complete re-setup

cd cs460-spring2015-pa1/postgresql
sudo make uninstall
cd ..
rm -rf postgresql
git pull
make setup
make testclock

Or you may need just a quick re-setup

cd cs460-spring2015-pa1
git pull

If there’s still something that doesn’t work, try delete everything and start over. Make sure you backup your code before doing the followings.

rm -rf cs460-spring2015-pa1

and then go back to the first step to redo the setup.

Coding

You should write your own code in policy folder, and you should use *.clock.* files as templates to start. Also, follow that filename naming conventions to name your file as *.lru.*. Later, I will update testing scripts to copy your implementation into the postgresql source tree, and run the test.

You need to read these files

  • postgresql/src/backend/storage/buffer/freelist.c
  • postgresql/src/backend/storage/buffer/buf_init.c
  • postgresql/src/backend/storage/buffer/bufmgr.c
  • postgresql/src/include/storage/bufmgr.h
  • postgresql/src/include/storage/buf_internals.h
  • postgresql/src/include/storage/buf.h

and this may be very helpful: http://www.westnet.com/~gsmith/content/postgresql/InsideBufferCache.pdf, and this https://docs.google.com/presentation/d/1f3M3iQ9XH2ouqU5o3jIZ4KBe0x4QxNozGMMyg5hyNXM/edit?usp=sharing

Some Notes on PostgreSQL Source Code

In buf_internals.h

  • The most important data structure is BufferDesc. Please read its definition, and comments.
  • Also, they have flags like BM_DIRTY which indicates it is a dirty buffer page. Please find how they use these predefined macros to test if a buffer is dirty.
  • Note that they define BM_MAX_USAGE_COUNT to be 5. Think about this. In traditional clock algorithm, we are giving each buffer a second chance. So how many chances are we giving out if the usage count can be as much as 5?
  • Please find out what is usage_count and what is refcount. Also, take a look at buf_id and freeNext.
  • Note their definition for two special value FREENEXT_END_OF_LIST and FREENEXT_NOT_INT_LIST.

In buf.h

  • Please read the description about buffer identifiers.
  • Read the comments about InvalidBuffer and find out the correct index range of shared buffer.

In bufmgr.h

  • They actually use different buffer replacement policy for different access mode. We only care about the BAS_NORMAL case. (Other cases may be something like issuing LOAD and INGEST in SQL.) Please take a look at BufferAccessStrategyType.
  • Note that they declared NBuffers in this file.
  • Figure out how they use BufferIsPinned() macro to test if a buffer page is pinned. Again, we only care about shared buffer. They have something else called local buffer, but we don’t care.
  • BufferGetBlock is to go from buffer descriptor to the real buffer data in memory. Please figure out the relationship here. Also, combine this with BufferGetPage.

In buf_init.c

  • Please read the comments in this file, especially the sentence under “Data Structures”.
  • Please note the definition of BufferDescriptors and BufferBlocks. Please figure out what are these variables. They are global variables mostly maintained in this file, but is accessable from some other files.
  • InitBufferPool is to inintialize the buffer pool at the very beginning. It shows how the engine allocate NBuffers buffer pages (each of size BLCKSZ, 1KB in our case) and corresponding buffer descriptors BufferDesc. These pages are linked initially. Please thoroughly understant this function. Very important. However, you don’t need to understand anything related to locking.

In bufmgr.c

  • ReadBuffer_common accepts requests from other part of database engine, looks up the requested page using BufferAlloc. (line 308-343)
    • If it is already in the buffer, it is done. Just return the data and maintain some status.
    • If it is not in the buffer, then load the page from disk into buffer.
      • If there is free buffer, put the data into free buffer.
      • Otherwise, run the replacement algorithm to choose a victim, and put data into this victim.
  • BufferAlloc looks up buffer and try to allocate new one if not found. Read its comments carefully. Some important comments at line 518, 524, 560, 571, 591, 703, 784. Note that they use BufTableLookup to speed up buffer look up process. You need to know this but you don’t have to modify it.
  • BufferAlloc calls another very important function StrategyGetBuffer, figure out what it is.
  • PinBuffer pins the buffer. Read its comments and see how it updates usage_count and refcount. Also note that we are working on the normal strategy case, where strategy == NULL.
  • UnpinBuffer unpins the buffer. Read its comments.
  • ShowBufferUsage is the function who prints out the hit rate thing in your stats.log.
  • ShowBufferDesc and ShowAllBufferDesc are two debugging functions I wrote. This prints out information you will see in your output folders. Read its code and understand what buffer.log file means. You will be using these functions in your implementation of LRU so that we can automatically test your implementation.

In freelist.c

  • BufferStrategyControl is the data structure where they maintain metadata for clock algorithm. Read its comments.
  • BufferAccessStrategyData is for something else. We don’t care about it. Same for anything related with buffer ring.
  • StrategyGetBuffer is very important. It tries to get a free buffer, or run the replacement algorithm if there isn’t any free ones. Again, we consider the normal case, where strategy == NULL. Read this function thoroughly.
  • StrategyFreeBuffer is also important. It puts the buffer into free list. Consider how to do that in LRU.
  • StrategyInitialize is to initialize their clock algorithm metadata. Read its comments. You may need to modify this to initialize your LRU metadata.