Instead of SimpleDB, we are actually using PostgreSQL. This is a real world database system that is super popular in the industry. Having experience hacking a real thing should be much more fun. However, we require you use Linux based OS and C/C++ programming language. We have tested it on Cloud9 IDE. The makefile presented should also work on Ubuntu. You are going to modify two files to let your PostgreSQL use LRU cache replacement algorithm instead of its pre-shipped Clock.
Assignment 1: Buffer Manager
In this project you will add functionality to the backend server of PostgreSQL. You need to dive into the actual server code. The actual coding for this project is minimal, but you will need to figure out what code to add and where, which is non-trivial!
The major parts of this project are:
- Part 1: Understanding different memory management strategies
- Part 2: Examining, understanding and modifying existing code
- Part 3: Measuring and analysing the effects of your code in a real system
Deliverables and Grading
- You will be writing your code in
policyfolder, as described in the Coding section below. Templates are given. I may update the templates later, but it should be very easy for you to update your code.
- I will provide testing scripts to generate buffer diagram, statstics and so on.
- I will later provide scripts for you to package everything you need to submit. You then just submit that package using your login name to
- I’m considering providing you with a correct buffer diagram output for LRU, so that you can grade yourself before submission.
- For Part 3, you will be filling some forms. See next section. Scripts for getting these information is comming.
Experiment 1: Buffer replacement policy without indexing
We are only interested in the buffer hit rate for shared blocks. Run for different number of buffers and fill in the entries of the following table. Details about how to run them is in following sections. (Comming soon).
Experiment 2: Buffer replacement policy with indexing:
Repeat the same steps as above but with indexing enabled as described here.
Get an account to Cloud9 IDE, create a new workspace with C/C++ option, and click on
Start Editing. You may need to use Chrome, and refresh after you created the workspace.
Run the following command in the Cloud9 IDE terminal.
bash git clone https://github.com/steinwaywhw/cs460-spring2015-pa1.git cd cs460-spring2015-pa1 make setup make testclock
Then refresh the TreeView (or refresh the whole page), you will see the
Now you should have a compiled version of PostgreSQL, and you can actually see its Clock stats upon a real dataset in folder
Keep Up-to-Date with Latest Scripts
Since I may update the code (as I’ve already done), you may need to issue the following command for a complete re-setup
cd cs460-spring2015-pa1/postgresql sudo make uninstall cd .. rm -rf postgresql git pull make setup make testclock
Or you may need just a quick re-setup
cd cs460-spring2015-pa1 git pull
If there’s still something that doesn’t work, try delete everything and start over. Make sure you backup your code before doing the followings.
rm -rf cs460-spring2015-pa1
and then go back to the first step to redo the setup.
You should write your own code in
policy folder, and you should use
*.clock.* files as templates to start. Also, follow that filename naming conventions to name your file as
*.lru.*. Later, I will update testing scripts to copy your implementation into the
postgresql source tree, and run the test.
You need to read these files
and this may be very helpful: http://www.westnet.com/~gsmith/content/postgresql/InsideBufferCache.pdf, and this https://docs.google.com/presentation/d/1f3M3iQ9XH2ouqU5o3jIZ4KBe0x4QxNozGMMyg5hyNXM/edit?usp=sharing
Some Notes on PostgreSQL Source Code
- The most important data structure is
BufferDesc. Please read its definition, and comments.
- Also, they have flags like
BM_DIRTYwhich indicates it is a dirty buffer page. Please find how they use these predefined macros to test if a buffer is dirty.
- Note that they define
BM_MAX_USAGE_COUNTto be 5. Think about this. In traditional clock algorithm, we are giving each buffer a second chance. So how many chances are we giving out if the usage count can be as much as 5?
- Please find out what is
usage_countand what is
refcount. Also, take a look at
- Note their definition for two special value
- Please read the description about buffer identifiers.
- Read the comments about
InvalidBufferand find out the correct index range of shared buffer.
- They actually use different buffer replacement policy for different access mode. We only care about the
BAS_NORMALcase. (Other cases may be something like issuing
INGESTin SQL.) Please take a look at
- Note that they declared
NBuffersin this file.
- Figure out how they use
BufferIsPinned()macro to test if a buffer page is pinned. Again, we only care about shared buffer. They have something else called local buffer, but we don’t care.
BufferGetBlockis to go from buffer descriptor to the real buffer data in memory. Please figure out the relationship here. Also, combine this with
- Please read the comments in this file, especially the sentence under “Data Structures”.
- Please note the definition of
BufferBlocks. Please figure out what are these variables. They are global variables mostly maintained in this file, but is accessable from some other files.
InitBufferPoolis to inintialize the buffer pool at the very beginning. It shows how the engine allocate
NBuffersbuffer pages (each of size
BLCKSZ, 1KB in our case) and corresponding buffer descriptors
BufferDesc. These pages are linked initially. Please thoroughly understant this function. Very important. However, you don’t need to understand anything related to locking.
ReadBuffer_commonaccepts requests from other part of database engine, looks up the requested page using
BufferAlloc. (line 308-343)
- If it is already in the buffer, it is done. Just return the data and maintain some status.
- If it is not in the buffer, then load the page from disk into buffer.
- If there is free buffer, put the data into free buffer.
- Otherwise, run the replacement algorithm to choose a victim, and put data into this victim.
BufferAlloclooks up buffer and try to allocate new one if not found. Read its comments carefully. Some important comments at line 518, 524, 560, 571, 591, 703, 784. Note that they use
BufTableLookupto speed up buffer look up process. You need to know this but you don’t have to modify it.
BufferAlloccalls another very important function
StrategyGetBuffer, figure out what it is.
PinBufferpins the buffer. Read its comments and see how it updates
refcount. Also note that we are working on the normal strategy case, where
strategy == NULL.
UnpinBufferunpins the buffer. Read its comments.
ShowBufferUsageis the function who prints out the
hit ratething in your
ShowAllBufferDescare two debugging functions I wrote. This prints out information you will see in your
outputfolders. Read its code and understand what
buffer.logfile means. You will be using these functions in your implementation of LRU so that we can automatically test your implementation.
BufferStrategyControlis the data structure where they maintain metadata for clock algorithm. Read its comments.
BufferAccessStrategyDatais for something else. We don’t care about it. Same for anything related with buffer ring.
StrategyGetBufferis very important. It tries to get a free buffer, or run the replacement algorithm if there isn’t any free ones. Again, we consider the normal case, where
strategy == NULL. Read this function thoroughly.
StrategyFreeBufferis also important. It puts the buffer into free list. Consider how to do that in LRU.
StrategyInitializeis to initialize their clock algorithm metadata. Read its comments. You may need to modify this to initialize your LRU metadata.