Terminologies

File
A collection of pages, each containing a number of records. This is not the same as an operating system file.
Metadata
Data about data. There are metadata for relations, indices, views, stats, admins, buffer pool, etc. Catalog itself is stored as relations.
Record ID
Usually a combination of page id and some sort of offset in page. e.g. page-id, slot number
Index
An index contains a collection of data entries, and supports efficient retrieval of all records with a given search key value $k$. DBMS can have multiple (different) indices per file.
Search Key
Any subset of the fields of a relation can be the search key for an index on the relation. Search key is not the same as (primary) key. Search keys don’t have to be unique.

Things to Consider

  • how to map logical records to their physical representation (record format)
  • how to organize records on a page (page format)
  • how to organize collections of pages (file format, indexing)
  • what metadata is needed and where should it be stored?
    • example: the types and lengths of the fields
    • may need both per-record and per-collection metadata

Record Formats

Side Note: Consider how to represent null values.

Fixed Length

  • All records of the this relation are of the same size. Metadata of record format is stored in system catalogs.
  • Easy to find $n$-th element.

Variable Length

Fields delimited by special symbols

  • Need to scan from start to find $n$-th field
  • No need to maintain per-record metadata
  • If a field size changed, need to move data that comes after that field.

Prepend the whole record with an array of field offsets

  • Offer direct access to $n$-th field through header lookup.
  • Record header is a overhead.
  • Need to put end-of-record position in the header.

Prepend each field with field length

  • Not-so-direct access to $n$-th field by skipping over some fields
  • Need to shift data when some field chages size

Page Formats

Data usually starts from the beginning of a page. Metadata (e.g. number of records on page, record offsets, etc) usually starts from the end of a page. Freespace is in between. This allow data and metadata to grow freely without having to move each other until no more free space.

Please compare this model to process memory model where heap grows downwards, stack grows upwards, leaving freespace in between.

For Fixed Length Records

Packed Page

  • Records are packed/adjacent.
  • Only number of records within page is needed.
  • To pack the page, records may be moved during reorg. (Thus causing Record ID to change.)

Unpacked Page

  • Records are scattered.
  • Has to use a bitmap to mark occupied slots within page, plus a counter for number of records.

For Variable Length Records

  • Metadata part maintains number of records, <offset, length> for all records in the page, and start address of free space.

File Formats

Heap Files

Unordered. Fine for file scan retrieving all records. Easy to maintain.

  • Can be implemented as list, tree or directory.
  • Quick update, slow search.

List Implementation

  • Typically contains a header page, a full page list, and a list of pages with free space. They are doublely linked.

Page Directory Implementation

  • Typically contains a list of header pages, with each header page contains a directroy of pages.

Sorted Files

Best for retrieval in search key order, or if only a range of records is needed. Expensive to maintain.

  • Slow update, quick search on sort key.
  • Can only sort by a single search key.

Clustered Files

See index below.

Index

  • Selection method supported (equality vs range)
  • Data entry representation
    • actual data: index-organized file, index and data are in same file
    • [<key, rid>, ...], index and data are in separate files
    • <key, [rids ...]>, index and data are in separate files
  • Clustered vs unclustered
    • clustered/primary index: order of index entries in index files is almost the same as order of records in data files
    • unclustered/secondary index: order of index entries in index files is not related to order of records in data files
  • Single key vs composite key
  • Tree-based vs hash-based

B Tree

A B tree of order $n=2m+1$ is a tree in which each node has:

  • at most $2m$ entries (and, for internal nodes, $2m + 1$ children)
  • at least $m$ entries (and, for internal nodes, $m + 1$ children)
  • exception: the root node may have as few as 1 entry

B tree has perfect balance: all paths from the root node to a leaf node have the same length. In DBMS indices, every node in B Tree is correspond to a page in the file.

This interactive B-Tree demo is very useful. Please check this out. You can slow down the animation speed using the sliding bar at the bottom.

search(key, node) {
	if (node == null) 
		return null;

	i = 0;
	while (i < node.numkeys && node.k[i] < key)
		i++;

	if (i == node.numkeys || node.k[i] != key)
		return search(key, child[i]);
	else // node.k[i] == key
		return data[i];
}

Insertion

Algorithm for inserting an entry with a key $k$:

  • search for $k$, which will bring you to a leaf node (ignoring duplicates for now)
    • if the leaf node has fewer than $2m$ entries, add the new entry to the leaf node
    • else split the node, dividing up the $2m + 1$ entries:
      • the first/smallest $m$ entries remain
      • the last/largest $m$ entries go in a new node
      • send the middle entry up and insert it (and a pointer to the new node) in the parent
      • split parent as needed

Deletion

Algorithm for deleting an entry with a key $k$:

  • search for $k$, which will reach some node (ignoring duplicates, and suppose $k$ exists)
    • if $k$ is in a leaf node, then delete it.
    • if $k$ is in an internal node, then replace it with its in-order successor or predecessor $j$ (which is either the smallest from right subtree, or largest from left subtree, and it is guaranteed to be in a leaf node), and delete $j$ from that leaf node.
    • for both cases, if the affected leaf node now has less than $m$ entries,
      • if right sibling has more than $m$ entries, rotate left
      • or if left sibling has more than $m$ entries, rotate right
      • otherwise, merge with one sibling, and pull the separator entry from the parent into merged node.
      • rebalance parents as needed

B+ Tree

Again, a wonderful interactive demo here.

It is a varient of B Tree, where internal nodes only have keys (instead of key-data pairs), and an additional level (leaf level) is added for key-data pairs. Also, leaves are linked for faster linear scan. This allow internal nodes to store more keys, thus requiring less disk I/O in general.

Insertion

  • search for $k$, which will bring you to a leaf node (ignoring duplicates for now)
    • if the leaf node has fewer than $2m$ entries, add the new entry to the leaf node
    • else split the node, dividing up the $2m + 1$ entries:
      • the first/smallest $m$ entries remain
      • the last/largest $m+1$ entries go in a new node, that is to keep the middle entry in new node (can also keep in old node, but need to change search/deletion accordingly.)
      • send a copy of the middle entry up and insert it (and a pointer to the new node) in the parent
      • split parent as in B Tree (that is, without keeping some middle entry)

Deletion

  • search for $k$, which will reach a leaf node (ignoring duplicates, and suppose $k$ exists)
    • delete it, update parents (up to root) if $k$ is the smallest in the node
    • if the leaf node now has less than $m$ entries,
      • if right sibling has more than $m$ entries, borrow from right, and update separator in parents
      • or if left sibling has more than $m$ entries, borrow from left, and update separator in parents
      • otherwise, merge with one sibling, and delete the corresponding separator entry from the parent
      • rebalance parents as in B Tree (that is, not bother with “middle entry”)

Exercises

Problem 1

Insert the following sequence into a B+ Tree of order 3

1, 4, 7, 10, 17, 21, 31, 25, 19, 20, 28, 42

Draw the resulting B+ Tree

Problem 2

Upon the tree you get in Problem 1, delete the following sequence

20, 31, 21, 42

Draw the resulting B+ Tree

Solutions

Please check the solution using the interactive demo.