B-Tree in database

Mohamed Arif
3 min readDec 14, 2023

--

B-Trees can be thought of as a vast catalog room in the library: you first have to pick the correct cabinet, then the correct shelf in that cabinet, then the correct drawer on the shelf, and then browse through the cards in the drawer to find the one you’re searching for. Similarly, a B-Tree builds a hierarchy that helps to navigate and locate the searched items quickly

Since B-tree is sorted, It helps us in search operations , we could use binary search for efficient searching. It improves both point and range queries.

B-Tree has three main components:

  1. Root node
  2. Internal node
  3. Leaf node

Root node doesn’t have any parent it is at the top of the tree.

Internal node stores the pointer to connection nodes which connects with leaves. There is usually more than one level.

These are the bottom layer nodes that have no child nodes.

Considering its efficiency:

B-Trees combine these ideas: increase node fanout, and reduce tree height, the number of node pointers, and the frequency of balancing operations.

A B-tree node will have n keys and n+1 pointers to subsequent subtrees until the leaf nodes. As you can see in the above image left node will have values lesser than the key and the extreme right node has values greater than the key.

From the perspective of number of comparisons, the logarithm base is 2, since searching a key inside each node is done using binary search. Every comparison halves the search space, so complexity is log2 M.

B-Tree node split and merge:

When a B-tree node becomes full and requires splitting,

  1. Allocate new node: Create a new node to hold half the overflowing elements.
  2. Transfer elements: Move half the elements from the splitting node to the new node.
  3. Promote key: Extract the median key from the splitting node and insert it into the parent node along with a pointer to the new node.
  4. Update parent: Adjust the parent node’s key and child pointers to reflect the split.
  5. Propagate split: If the parent node is also full, recursively split it up to the root.

Merge
B-trees handle underflow situations, where neighbouring nodes have too few values, through merging. Here’s a concise breakdown:

Merging scenario:

Two leaf nodes are merged if:

  • A single node can hold N key-value pairs (maximum capacity).
  • The combined number of key-value pairs in both nodes is <= N.
  • Two non-leaf nodes are merged if:
  • A single node can hold N+1 pointers (maximum capacity).
  • The combined number of pointers in both nodes is <= N+1.

Merging process:

  1. Identify merging candidates: Find two adjacent nodes with insufficient key-value pairs/pointers (underflow threshold).
  2. Combine contents:
  • For leaf nodes: Concatenate the key-value pairs of both nodes.
  • For non-leaf nodes: Concatenate the pointers of both nodes and adjust the parent node’s keys accordingly.

3. Remove empty node: Delete the empty node after transferring its contents.

4. Rebalance parent : If the parent node becomes underweight, adjust its key and pointer distribution or merge it further up the tree.

If the parent node is full and does not have space available for the promoted key and pointer to the newly created node, it has to be split as well. This operation might propagate recursively all the way to the root.

--

--

No responses yet