Project — B+ Tree TLE Index Engine
Module: Database Internals — M02: B-Tree Index Structures
Track: Orbital Object Registry
Estimated effort: 6–8 hours
- SDA Incident Report — OOR-2026-0043
- Objective
- Acceptance Criteria
- Starter Structure
- Hints
- What Comes Next
SDA Incident Report — OOR-2026-0043
Classification: ENGINEERING DIRECTIVE
Subject: Build ordered index for NORAD catalog ID lookups and range scans

The page manager from Module 1 stores TLE records but requires full scans for key-based access. Build a B+ tree index over NORAD catalog IDs that provides O(log N) point lookups and efficient range scans via a leaf-level linked list. The index must integrate with the existing page manager and buffer pool.
Objective
Build a B+ tree index that:
- Uses the page manager and buffer pool from Module 1 — each B+ tree node is stored as a page
- Supports point lookups by NORAD catalog ID in O(log N) page reads
- Supports range scans over NORAD ID ranges using the leaf-level linked list
- Handles inserts with automatic node splitting and split propagation
- Handles deletes with tombstoning (lazy rebalancing is acceptable)
- Provides a bulk-load operation for initial catalog construction
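One possible shape for the index API, with a tiny in-memory reference model behind it. The trait and the `ModelIndex` name are illustrative assumptions, not prescribed by the spec; the model uses `BTreeMap` to pin down the semantics the acceptance criteria test (inclusive range bounds, `None` after delete), and it doubles as a test oracle to differential-test the real tree against.

```rust
use std::collections::BTreeMap;

/// Record ID: (page_id, slot), as produced by Module 1. Exact shape is up to you.
type Rid = (u32, u16);

/// The call surface the acceptance criteria exercise (names are suggestions).
trait OrderedIndex {
    fn insert(&mut self, norad_id: u32, rid: Rid);
    fn lookup(&self, norad_id: u32) -> Option<Rid>;
    /// Inclusive bounds, results in sorted key order.
    fn range_scan(&self, lo: u32, hi: u32) -> Vec<(u32, Rid)>;
    fn delete(&mut self, norad_id: u32) -> bool;
}

/// In-memory reference model: same observable behavior the B+ tree must match.
struct ModelIndex {
    map: BTreeMap<u32, Rid>,
}

impl OrderedIndex for ModelIndex {
    fn insert(&mut self, norad_id: u32, rid: Rid) {
        self.map.insert(norad_id, rid);
    }
    fn lookup(&self, norad_id: u32) -> Option<Rid> {
        self.map.get(&norad_id).copied()
    }
    fn range_scan(&self, lo: u32, hi: u32) -> Vec<(u32, Rid)> {
        self.map.range(lo..=hi).map(|(&k, &v)| (k, v)).collect()
    }
    fn delete(&mut self, norad_id: u32) -> bool {
        self.map.remove(&norad_id).is_some()
    }
}
```

Running the same operation sequence through the model and the real tree, then comparing outputs, catches split and tombstone bugs early.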
Acceptance Criteria
- Point lookup correctness. Insert 10,000 TLE records with random NORAD IDs. Look up each by ID and verify the returned Record ID matches.
- Range scan correctness. Insert NORAD IDs 1–10,000. Scan range [5000, 5100] and verify exactly 101 records returned in sorted order.
- Split handling. Insert records until at least 3 leaf splits occur. Verify the tree remains balanced (all leaves at same depth) and all records retrievable.
- Bulk-load efficiency. Bulk-load 100,000 sorted records. Verify leaf fill factor above 95%. Compare leaf count to one-by-one insertion.
- Delete correctness. Delete 1,000 records by NORAD ID. Verify lookups return None for deleted keys and remaining records are unaffected.
- Integration with buffer pool. Run full test suite with only 16 buffer pool frames. Verify correctness under eviction pressure.
- Deterministic output. Print tree height, leaf count, fill factor, and buffer pool hit/miss stats after each test phase.
Starter Structure
btree-index/
├── Cargo.toml
├── src/
│ ├── main.rs # Entry point: runs acceptance criteria
│ ├── btree.rs # BPlusTree: insert, lookup, range_scan, delete, bulk_load
│ ├── node.rs # InternalNode, LeafNode: serialization, split, merge
│ ├── page.rs # Reuse from Module 1
│ ├── buffer_pool.rs # Reuse from Module 1
│ ├── page_file.rs # Reuse from Module 1
│ └── tle.rs # Reuse from Module 1
Hints
Hint 1 — Node serialization format
Use a 1-byte discriminant at the start of each node page to distinguish internal from leaf. Internal: [type=0][key_count][child_0][key_0][child_1].... Leaf: [type=1][key_count][next_leaf][prev_leaf][key_0][rid_0]....
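A sketch of leaf-page (de)serialization under that layout. The field widths here are assumptions (u16 key count, u32 sibling page IDs, u64 RIDs, little-endian); pick whatever matches your Module 1 page format, but keep serialize and deserialize as exact inverses.

```rust
const LEAF_TAG: u8 = 1;

struct Leaf {
    next: u32,                 // page ID of next leaf (0 = none, an assumption)
    prev: u32,                 // page ID of previous leaf
    entries: Vec<(u32, u64)>,  // (norad_id, rid), kept sorted by key
}

/// Layout: [type=1][key_count: u16][next: u32][prev: u32][key_0: u32][rid_0: u64]...
fn serialize_leaf(leaf: &Leaf, page: &mut [u8]) {
    page[0] = LEAF_TAG;
    page[1..3].copy_from_slice(&(leaf.entries.len() as u16).to_le_bytes());
    page[3..7].copy_from_slice(&leaf.next.to_le_bytes());
    page[7..11].copy_from_slice(&leaf.prev.to_le_bytes());
    let mut off = 11;
    for &(key, rid) in &leaf.entries {
        page[off..off + 4].copy_from_slice(&key.to_le_bytes());
        page[off + 4..off + 12].copy_from_slice(&rid.to_le_bytes());
        off += 12;
    }
}

fn deserialize_leaf(page: &[u8]) -> Leaf {
    assert_eq!(page[0], LEAF_TAG, "not a leaf page");
    let n = u16::from_le_bytes([page[1], page[2]]) as usize;
    let next = u32::from_le_bytes(page[3..7].try_into().unwrap());
    let prev = u32::from_le_bytes(page[7..11].try_into().unwrap());
    let mut entries = Vec::with_capacity(n);
    let mut off = 11;
    for _ in 0..n {
        let key = u32::from_le_bytes(page[off..off + 4].try_into().unwrap());
        let rid = u64::from_le_bytes(page[off + 4..off + 12].try_into().unwrap());
        entries.push((key, rid));
        off += 12;
    }
    Leaf { next, prev, entries }
}
```

A round-trip property test (serialize then deserialize yields the original leaf) is the cheapest way to lock this format down before building the tree on top of it.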
Hint 2 — Ancestor path for split propagation
During root-to-leaf traversal, push each internal node's page ID onto a Vec<u32>. After a leaf split, pop ancestors one at a time to insert the promoted key. If the ancestor splits too, continue popping. Empty stack = create new root.
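The pop loop can be sketched as a key-only simulation. This deliberately tracks just each node's key list, not child page IDs: real code would serialize both halves back through the buffer pool and wire up the new child pointer, but the control flow (insert, check overflow, split, promote, repeat until a node absorbs the key or the stack empties) is the part the hint describes. `MAX_KEYS` and `propagate` are illustrative names.

```rust
const MAX_KEYS: usize = 4; // tiny capacity so splits are easy to trigger in tests

/// Insert `key` into the deepest ancestor on `path` (root first, leaf's parent
/// last), splitting and promoting upward while nodes overflow.
/// Returns Some(promoted_key) if the stack empties, i.e. a new root is needed.
fn propagate(path: &mut Vec<Vec<u32>>, mut key: u32) -> Option<u32> {
    while let Some(mut node) = path.pop() {
        let pos = node.partition_point(|&k| k < key);
        node.insert(pos, key);
        if node.len() <= MAX_KEYS {
            return None; // the key fits: propagation stops here
        }
        let mid = node.len() / 2;
        let _right_half = node.split_off(mid + 1); // real code: write as a new page
        key = node.pop().unwrap(); // middle key moves up to the parent
    }
    Some(key) // empty stack: caller creates a new root holding `key`
}
```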
Hint 3 — Bulk-load algorithm
1. Sort all records by NORAD ID.
2. Pack keys into leaf nodes at capacity, write each, record its page ID and last key.
3. Build internal nodes bottom-up: group separators into internal node pages.
4. Repeat step 3 until one root remains.
5. Link leaves into a doubly-linked list.
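Steps 2–4 determine the tree's shape before any page is written, which is worth sanity-checking up front. A sizing sketch (capacities here are illustrative, not derived from your page format):

```rust
fn ceil_div(a: usize, b: usize) -> usize {
    (a + b - 1) / b
}

/// Bulk-load shape: pack sorted records into leaves at full capacity, then
/// build separator levels bottom-up until one root remains.
/// Returns (leaf_count, height), counting the leaf level as height 1.
fn bulk_load_shape(n_records: usize, leaf_cap: usize, fanout: usize) -> (usize, usize) {
    let leaves = ceil_div(n_records, leaf_cap);
    let mut nodes = leaves;
    let mut height = 1;
    while nodes > 1 {
        nodes = ceil_div(nodes, fanout); // one internal level above
        height += 1;
    }
    (leaves, height)
}
```

Comparing `leaf_count` from this formula against the actual leaf count after bulk-loading 100,000 records is a quick check on the fill-factor criterion: one-by-one insertion with midpoint splits tends toward ~50–70% full leaves, so it should produce noticeably more leaves than packed bulk-load.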
Hint 4 — Buffer pool pressure during splits
Split propagation can pin the split node, the new node, and the parent simultaneously (3 frames). With a 16-frame pool and a 2-level tree, this is safe. But unpin nodes as soon as you've serialized them back — don't hold all three longer than necessary.
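One way to make "unpin as soon as possible" hard to get wrong is an RAII pin guard, so every exit path (including early returns and panics) releases the frame. The sketch below is a minimal single-threaded model assuming a per-frame pin counter like Module 1's; `Frame` and `PinGuard` are illustrative names, and calling `drop(guard)` explicitly is how you release a node right after serializing it back.

```rust
use std::cell::Cell;

/// Minimal stand-in for a buffer pool frame with a pin count.
struct Frame {
    pins: Cell<u32>,
}

/// Pins the frame on construction, unpins on drop (any exit path).
struct PinGuard<'a> {
    frame: &'a Frame,
}

impl<'a> PinGuard<'a> {
    fn new(frame: &'a Frame) -> Self {
        frame.pins.set(frame.pins.get() + 1);
        PinGuard { frame }
    }
}

impl Drop for PinGuard<'_> {
    fn drop(&mut self) {
        self.frame.pins.set(self.frame.pins.get() - 1);
    }
}
```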
What Comes Next
Module 3 introduces LSM trees — a fundamentally different approach. Where B+ trees update pages in-place, LSM trees batch writes in memory and flush immutable files. You'll understand when each is appropriate for the OOR workload.