Module 06 — Query Processing
Track: Database Internals — Orbital Object Registry
Position: Module 6 of 6
Source material: Database Internals — Alex Petrov, Chapters 14–15; Designing Data-Intensive Applications — Martin Kleppmann, Chapter 6
Quiz pass threshold: 70% on all three lessons to unlock the project
Mission Context
SDA INCIDENT REPORT — OOR-2026-0047
Classification: PERFORMANCE DEFICIENCY
Subject: Multi-source TLE merge exceeds conjunction window deadlineThe OOR ingests TLE data from 5 independent sources (18th SDS, ESA SST, LeoLabs, ExoAnalytic, Numerica). When multiple sources provide TLEs for the same object, the engine must merge them — selecting the most recent epoch, resolving conflicts, and joining against the master catalog. Currently this is done in application code with ad-hoc nested loops. A full catalog merge of 100,000 objects from 5 sources takes 45 seconds. The conjunction pipeline requires merge results within 10 seconds.
Directive: Implement a structured query processing layer: scan operators, join algorithms, and a composable execution model that can be optimized for the catalog merge workload.
Learning Outcomes
After completing this module, you will be able to:
- Implement the volcano (iterator) model for pull-based query execution with composable operators
- Explain vectorized execution and why processing column batches outperforms row-at-a-time for analytical queries
- Implement nested-loop, hash, and sort-merge join algorithms and determine which is optimal for a given workload
- Compose scan, filter, projection, and join operators into a query execution plan
- Analyze the I/O and memory costs of different join strategies for the OOR catalog merge workload
Lesson Summary
Lesson 1 — The Volcano (Iterator) Model
Pull-based query execution. Each operator (scan, filter, join) implements next() → Option<Row>. Operators compose like iterator chains. Pipelining and its limitations.
Lesson 2 — Vectorized Execution
Processing batches of rows (column vectors) instead of one row at a time. Cache efficiency, SIMD potential, and why OLAP engines (DuckDB, Velox, DataFusion) use vectorized execution.
Lesson 3 — Join Algorithms
Nested-loop join, hash join, and sort-merge join. Cost models, memory requirements, and when each algorithm is optimal. Application to the OOR multi-source TLE merge.
Capstone Project — Orbital Catalog Merge System
Build a query execution engine that merges TLE data from 5 sources using composable operators. The merge pipeline uses scan, filter, sort, and join operators composed in the volcano model. Full brief in project-catalog-merge.md.
File Index
module-06-query-processing/
├── README.md
├── lesson-01-volcano-model.md
├── lesson-01-quiz.toml
├── lesson-02-vectorized-execution.md
├── lesson-02-quiz.toml
├── lesson-03-join-algorithms.md
├── lesson-03-quiz.toml
└── project-catalog-merge.md
Prerequisites
- Module 5 (Transactions & Isolation) completed
- All previous modules in the Database Internals track
Track Complete
This is the final module of the Database Internals track. After completing it, you will have built a storage engine from the ground up: page layout (M1) → B-tree indexing (M2) → LSM write-optimized storage (M3) → crash recovery (M4) → MVCC concurrency (M5) → query processing (M6).