*Expanding PageZero to Support Unlimited Columns*

APE: https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+23%3A+Unlimited+Columns+Support

In the columnar storage format, each MegaPage represents a logical leaf node and begins with `PageZero`, a metadata section that captures essential column metadata, including column offsets and min/max filters.

Originally, `PageZero` was constrained to reside in a single page (typically 128 KB), with a fixed layout that stored information for **every column** in the global schema. Each column entry consumed 4 bytes for the offset and 16 bytes for the min/max filter, giving a **metadata footprint of 20 bytes per column**. A 128 KB page holds at most about 6,550 such entries (131,072 / 20), and because part of `PageZero` is reserved for primary key metadata and structural headers, the **maximum number of supported columns was capped at ~6,000**.

This limitation became problematic for datasets with **wide or sparse schemas**, where many columns may be missing in individual document batches but still occupy space in `PageZero`. The unused metadata bloated the footprint and limited scalability.

*Multi-Segment PageZero: Motivation and Layout*

To overcome this limitation, we introduce **multi-segment support in PageZero**. Instead of storing all metadata in a single fixed block, we partition PageZero into multiple **segments**, with the **first (zeroth) segment storing primary key metadata and as many column entries as it can fit**, and subsequent segments storing the remaining metadata.

Each segment follows the same layout: column index → offset → min → max, stored in an interleaved manner. This structure keeps scans and lookups efficient while allowing us to scale to **arbitrarily many columns**, bounded only by the MegaPage size.

*Segment Layout:*

```
[ Segment Header ]
 ├─ Number of Columns
 └─ Max Column Index in Segment
[ Interleaved Metadata Entries ]
 ├─ ColumnIndex₁, Offset₁, Min₁, Max₁
 ├─ ColumnIndex₂, Offset₂, Min₂, Max₂
 └─ ...
```

A new `DefaultColumnMultiPageZeroWriter` class was introduced to manage this segmented layout.
It delegates metadata writing to individual segments while maintaining headers at the top level for navigation.

*Adaptive Writer Selection*

To avoid burdening all batches with this segmented structure, we retain the `DefaultColumnPageZeroWriter` for small or dense schemas. A new **adaptive selection mechanism** compares the space usage of both writers for a batch and picks the one that uses less space. The decision logic weighs:

- Space taken by the Default Multi-Segment writer (fixed layout for all columns)
- Space taken by the Sparse Multi-Segment writer (compact layout for present columns)

This logic is encapsulated in `PageZeroWriterFlavorSelector`.

*New Configuration Options:*

Two new storage configuration parameters have been introduced:

1. **`STORAGE_MAX_COLUMNS_IN_ZEROTH_SEGMENT`** (`INTEGER_BYTE_UNIT`, default: `5000`)
   Controls the maximum number of columns that can be stored in the zeroth segment of `PageZero`. Remaining columns, if any, are offloaded to additional segments. This balances lookup performance (fastest in the zeroth segment) against scalability. The default may change based on performance experiments.

2. **`STORAGE_PAGE_ZERO_WRITER`** (`STRING`, default: `"default"`)
   Controls the writer strategy used during flush. Accepted values are:
   - `"default"`: Always use the legacy writer.
   - `"sparse"`: Always use the sparse writer (only present columns).
   - `"adaptive"`: Dynamically compare both and pick the writer that uses less space.

*Summary of Changes*

- Interleaved layout per segment for columnIndex, offset, min, max.
- Logic to estimate the number of segments and assign columns to segments.
- Writer is selected dynamically using `PageZeroWriterFlavorSelector`.

*Benefits*

- Unlocks support for **tens of thousands of columns** per MegaPage.
- Better space efficiency for sparse batches.
- Retains backward compatibility: already-ingested MegaLeafs can still be read.

This change is essential for evolving workloads that increasingly rely on flexible schemas and sparse data layouts.
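
To make the adaptive comparison and the segment estimate concrete, here is a minimal Java sketch. It is not the actual `PageZeroWriterFlavorSelector` code: the class, method, and enum names are hypothetical, segment and PageZero headers are ignored, the 4-byte column index in the sparse layout is an assumption (only the 4-byte offset and 16-byte filter sizes come from the description above), and the per-segment capacity passed to `segmentCount` is an illustrative parameter.

```java
/** Illustrative sketch only; names and sizes marked as assumptions are not from the AsterixDB code. */
final class PageZeroLayoutSketch {
    // From the description: 4-byte offset + 16-byte min/max filter per column.
    static final int OFFSET_SIZE = 4;
    static final int FILTER_SIZE = 16;
    // Assumption: the sparse layout also stores an explicit 4-byte column index per entry.
    static final int COLUMN_INDEX_SIZE = 4;

    enum Flavor { DEFAULT, SPARSE }

    /** Fixed layout: one entry for every column in the global schema. */
    static long defaultSize(int totalColumns) {
        return (long) totalColumns * (OFFSET_SIZE + FILTER_SIZE);
    }

    /** Compact layout: one indexed entry per column actually present in the batch. */
    static long sparseSize(int presentColumns) {
        return (long) presentColumns * (COLUMN_INDEX_SIZE + OFFSET_SIZE + FILTER_SIZE);
    }

    /** Picks the layout with the smaller estimated footprint; ties go to DEFAULT. */
    static Flavor pick(int totalColumns, int presentColumns) {
        return sparseSize(presentColumns) < defaultSize(totalColumns) ? Flavor.SPARSE : Flavor.DEFAULT;
    }

    /**
     * Estimates the segment count: the zeroth segment holds up to maxInZeroth columns,
     * and each later segment holds up to perSegment columns (hypothetical capacity).
     */
    static int segmentCount(int columns, int maxInZeroth, int perSegment) {
        if (columns <= maxInZeroth) {
            return 1;
        }
        int remaining = columns - maxInZeroth;
        return 1 + (remaining + perSegment - 1) / perSegment; // ceiling division
    }

    public static void main(String[] args) {
        // Wide, sparse batch: 10,000 schema columns, 500 present.
        System.out.println(pick(10_000, 500));
        // Small, dense batch: every column present, so the index overhead is wasted.
        System.out.println(pick(100, 100));
        // 12,000 columns with 5,000 in the zeroth segment and 5,000 per later segment.
        System.out.println(segmentCount(12_000, 5_000, 5_000));
    }
}
```

Under these assumptions the sparse layout wins only when fewer than about five sixths of the schema's columns are present (24 bytes per present column versus 20 bytes per schema column), which matches the intent of keeping the fixed layout for small or dense schemas.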