xudong963 commented on code in PR #155:
URL: https://github.com/apache/datafusion-site/pull/155#discussion_r2934568462
##########
content/blog/2026-03-10-limit-pruning.md:
##########
@@ -0,0 +1,299 @@
+---
+layout: post
+title: "Turning LIMIT into an I/O Optimization: Inside DataFusion’s Multi-Layer Pruning Stack"
+date: 2026-03-10
+author: xudong
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<style>
+figure {
+ margin: 20px 0;
+}
+
+figure img {
+ display: block;
+ max-width: 80%;
+ margin: auto;
+}
+
+figcaption {
+ font-style: italic;
+ color: #555;
+ font-size: 0.9em;
+ max-width: 80%;
+ margin: auto;
+ text-align: center;
+}
+</style>
+
+*Xudong Wang, [Massive](https://www.massive.com/)*
+
+Reading data efficiently means touching as little data as possible. The
fastest I/O is the I/O you never make. This sounds obvious, but making it
happen in practice requires careful engineering at every layer of the query
engine. [Apache DataFusion] achieves this through a multi-layer **pruning
pipeline** — a series of stages that progressively narrow down the data before
decoding a single row.
+
+In this post, we describe a new optimization called **limit pruning** that
makes this pipeline aware of SQL `LIMIT` clauses. By identifying row groups
where *every* row is guaranteed to match the predicate, DataFusion can satisfy
a `LIMIT` query without ever touching partially matching row groups —
eliminating wasted I/O entirely.
+
+This work was inspired by the "Pruning for LIMIT Queries" section of
Snowflake's paper [*Pruning in Snowflake: Working Smarter, Not
Harder*](https://arxiv.org/pdf/2504.11540).
+
+## DataFusion's Pruning Pipeline
+
+Before diving into limit pruning, let's understand the full pruning pipeline.
DataFusion scans Parquet data through a series of increasingly fine-grained
filters, each one eliminating data so the next stage processes less:
+
+<figure>
+<img src="/blog/images/limit-pruning/pruning-phases.svg" width="80%"
alt="Three phases of DataFusion's pruning pipeline"/>
+<figcaption>Figure 1: The three phases of DataFusion's pruning pipeline — from
directories down to individual rows.</figcaption>
+</figure>
+
+### Phase 1: High-Level Discovery
+
+- **Partition Pruning**: The `ListingTable` component evaluates filters that
depend only on partition columns — things like `year`, `month`, or `region`
encoded in directory paths (e.g., `s3://data/year=2024/month=01/`). Irrelevant
directories are eliminated before we even open a file.
+- **File Stats Pruning**: The `FilePruner` checks file-level min/max and
null-count statistics. If these statistics prove that a file cannot satisfy the
predicate, we drop it entirely — no need to read row group metadata.
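
The file-level min/max check can be sketched as follows. This is a minimal illustration with hypothetical types, not DataFusion's actual `FilePruner` API, which evaluates arbitrary predicate expressions against per-container statistics:

```rust
// Hypothetical per-file statistics for one column.
struct FileStats {
    min: i64,
    max: i64,
}

// For the predicate `value >= threshold`, a file can be skipped only when
// its statistics *prove* that no row can match, i.e. `max < threshold`.
// Pruning must be conservative: when in doubt, keep the file.
fn file_may_match(stats: &FileStats, threshold: i64) -> bool {
    stats.max >= threshold
}

fn main() {
    let pruned = FileStats { min: 1, max: 40 };
    let kept = FileStats { min: 30, max: 90 };
    assert!(!file_may_match(&pruned, 50)); // skipped without reading row group metadata
    assert!(file_may_match(&kept, 50));    // survives to the next phase
}
```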
+
+### Phase 2: Row Group Statistics
+
+For each surviving file, DataFusion reads row group metadata and classifies
each row group into one of three states:
+
+<figure>
+<img src="/blog/images/limit-pruning/row-group-states.svg" width="80%"
alt="Row group classification: not matching, partially matching, fully
matching"/>
+<figcaption>Figure 2: Row groups are classified into three states based on
their statistics.</figcaption>
+</figure>
+
+- **Not Matching (Skipped)**: Statistics prove no rows can match. The row
group is ignored completely.
+- **Partially Matching**: Statistics cannot rule out matching rows, but also
cannot guarantee them. These groups might be scanned and verified row by row
later.
+- **Fully Matching**: Statistics prove that *every single row* in the group
satisfies the predicate. This state is key to making limit pruning possible.
+
+Additionally, **bloom filters** can eliminate row groups for equality and
`IN`-list predicates at this stage.
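
The three-state classification can be sketched for a simple comparison predicate. This is illustrative only, with hypothetical types; DataFusion's real implementation handles arbitrary expressions through its pruning-predicate machinery:

```rust
#[derive(Debug, PartialEq)]
enum MatchState {
    NotMatching,       // statistics prove no row matches
    PartiallyMatching, // statistics cannot decide; rows must be checked later
    FullyMatching,     // statistics prove every row matches
}

struct RowGroupStats {
    min: i64,
    max: i64,
}

// Classify a row group for the predicate `value >= threshold` from its
// min/max statistics alone (assuming no nulls; real statistics also
// carry null counts, which must be checked before claiming FullyMatching).
fn classify(stats: &RowGroupStats, threshold: i64) -> MatchState {
    if stats.max < threshold {
        MatchState::NotMatching
    } else if stats.min >= threshold {
        MatchState::FullyMatching
    } else {
        MatchState::PartiallyMatching
    }
}

fn main() {
    assert_eq!(classify(&RowGroupStats { min: 0, max: 10 }, 50), MatchState::NotMatching);
    assert_eq!(classify(&RowGroupStats { min: 10, max: 90 }, 50), MatchState::PartiallyMatching);
    assert_eq!(classify(&RowGroupStats { min: 60, max: 90 }, 50), MatchState::FullyMatching);
}
```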
+
+### Phase 3: Granular Pruning
+
+The final phase goes even deeper:
+
+- **Page Index Pruning**: Parquet pages have their own min/max statistics.
DataFusion uses these to skip individual data pages within a surviving row
group.
+- **Late Materialization (Row Filtering)**: Instead of decoding all columns at
once, DataFusion decodes the cheapest, most selective columns first. It filters
rows using those columns, then only decodes the remaining columns for surviving
rows.
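
Late materialization can be illustrated with a toy two-column example, using plain vectors to stand in for decoded Parquet columns (the real reader applies this inside the Parquet decoding path, so unselected rows of the expensive column are never decoded at all):

```rust
// Toy late materialization: the cheap filter column is decoded first, a row
// selection is built from it, and the expensive column is "decoded" (here,
// simulated by a closure over the row index) only for surviving rows.
fn scan_with_row_filter(
    cheap: &[i64],
    decode_expensive: impl Fn(usize) -> String,
    threshold: i64,
) -> Vec<(i64, String)> {
    cheap
        .iter()
        .enumerate()
        .filter(|(_, v)| **v >= threshold)       // row-level predicate on the cheap column
        .map(|(i, v)| (*v, decode_expensive(i))) // decode expensive column for survivors only
        .collect()
}

fn main() {
    let s = [10, 60, 55, 5];
    let rows = scan_with_row_filter(&s, |i| format!("row-{i}"), 50);
    assert_eq!(rows, vec![(60, "row-1".to_string()), (55, "row-2".to_string())]);
}
```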
+
+## The Problem: LIMIT Was Ignored
+
+Before limit pruning, all of these stages worked well — but the pruning
pipeline had **no awareness of `LIMIT`**. Consider a query like:
+
+```sql
+SELECT * FROM tracking_data
+WHERE species LIKE 'Alpine%' AND s >= 50
+LIMIT 3
+```
+
+Even when fully matched row groups alone contain enough rows to satisfy the
`LIMIT`, the scan would still visit partially matching groups — decoding data
that might contribute zero qualifying rows.
+
+<figure>
+<img src="/blog/images/limit-pruning/wasted-io.svg" width="80%"
alt="Traditional pruning decodes partially matching groups with no LIMIT
awareness"/>
+<figcaption>Figure 3: Without limit awareness, partially matching groups are
scanned even when fully matched groups already have enough rows.</figcaption>
+</figure>
+
+If three rows in a fully matching group already satisfy `LIMIT 3`, why bother
decoding groups where we're not even sure any rows qualify?
+
+## The Solution: Limit-Aware Pruning
+
+The solution adds a new step in the pruning pipeline — right after row group
pruning and before page index pruning:
+
+<figure>
+<img src="/blog/images/limit-pruning/pruning-pipeline.svg" width="80%"
alt="Pruning pipeline with limit pruning highlighted"/>
+<figcaption>Figure 4: Limit pruning is inserted between row group and page
index pruning.</figcaption>
+</figure>
+
+The idea is simple: **if fully matched row groups already contain enough rows
to satisfy the `LIMIT`, rewrite the access plan to scan only those groups and
skip everything else.**
+
+This optimization is applied only when the query is a pure limit query with no
`ORDER BY`, because reordering which groups we scan could change the output
ordering of the results. In the implementation, this check is expressed as:
+
+```rust
+// Prune by limit if limit is set and order is not sensitive
+if let (Some(limit), false) = (limit, preserve_order) {
+ row_groups.prune_by_limit(limit, rg_metadata, &file_metrics);
+}
+```
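
The decision inside `prune_by_limit` can be sketched like this. The types and method shape here are hypothetical, not DataFusion's internal ones; the point is the accounting: if the fully matching groups alone guarantee at least `limit` rows, everything else can be dropped from the access plan.

```rust
#[derive(Clone, Debug, PartialEq)]
struct RowGroup {
    num_rows: usize,
    fully_matching: bool, // every row proven to satisfy the predicate
}

// Keep only fully matching groups when they alone cover the LIMIT;
// otherwise leave the access plan unchanged.
fn prune_by_limit(groups: Vec<RowGroup>, limit: usize) -> Vec<RowGroup> {
    let guaranteed: usize = groups
        .iter()
        .filter(|g| g.fully_matching)
        .map(|g| g.num_rows)
        .sum();
    if guaranteed >= limit {
        groups.into_iter().filter(|g| g.fully_matching).collect()
    } else {
        groups
    }
}

fn main() {
    let groups = vec![
        RowGroup { num_rows: 4, fully_matching: true },
        RowGroup { num_rows: 100, fully_matching: false },
    ];
    // LIMIT 3 is covered by the 4 guaranteed rows: the partial group is dropped.
    assert_eq!(prune_by_limit(groups.clone(), 3).len(), 1);
    // LIMIT 10 is not covered: the original plan is kept.
    assert_eq!(prune_by_limit(groups, 10).len(), 2);
}
```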
+
+## Mechanism: Detecting Fully Matched Row Groups
+
+The core insight is **predicate negation**. To determine if every row in a row
group satisfies the predicate, we:
Review Comment:
+1
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]