This is an automated email from the ASF dual-hosted git repository.

suxiaogang223 pushed a commit to branch codex/complex-column-predicate-stats-filtering
in repository https://gitbox.apache.org/repos/asf/doris.git

commit 911b69241f2c00f381bb1cd8a7df6caf7c877fba
Author: Socrates <[email protected]>
AuthorDate: Thu Jun 4 02:54:27 2026 +0800

    [doc](be) Mark nested parquet pruning scope complete

    ### What problem does this PR solve?

    Issue Number: close #xxx

    Related PR: #xxx

    Problem Summary: Mark the completed nested parquet predicate and pruning
    implementation scope, and move remaining items into explicit non-goals
    for this branch.

    ### Release note

    None

    ### Check List (For Author)

    - Test: Manual test
    - Behavior changed: No
    - Does this need documentation: No
---
 docs/complex-column-predicate-and-stats-filtering.md | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/docs/complex-column-predicate-and-stats-filtering.md b/docs/complex-column-predicate-and-stats-filtering.md
index 3850199698a..3d5e450e38f 100644
--- a/docs/complex-column-predicate-and-stats-filtering.md
+++ b/docs/complex-column-predicate-and-stats-filtering.md
@@ -258,11 +258,15 @@ DuckDB applies bloom filters only to non-nested primitive readers. Doris currently
 
 The row-range semantics of the page index are complex for repeated leaves. This round allows only non-repeated primitive leaves. A non-repeated primitive leaf under a `STRUCT` can reuse the existing page index range logic; LIST/MAP/repeated leaves are skipped outright.
 
-## 7. Follow-up work
+## 7. Completion summary for this round
 
-- If a later Arrow writer or an external fixture can reliably provide bloom filter metadata, add a real parquet fixture for nested bloom pruning.
-- Full complex child schema change requires the FE/table reader to provide a complete nested table mapping; the file reader still does not understand the table/global schema.
-- LIST/MAP/repeated leaves are wired into pruning only after the Dremel row semantics and row-range semantics are clearly defined.
+For primitive leaves under `STRUCT` / nested `STRUCT`, this round's implementation already covers row-level Expr localization, filter-only nested projection, file-layer pruning target construction, the statistics / dictionary / bloom / page index pruning entry points, and mapper-based nested child rename.
+
+The following items remain outside the scope of this round's implementation:
+
+- Real parquet fixture for nested bloom pruning: the current Arrow writer headers expose no stable switch for writing bloom filter metadata; if a later Arrow writer or an external fixture can reliably provide bloom filter metadata, add a real file-level fixture then.
+- Full complex child schema change: requires the FE/table reader to provide a complete nested table mapping; the file reader consumes only the file-local mapping and does not understand the table/global schema.
+- LIST/MAP/repeated leaf pruning: wired into pruning only after the Dremel row semantics and row-range semantics are clearly defined.
 
 ## 8. Implementations to avoid
 
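
The scoping rule in the patched section (page index pruning only for non-repeated primitive leaves; LIST/MAP/repeated leaves skipped) can be illustrated with a small sketch. This is not Doris code: `Node`, `Repetition`, `max_repetition_level`, and `page_index_eligible` are hypothetical names introduced here, and the sketch assumes the usual Dremel convention that a leaf's maximum repetition level equals the number of `repeated` fields on its schema path.

```python
from dataclasses import dataclass
from enum import Enum


class Repetition(Enum):
    REQUIRED = "required"
    OPTIONAL = "optional"
    REPEATED = "repeated"


@dataclass(frozen=True)
class Node:
    """One step on a schema path from the root column down to a leaf (hypothetical type)."""
    name: str
    repetition: Repetition
    is_primitive: bool = False


def max_repetition_level(path):
    # Dremel convention: count the repeated fields along the path.
    return sum(1 for node in path if node.repetition is Repetition.REPEATED)


def page_index_eligible(path):
    # Assumption for this sketch: only a primitive leaf with no repeated
    # ancestor (max repetition level 0) maps one value to one row, so only
    # such a leaf may reuse flat row-range logic. Everything else is skipped.
    if not path or not path[-1].is_primitive:
        return False
    return max_repetition_level(path) == 0


# s.inner.x: a primitive leaf under nested STRUCTs, nothing repeated.
struct_leaf = [
    Node("s", Repetition.OPTIONAL),
    Node("inner", Repetition.OPTIONAL),
    Node("x", Repetition.REQUIRED, is_primitive=True),
]

# tags.list.element: the standard three-level parquet LIST layout.
list_leaf = [
    Node("tags", Repetition.OPTIONAL),
    Node("list", Repetition.REPEATED),
    Node("element", Repetition.OPTIONAL, is_primitive=True),
]

print(page_index_eligible(struct_leaf))
print(page_index_eligible(list_leaf))
```

Under these assumptions the `STRUCT` leaf is accepted and the LIST element is rejected, which mirrors the "skip outright" behaviour the section describes for repeated leaves.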
