suxiaogang223 opened a new pull request, #64098: URL: https://github.com/apache/doris/pull/64098
## Summary

Implements complex type predicate filtering and statistics-based file-layer pruning for nested Parquet STRUCT columns, aligning with DuckDB's nested filter semantics while respecting Doris' new parquet reader architecture.

## Changes

### Row-level Expr Localization

- `struct_element(VSlotRef(parent), literal child)` chains are recognized as nested paths
- Parent slot is rewritten to file-local top-level block slot while preserving `struct_element` form
- Struct children are NOT registered as independent block slots

### Filter-only Nested Projection

- Filter-referenced struct children are merged into the same top-level complex column's `FieldProjection.children`
- Output children maintain priority order; filter-only children are appended to read projection
- Filter-only children are excluded from `ColumnMapping.child_mappings` to avoid affecting table output materialization

### Nested File-layer Pruning Target

- `FileColumnPredicateFilter` adds `file_child_id_path` for file-local child field-id paths
- AND-semantics `struct_element(...) op literal` / `IN (...)` construct pruning hints
- OR/NOT/arbitrary function subtrees are NOT extracted for pruning (safety)
- Supports renamed nested children via table-to-file field-id mapping

### Parquet Leaf Resolution & Pruning

- `ResolvePredicateLeafSchema()` resolves top-level or nested targets to primitive leaf schema
- Row group min/max statistics pruning for nested struct primitives
- Dictionary pruning for nested struct string-like columns
- Bloom filter pruning via Arrow adapter for supported primitive types
- Page index row range pruning for non-repeated primitive leaves only

### Test Coverage

- Mapper unit tests: nested predicate filters (GT, IN_LIST, reverse comparison, deep path)
- Renamed child projection via field-id mapping
- Missing child and OR subtree safety (no false pruning hints)
- Real Parquet fixture tests for statistics, dictionary, and page index pruning
- Bloom filter unit tests via Arrow adapter

### Out of Scope (intentionally)

- LIST/MAP/repeated leaf pruning
- Dynamic field names or non-deterministic expressions
- Real Parquet bloom filter fixture (Arrow writer lacks stable bloom metadata API)
- Full complex child schema change (requires FE/table reader support)

## Related

🤖 Generated with [Claude Code](https://claude.com/claude-code)
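
For readers who want a concrete picture of the "AND-semantics only" rule under Nested File-layer Pruning Target, the following is a minimal Python sketch, not the PR's C++ code. The names `StructPath`, `Compare`, `InList`, `And`, `Or`, `Not`, and `extract_pruning_hints` are hypothetical and exist only for illustration. The sketch assumes a predicate tree where only conjunctions are descended into, so an `OR`, `NOT`, or unknown subtree contributes no hint and is left to row-level evaluation.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class StructPath:
    """Hypothetical stand-in for a struct_element(...) chain over one top-level column."""
    column: str
    children: Tuple[str, ...]


@dataclass(frozen=True)
class Compare:
    path: StructPath
    op: str          # one of ">", ">=", "<", "<=", "="
    value: Any


@dataclass(frozen=True)
class InList:
    path: StructPath
    values: Tuple[Any, ...]


@dataclass(frozen=True)
class And:
    left: Any
    right: Any


@dataclass(frozen=True)
class Or:
    left: Any
    right: Any


@dataclass(frozen=True)
class Not:
    child: Any


def extract_pruning_hints(expr: Any) -> List[Any]:
    """Collect pruning hints from conjuncts only.

    A hint taken from one side of an AND is safe on its own, because every
    surviving row must satisfy it. The same is not true under OR or NOT, so
    those subtrees (and anything unrecognized) yield no hints.
    """
    if isinstance(expr, And):
        return extract_pruning_hints(expr.left) + extract_pruning_hints(expr.right)
    if isinstance(expr, (Compare, InList)):
        return [expr]
    return []


# Example: s.a > 10 AND (s.b = 1 OR s.c IN (2, 3)) AND NOT (s.d < 0)
a_gt_10 = Compare(StructPath("s", ("a",)), ">", 10)
predicate = And(
    And(
        a_gt_10,
        Or(
            Compare(StructPath("s", ("b",)), "=", 1),
            InList(StructPath("s", ("c",)), (2, 3)),
        ),
    ),
    Not(Compare(StructPath("s", ("d",)), "<", 0)),
)
hints = extract_pruning_hints(predicate)
```

In this example only the `s.a > 10` conjunct becomes a hint; the `OR` and `NOT` subtrees are skipped, which matches the "no false pruning hints" goal listed under Test Coverage.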

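
The row group min/max pruning listed under Parquet Leaf Resolution & Pruning can be illustrated with a small Python sketch. It is an assumption-laden model rather than the reader's actual implementation: `LeafStats`, `may_match`, and `select_row_groups` are hypothetical names, statistics are modeled as a plain dictionary keyed by a field-id path, values are integers, and null handling is ignored. The one property it is meant to show is that a row group is skipped only when its statistics prove no row can match, and is kept whenever statistics are missing.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

FieldIdPath = Tuple[int, ...]  # hypothetical: (top-level field id, child field id, ...)


@dataclass(frozen=True)
class LeafStats:
    """Hypothetical min/max statistics for one primitive leaf in one row group."""
    min_value: int
    max_value: int


def may_match(stats: Optional[LeafStats], op: str, literal: int) -> bool:
    """Return False only if the statistics prove that no row satisfies `leaf op literal`."""
    if stats is None:
        return True  # no statistics: pruning would be unsafe, so keep the row group
    if op == ">":
        return stats.max_value > literal
    if op == ">=":
        return stats.max_value >= literal
    if op == "<":
        return stats.min_value < literal
    if op == "<=":
        return stats.min_value <= literal
    if op == "=":
        return stats.min_value <= literal <= stats.max_value
    return True  # unrecognized operator: keep the row group


def select_row_groups(
    row_groups: List[Dict[FieldIdPath, LeafStats]],
    leaf: FieldIdPath,
    op: str,
    literal: int,
) -> List[int]:
    """Return the indices of row groups that still need to be read."""
    return [
        index
        for index, stats_by_leaf in enumerate(row_groups)
        if may_match(stats_by_leaf.get(leaf), op, literal)
    ]


# Three row groups; the third has no statistics for the nested leaf.
nested_leaf: FieldIdPath = (1, 2)
row_groups = [
    {nested_leaf: LeafStats(min_value=0, max_value=5)},
    {nested_leaf: LeafStats(min_value=6, max_value=20)},
    {},
]
kept = select_row_groups(row_groups, nested_leaf, ">", 10)
```

With the predicate `leaf > 10`, the first row group is pruned because its maximum is 5, while the second (maximum 20) and the third (no statistics) are kept.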