Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-13 Thread via GitHub
Dandandan commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2041413480 ## datafusion/physical-plan/src/sorts/sort_filters.rs: ## @@ -0,0 +1,297 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributo

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-05 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015101716 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -295,3 +350,104 @@ fn create_initial_plan( // default to scanning all row groups Ok(ParquetAccessP

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-05 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2022020818 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -1197,35 +1197,55 @@ impl ExecutionPlan for SortExec { ) -> Result { trace!("Start SortExec::ex

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-05 Thread via GitHub
adriangb closed pull request #15301: Add dynamic pruning filters from TopK state URL: https://github.com/apache/datafusion/pull/15301 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-05 Thread via GitHub
alamb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2767114440 > filter pushdown, dynamic filter = 0.738s 🤯 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-05 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020157010 ## datafusion/proto/src/physical_plan/to_proto.rs: ## @@ -210,6 +212,7 @@ pub fn serialize_physical_expr( value: &Arc, codec: &dyn PhysicalExtensionCode

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-05 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2022023507 ## datafusion/physical-plan/src/filter.rs: ## @@ -433,6 +433,22 @@ impl ExecutionPlan for FilterExec { } try_embed_projection(projection, self)

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-05 Thread via GitHub
geoffreyclaude commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2024431842 ## datafusion/physical-plan/src/sorts/sort_filters.rs: ## @@ -0,0 +1,297 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contr

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-05 Thread via GitHub
suibianwanwank commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2006221885 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +737,72 @@ impl RecordBatchStore { } } +struct TopKDynamicFilterSource { +/// The Top

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-05 Thread via GitHub
alamb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2767115072 I am starting to check this PR out again in detail -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-05 Thread via GitHub
alamb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021602649 ## datafusion/common/src/config.rs: ## @@ -590,6 +590,13 @@ config_namespace! { /// during aggregations, if possible pub enable_topk_aggregation: b

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-04 Thread via GitHub
alamb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021697091 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -1197,35 +1197,55 @@ impl ExecutionPlan for SortExec { ) -> Result { trace!("Start SortExec::execu

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-04 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2771480246 > FYI I will likely try and review this PR again carefully first thing tomorrow morning I got everything mostly nice... but it seems there are still a couple minor bugs. Will

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-04 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021675691 ## datafusion/physical-plan/src/filter.rs: ## @@ -433,6 +433,22 @@ impl ExecutionPlan for FilterExec { } try_embed_projection(projection, self)

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-04 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021679351 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -1197,35 +1197,55 @@ impl ExecutionPlan for SortExec { ) -> Result { trace!("Start SortExec::ex

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-04 Thread via GitHub
suibianwanwank commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2006773415 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +737,72 @@ impl RecordBatchStore { } } +struct TopKDynamicFilterSource { +/// The Top

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-04 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2025001379 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -955,7 +939,7 @@ impl TableProvider for ListingTable { .with_output_ordering(output_ord

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-03 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2776668146 I'm closing this now massive PR in favor of splitting it up into units of work https://github.com/apache/datafusion/issues/15512#issuecomment-2776631138 Thank you all for the

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-03 Thread via GitHub
jayzhan211 commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2026105522 ## datafusion/physical-plan/src/filter.rs: ## @@ -433,6 +437,80 @@ impl ExecutionPlan for FilterExec { } try_embed_projection(projection, self

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-02 Thread via GitHub
2010YOUY01 commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2026090840 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -1226,12 +1242,23 @@ impl ExecutionPlan for SortExec { context.runtime_env(),

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-02 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2025325826 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -1067,35 +1067,53 @@ impl ExecutionPlan for SortExec { ) -> Result { trace!("Start SortExec::ex

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-02 Thread via GitHub
alamb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2773613340 > I am trying to write some tests for filter_pushdown now, btw, as a way to help make that a separate PR @adriangb : - here is what I have in mind: https://github.com/pydanti

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-02 Thread via GitHub
alamb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2773378357 I am trying to write some tests for filter_pushdown now, btw, as a way to help make that a separate PR -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-02 Thread via GitHub
alamb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2025374040 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -955,7 +939,7 @@ impl TableProvider for ListingTable { .with_output_ordering(output_orderi

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-02 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2025340594 ## datafusion/physical-optimizer/src/filter_pushdown.rs: ## @@ -0,0 +1,482 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribut

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-02 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2773314727 @alamb check out https://github.com/apache/datafusion/pull/15301/commits/ecc89f962fff6d510e57ee3844d07873727e2031#diff-05ace4c36d20453103f49749bad98864aea48680b0a4d5691d7ba5185d8ae4c

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-02 Thread via GitHub
alamb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2025316592 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -1067,35 +1067,53 @@ impl ExecutionPlan for SortExec { ) -> Result { trace!("Start SortExec::execu

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-02 Thread via GitHub
alamb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2773299492 Startingt o check this one out again -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-02 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2024168186 ## datafusion/datasource-parquet/src/source.rs: ## @@ -498,6 +489,7 @@ impl FileSource for ParquetSource { reorder_filters: self.reorder_filters(),

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-02 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2025034186 ## datafusion/core/tests/parquet/mod.rs: ## @@ -1072,3 +1080,364 @@ async fn make_test_file_page(scenario: Scenario, row_per_page: usize) -> NamedTe writer.

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-02 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2025001379 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -955,7 +939,7 @@ impl TableProvider for ListingTable { .with_output_ordering(output_ord

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-02 Thread via GitHub
2010YOUY01 commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2024728505 ## datafusion/physical-optimizer/src/filter_pushdown.rs: ## @@ -0,0 +1,190 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-02 Thread via GitHub
2010YOUY01 commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2024738389 ## datafusion/physical-plan/src/execution_plan.rs: ## @@ -467,8 +467,74 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync { ) -> Result>> {

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-02 Thread via GitHub
geoffreyclaude commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2772149768 @adriangb looks super promising, especially as it paves the way for general dynamic filtering! I tested out your branch to see the overlap with https://github.com/apach

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-02 Thread via GitHub
geoffreyclaude commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2024591484 ## datafusion/core/tests/parquet/mod.rs: ## @@ -1072,3 +1080,364 @@ async fn make_test_file_page(scenario: Scenario, row_per_page: usize) -> NamedTe w

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-02 Thread via GitHub
geoffreyclaude commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2024430826 ## datafusion/physical-plan/src/sorts/sort_filters.rs: ## @@ -0,0 +1,297 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contr

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-01 Thread via GitHub
alamb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2770508515 FYI I will likely try and review this PR again carefully first thing tomorrow morning -- This is an automated message from the Apache Git Service. To respond to the message, please l

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-01 Thread via GitHub
alamb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2023195132 ## datafusion/datasource-parquet/src/source.rs: ## @@ -349,11 +337,13 @@ impl ParquetSource { } /// Optional reference to this parquet scan's pruning pre

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-01 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2023197647 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -1224,6 +1245,28 @@ impl ExecutionPlan for SortExec { .with_preserve_partitioning(self.prese

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-01 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2023134929 ## datafusion/physical-plan/src/sorts/sort_filters.rs: ## @@ -0,0 +1,236 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-01 Thread via GitHub
ctsk commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2022551332 ## datafusion/physical-plan/src/sorts/sort_filters.rs: ## @@ -0,0 +1,236 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-01 Thread via GitHub
ctsk commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2022327718 ## datafusion/physical-plan/src/sorts/sort_filters.rs: ## @@ -0,0 +1,236 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-01 Thread via GitHub
suibianwanwank commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2022419875 ## datafusion/physical-plan/src/sorts/sort_filters.rs: ## @@ -0,0 +1,236 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contr

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-01 Thread via GitHub
ctsk commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2022274853 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -186,6 +235,90 @@ impl TopK { Ok(()) } +fn calculate_dynamic_filters( +thresholds: Vec,

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-31 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2022103675 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -186,6 +235,90 @@ impl TopK { Ok(()) } +fn calculate_dynamic_filters( +thresholds:

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-31 Thread via GitHub
suibianwanwank commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2767929587 >And follow the model of evaluation for [Literal](https://github.com/apache/datafusion/blob/main/datafusion/physical-expr/src/expressions/literal.rs) > >Except you would

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-31 Thread via GitHub
alamb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021746142 ## datafusion/core/src/datasource/physical_plan/parquet.rs: ## @@ -1769,13 +1775,13 @@ mod tests { let sql = "select * from base_table where name='test02'";

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-31 Thread via GitHub
alamb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021737787 ## datafusion/physical-plan/src/filter.rs: ## @@ -433,6 +433,22 @@ impl ExecutionPlan for FilterExec { } try_embed_projection(projection, self)

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-31 Thread via GitHub
alamb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021737787 ## datafusion/physical-plan/src/filter.rs: ## @@ -433,6 +433,22 @@ impl ExecutionPlan for FilterExec { } try_embed_projection(projection, self)

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-31 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021713556 ## datafusion/physical-plan/src/filter.rs: ## @@ -433,6 +433,22 @@ impl ExecutionPlan for FilterExec { } try_embed_projection(projection, self)

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-31 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021674359 ## datafusion/datasource-parquet/src/source.rs: ## @@ -349,11 +337,13 @@ impl ParquetSource { } /// Optional reference to this parquet scan's pruning

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-31 Thread via GitHub
alamb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021619914 ## datafusion/datasource-parquet/src/source.rs: ## @@ -349,11 +337,13 @@ impl ParquetSource { } /// Optional reference to this parquet scan's pruning pre

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-31 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020151449 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -1224,6 +1245,28 @@ impl ExecutionPlan for SortExec { .with_preserve_partitioning(self.prese

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-30 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020157010 ## datafusion/proto/src/physical_plan/to_proto.rs: ## @@ -210,6 +212,7 @@ pub fn serialize_physical_expr( value: &Arc, codec: &dyn PhysicalExtensionCode

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-30 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020156176 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -527,6 +529,7 @@ impl PruningPredicate { /// See the struct level documentation on [`PruningPredicate`

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-30 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020153780 ## datafusion/physical-plan/src/dynamic_filters.rs: ## @@ -0,0 +1,226 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-30 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020157010 ## datafusion/proto/src/physical_plan/to_proto.rs: ## @@ -210,6 +212,7 @@ pub fn serialize_physical_expr( value: &Arc, codec: &dyn PhysicalExtensionCode

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-30 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020153605 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -283,6 +284,47 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash { /

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-30 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020152758 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -186,6 +235,90 @@ impl TopK { Ok(()) } +fn calculate_dynamic_filters( +thresholds:

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-28 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2763043386 @alamb I've achieved 2/3 goals: - I added wrapping of a `DynamicFilterSource` in a `PhysicalExpr` such that it can dynamically update itself to prune rows using filter pushdown _e

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-28 Thread via GitHub
alamb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2762327907 I also wrote up some notes here: https://github.com/apache/datafusion/issues/15037#issuecomment-2762326990 -- This is an automated message from the Apache Git Service. To respond to

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-28 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2762301099 From discussion with Andrew here are a couple notes: - The most granular update frequency of the filters is when the TopK itself updates, so we should switch from polling to pushi

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-28 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2006767851 ## datafusion/core/src/datasource/physical_plan/parquet.rs: ## @@ -1655,4 +1656,46 @@ mod tests { assert_eq!(calls.len(), 2); assert_eq!(calls,

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-28 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2004247883 ## datafusion/common/src/config.rs: ## @@ -590,6 +590,9 @@ config_namespace! { /// during aggregations, if possible pub enable_topk_aggregation:

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-28 Thread via GitHub
alamb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2761509767 > I would still keep the methods on ExecutionPlan to do the pushdown instead of the optimizer rule unless I'm wrong about optimizer rules not being able to deal with LiquidCacheClientE

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-28 Thread via GitHub
alamb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2761489660 Thank you very much @adriangb -- given the new (warranted) complexity this feature is likely to add to DataFusion, and the fact if done right it can serve as the foundation for many a

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-28 Thread via GitHub
alamb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2761504336 > Things that serialize a PhysicalExpr across the wire, e.g. https://github.com/XiangpengHao/liquid-cache does it via [serialize_physical_expr](https://github.com/apache/datafusion/blo

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-28 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2760349060 I created https://github.com/pydantic/datafusion/pull/13/files to discuss that idea further. It's promising in some ways but also has some issues, I left TODOs and comments. -- T

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2760193801 > I suppose we could make a physical filter pushdown plan starting with the TopK To be clear my point is: if instead of `ParquetSource` I have [LiquidCacheClientExec](https:/

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2736411496 I think this is just part of the picture. To fully match DuckDB we'd have to do something like the rewrite proposed in https://github.com/apache/datafusion/issues/15177#issuecomment

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
suibianwanwank commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2760078487 > Another concern with a dynamic physicalexpr: more lock contention. Presumably every time it's evaluated (for each row?) we need to acquire a lock to read from the TopK heap.

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
alamb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2759867166 > Hmm interesting about the join conditions. I have not thought about how this will interact with a join in the same query. > > Another concern with a dynamic physicalexpr: more

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2759578009 Regarding waiting for filter pushdown by default: I feel like that shouldn't be a blocker given that this improves performance without it (albeit by not as much) and that many folks

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2759576175 Hmm interesting about the join conditions. I have not thought about how this will interact with a join in the same query. Another concern with a dynamic physicalexpr: more loc

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
alamb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2759408768 > What would the advantages of doing this with a PhysExpr be? The main one I can think of is avoiding introducing the new DynamicFilterSource trait. I guess I was thinkin

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
alamb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2759391428 Also I will redouble my efforts to try and focus to get filter pushdown in by default -- This is an automated message from the Apache Git Service. To respond to the message, please l

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2759387442 Just because I edited my comment around the time you replied, see point about serializability as well. What would the advantages of doing this with a PhysExpr be? The main one

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
alamb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2017541733 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +738,122 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic fitlers from TopK operators is us

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
alamb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2759361675 > My main concern is that regardless of what phase (optimizer or runtime) the pushdown happens if what we push down is a custom PhysicalExpr we're going to have trouble making that pla

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
alamb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2012937304 ## datafusion/core/src/datasource/physical_plan/parquet.rs: ## @@ -1847,6 +1848,28 @@ mod tests { writer.close().unwrap(); } +fn write_file_null(

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2017347609 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +738,122 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic fitlers from TopK operators is

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2017347609 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +738,122 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic fitlers from TopK operators is

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2758981967 Thank you all for the feedback. @alamb I think the main thing to figure out is if this should happen during an optimizer pass and what gets pushed down is an `Arc` or if we c

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2017257698 ## datafusion/common/src/config.rs: ## @@ -590,6 +590,9 @@ config_namespace! { /// during aggregations, if possible pub enable_topk_aggregation:

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
suibianwanwank commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2017284381 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +738,122 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic fitlers from TopK operat

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2016457111 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +738,122 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic fitlers from TopK operators is

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2016457111 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +738,122 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic fitlers from TopK operators is

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2016457111 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +738,122 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic fitlers from TopK operators is

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
AdamGS commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2016437453 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -163,26 +187,32 @@ impl TopK { // TODO make this algorithmically better?: // Idea: filter out r

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
AdamGS commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2016437453 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -163,26 +187,32 @@ impl TopK { // TODO make this algorithmically better?: // Idea: filter out r

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2016489779 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +737,159 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic fitlers from TopK operators is

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2016457111 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +738,122 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic fitlers from TopK operators is

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
YjyJeff commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015964312 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +737,159 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic fitlers from TopK operators is

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-26 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015100172 ## datafusion/datasource-parquet/src/source.rs: ## @@ -259,6 +261,8 @@ pub struct ParquetSource { pub(crate) metrics: ExecutionPlanMetricsSet, /// Optio

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-26 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015111827 ## datafusion/datasource-parquet/src/source.rs: ## @@ -587,4 +578,17 @@ impl FileSource for ParquetSource { } } } + +fn supports_dy

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-26 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015103237 ## datafusion/core/src/datasource/physical_plan/parquet.rs: ## @@ -1847,6 +1848,28 @@ mod tests { writer.close().unwrap(); } +fn write_file_nu

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-26 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015098643 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +738,122 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic fitlers from TopK operators is

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-26 Thread via GitHub
alamb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015083952 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +738,122 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic fitlers from TopK operators is us

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-26 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2754554854 > Thank you @adriangb, this looks very exciting. I'd also like to review this in detail, especially from the perspective of API applicability for join filter pushdowns(like starburs

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-25 Thread via GitHub
alamb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2752636475 Overall benchmark results show basically no change Details ``` Benchmark clickbench_partitioned.json ┏

  1   2   >