Dandandan commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2041413480
##
datafusion/physical-plan/src/sorts/sort_filters.rs:
##
@@ -0,0 +1,297 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributo
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015101716
##
datafusion/datasource-parquet/src/opener.rs:
##
@@ -295,3 +350,104 @@ fn create_initial_plan(
// default to scanning all row groups
Ok(ParquetAccessP
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2022020818
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -1197,35 +1197,55 @@ impl ExecutionPlan for SortExec {
) -> Result {
trace!("Start SortExec::ex
adriangb closed pull request #15301: Add dynamic pruning filters from TopK state
URL: https://github.com/apache/datafusion/pull/15301
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comm
alamb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2767114440
> filter pushdown, dynamic filter = 0.738s
🤯
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL ab
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020157010
##
datafusion/proto/src/physical_plan/to_proto.rs:
##
@@ -210,6 +212,7 @@ pub fn serialize_physical_expr(
value: &Arc,
codec: &dyn PhysicalExtensionCode
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2022023507
##
datafusion/physical-plan/src/filter.rs:
##
@@ -433,6 +433,22 @@ impl ExecutionPlan for FilterExec {
}
try_embed_projection(projection, self)
geoffreyclaude commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2024431842
##
datafusion/physical-plan/src/sorts/sort_filters.rs:
##
@@ -0,0 +1,297 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contr
suibianwanwank commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2006221885
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -644,10 +737,72 @@ impl RecordBatchStore {
}
}
+struct TopKDynamicFilterSource {
+/// The Top
alamb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2767115072
I am starting to check this PR out again in detail
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL abov
alamb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021602649
##
datafusion/common/src/config.rs:
##
@@ -590,6 +590,13 @@ config_namespace! {
/// during aggregations, if possible
pub enable_topk_aggregation: b
alamb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021697091
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -1197,35 +1197,55 @@ impl ExecutionPlan for SortExec {
) -> Result {
trace!("Start SortExec::execu
adriangb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2771480246
> FYI I will likely try and review this PR again carefully first thing
tomorrow morning
I got everything mostly nice... but it seems there are still a couple minor
bugs. Will
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021675691
##
datafusion/physical-plan/src/filter.rs:
##
@@ -433,6 +433,22 @@ impl ExecutionPlan for FilterExec {
}
try_embed_projection(projection, self)
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021679351
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -1197,35 +1197,55 @@ impl ExecutionPlan for SortExec {
) -> Result {
trace!("Start SortExec::ex
suibianwanwank commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2006773415
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -644,10 +737,72 @@ impl RecordBatchStore {
}
}
+struct TopKDynamicFilterSource {
+/// The Top
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2025001379
##
datafusion/core/src/datasource/listing/table.rs:
##
@@ -955,7 +939,7 @@ impl TableProvider for ListingTable {
.with_output_ordering(output_ord
adriangb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2776668146
I'm closing this now massive PR in favor of splitting it up into units of
work https://github.com/apache/datafusion/issues/15512#issuecomment-2776631138
Thank you all for the
jayzhan211 commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2026105522
##
datafusion/physical-plan/src/filter.rs:
##
@@ -433,6 +437,80 @@ impl ExecutionPlan for FilterExec {
}
try_embed_projection(projection, self
2010YOUY01 commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2026090840
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -1226,12 +1242,23 @@ impl ExecutionPlan for SortExec {
context.runtime_env(),
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2025325826
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -1067,35 +1067,53 @@ impl ExecutionPlan for SortExec {
) -> Result {
trace!("Start SortExec::ex
alamb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2773613340
> I am trying to write some tests for filter_pushdown now, btw, as a way to
help make that a separate PR
@adriangb :
- here is what I have in mind: https://github.com/pydanti
alamb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2773378357
I am trying to write some tests for filter_pushdown now, btw, as a way to
help make that a separate PR
--
This is an automated message from the Apache Git Service.
To respond to the
alamb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2025374040
##
datafusion/core/src/datasource/listing/table.rs:
##
@@ -955,7 +939,7 @@ impl TableProvider for ListingTable {
.with_output_ordering(output_orderi
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2025340594
##
datafusion/physical-optimizer/src/filter_pushdown.rs:
##
@@ -0,0 +1,482 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contribut
adriangb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2773314727
@alamb check out
https://github.com/apache/datafusion/pull/15301/commits/ecc89f962fff6d510e57ee3844d07873727e2031#diff-05ace4c36d20453103f49749bad98864aea48680b0a4d5691d7ba5185d8ae4c
alamb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2025316592
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -1067,35 +1067,53 @@ impl ExecutionPlan for SortExec {
) -> Result {
trace!("Start SortExec::execu
alamb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2773299492
Startingt o check this one out again
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2024168186
##
datafusion/datasource-parquet/src/source.rs:
##
@@ -498,6 +489,7 @@ impl FileSource for ParquetSource {
reorder_filters: self.reorder_filters(),
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2025034186
##
datafusion/core/tests/parquet/mod.rs:
##
@@ -1072,3 +1080,364 @@ async fn make_test_file_page(scenario: Scenario,
row_per_page: usize) -> NamedTe
writer.
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2025001379
##
datafusion/core/src/datasource/listing/table.rs:
##
@@ -955,7 +939,7 @@ impl TableProvider for ListingTable {
.with_output_ordering(output_ord
2010YOUY01 commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2024728505
##
datafusion/physical-optimizer/src/filter_pushdown.rs:
##
@@ -0,0 +1,190 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contrib
2010YOUY01 commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2024738389
##
datafusion/physical-plan/src/execution_plan.rs:
##
@@ -467,8 +467,74 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync {
) -> Result>> {
geoffreyclaude commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2772149768
@adriangb looks super promising, especially as it paves the way for general
dynamic filtering!
I tested out your branch to see the overlap with
https://github.com/apach
geoffreyclaude commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2024591484
##
datafusion/core/tests/parquet/mod.rs:
##
@@ -1072,3 +1080,364 @@ async fn make_test_file_page(scenario: Scenario,
row_per_page: usize) -> NamedTe
w
geoffreyclaude commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2024430826
##
datafusion/physical-plan/src/sorts/sort_filters.rs:
##
@@ -0,0 +1,297 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contr
alamb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2770508515
FYI I will likely try and review this PR again carefully first thing
tomorrow morning
--
This is an automated message from the Apache Git Service.
To respond to the message, please l
alamb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2023195132
##
datafusion/datasource-parquet/src/source.rs:
##
@@ -349,11 +337,13 @@ impl ParquetSource {
}
/// Optional reference to this parquet scan's pruning pre
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2023197647
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -1224,6 +1245,28 @@ impl ExecutionPlan for SortExec {
.with_preserve_partitioning(self.prese
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2023134929
##
datafusion/physical-plan/src/sorts/sort_filters.rs:
##
@@ -0,0 +1,236 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor
ctsk commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2022551332
##
datafusion/physical-plan/src/sorts/sort_filters.rs:
##
@@ -0,0 +1,236 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor lic
ctsk commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2022327718
##
datafusion/physical-plan/src/sorts/sort_filters.rs:
##
@@ -0,0 +1,236 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor lic
suibianwanwank commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2022419875
##
datafusion/physical-plan/src/sorts/sort_filters.rs:
##
@@ -0,0 +1,236 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contr
ctsk commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2022274853
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -186,6 +235,90 @@ impl TopK {
Ok(())
}
+fn calculate_dynamic_filters(
+thresholds: Vec,
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2022103675
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -186,6 +235,90 @@ impl TopK {
Ok(())
}
+fn calculate_dynamic_filters(
+thresholds:
suibianwanwank commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2767929587
>And follow the model of evaluation for
[Literal](https://github.com/apache/datafusion/blob/main/datafusion/physical-expr/src/expressions/literal.rs)
>
>Except you would
alamb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021746142
##
datafusion/core/src/datasource/physical_plan/parquet.rs:
##
@@ -1769,13 +1775,13 @@ mod tests {
let sql = "select * from base_table where name='test02'";
alamb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021737787
##
datafusion/physical-plan/src/filter.rs:
##
@@ -433,6 +433,22 @@ impl ExecutionPlan for FilterExec {
}
try_embed_projection(projection, self)
alamb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021737787
##
datafusion/physical-plan/src/filter.rs:
##
@@ -433,6 +433,22 @@ impl ExecutionPlan for FilterExec {
}
try_embed_projection(projection, self)
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021713556
##
datafusion/physical-plan/src/filter.rs:
##
@@ -433,6 +433,22 @@ impl ExecutionPlan for FilterExec {
}
try_embed_projection(projection, self)
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021674359
##
datafusion/datasource-parquet/src/source.rs:
##
@@ -349,11 +337,13 @@ impl ParquetSource {
}
/// Optional reference to this parquet scan's pruning
alamb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021619914
##
datafusion/datasource-parquet/src/source.rs:
##
@@ -349,11 +337,13 @@ impl ParquetSource {
}
/// Optional reference to this parquet scan's pruning pre
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020151449
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -1224,6 +1245,28 @@ impl ExecutionPlan for SortExec {
.with_preserve_partitioning(self.prese
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020157010
##
datafusion/proto/src/physical_plan/to_proto.rs:
##
@@ -210,6 +212,7 @@ pub fn serialize_physical_expr(
value: &Arc,
codec: &dyn PhysicalExtensionCode
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020156176
##
datafusion/physical-optimizer/src/pruning.rs:
##
@@ -527,6 +529,7 @@ impl PruningPredicate {
/// See the struct level documentation on [`PruningPredicate`
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020153780
##
datafusion/physical-plan/src/dynamic_filters.rs:
##
@@ -0,0 +1,226 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor li
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020157010
##
datafusion/proto/src/physical_plan/to_proto.rs:
##
@@ -210,6 +212,7 @@ pub fn serialize_physical_expr(
value: &Arc,
codec: &dyn PhysicalExtensionCode
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020153605
##
datafusion/physical-expr-common/src/physical_expr.rs:
##
@@ -283,6 +284,47 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug +
DynEq + DynHash {
/
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020152758
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -186,6 +235,90 @@ impl TopK {
Ok(())
}
+fn calculate_dynamic_filters(
+thresholds:
adriangb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2763043386
@alamb I've achieved 2/3 goals:
- I added wrapping of a `DynamicFilterSource` in a `PhysicalExpr` such that
it can dynamically update itself to prune rows using filter pushdown _e
alamb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2762327907
I also wrote up some notes here:
https://github.com/apache/datafusion/issues/15037#issuecomment-2762326990
--
This is an automated message from the Apache Git Service.
To respond to
adriangb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2762301099
From discussion with Andrew here are a couple notes:
- The most granular update frequency of the filters is when the TopK itself
updates, so we should switch from polling to pushi
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2006767851
##
datafusion/core/src/datasource/physical_plan/parquet.rs:
##
@@ -1655,4 +1656,46 @@ mod tests {
assert_eq!(calls.len(), 2);
assert_eq!(calls,
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2004247883
##
datafusion/common/src/config.rs:
##
@@ -590,6 +590,9 @@ config_namespace! {
/// during aggregations, if possible
pub enable_topk_aggregation:
alamb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2761509767
> I would still keep the methods on ExecutionPlan to do the pushdown instead
of the optimizer rule unless I'm wrong about optimizer rules not being able to
deal with LiquidCacheClientE
alamb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2761489660
Thank you very much @adriangb -- given the new (warranted) complexity this
feature is likely to add to DataFusion, and the fact if done right it can serve
as the foundation for many a
alamb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2761504336
> Things that serialize a PhysicalExpr across the wire, e.g.
https://github.com/XiangpengHao/liquid-cache does it via
[serialize_physical_expr](https://github.com/apache/datafusion/blo
adriangb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2760349060
I created https://github.com/pydantic/datafusion/pull/13/files to discuss
that idea further. It's promising in some ways but also has some issues, I left
TODOs and comments.
--
T
adriangb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2760193801
> I suppose we could make a physical filter pushdown plan starting with the
TopK
To be clear my point is: if instead of `ParquetSource` I have
[LiquidCacheClientExec](https:/
adriangb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2736411496
I think this is just part of the picture. To fully match DuckDB we'd have to
do something like the rewrite proposed in
https://github.com/apache/datafusion/issues/15177#issuecomment
suibianwanwank commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2760078487
> Another concern with a dynamic physicalexpr: more lock contention.
Presumably every time it's evaluated (for each row?) we need to acquire a lock
to read from the TopK heap.
alamb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2759867166
> Hmm interesting about the join conditions. I have not thought about how
this will interact with a join in the same query.
>
> Another concern with a dynamic physicalexpr: more
adriangb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2759578009
Regarding waiting for filter pushdown by default: I feel like that shouldn't
be a blocker given that this improves performance without it (albeit by not as
much) and that many folks
adriangb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2759576175
Hmm interesting about the join conditions. I have not thought about how this
will interact with a join in the same query.
Another concern with a dynamic physicalexpr: more loc
alamb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2759408768
> What would the advantages of doing this with a PhysExpr be? The main one I
can think of is avoiding introducing the new DynamicFilterSource trait.
I guess I was thinkin
alamb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2759391428
Also I will redouble my efforts to try and focus to get filter pushdown in
by default
--
This is an automated message from the Apache Git Service.
To respond to the message, please l
adriangb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2759387442
Just because I edited my comment around the time you replied, see point
about serializability as well.
What would the advantages of doing this with a PhysExpr be? The main one
alamb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2017541733
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -644,10 +738,122 @@ impl RecordBatchStore {
}
}
+/// Pushdown of dynamic fitlers from TopK operators is us
alamb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2759361675
> My main concern is that regardless of what phase (optimizer or runtime)
the pushdown happens if what we push down is a custom PhysicalExpr we're going
to have trouble making that pla
alamb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2012937304
##
datafusion/core/src/datasource/physical_plan/parquet.rs:
##
@@ -1847,6 +1848,28 @@ mod tests {
writer.close().unwrap();
}
+fn write_file_null(
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2017347609
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -644,10 +738,122 @@ impl RecordBatchStore {
}
}
+/// Pushdown of dynamic fitlers from TopK operators is
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2017347609
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -644,10 +738,122 @@ impl RecordBatchStore {
}
}
+/// Pushdown of dynamic fitlers from TopK operators is
adriangb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2758981967
Thank you all for the feedback.
@alamb I think the main thing to figure out is if this should happen during
an optimizer pass and what gets pushed down is an `Arc` or if
we c
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2017257698
##
datafusion/common/src/config.rs:
##
@@ -590,6 +590,9 @@ config_namespace! {
/// during aggregations, if possible
pub enable_topk_aggregation:
suibianwanwank commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2017284381
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -644,10 +738,122 @@ impl RecordBatchStore {
}
}
+/// Pushdown of dynamic fitlers from TopK operat
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2016457111
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -644,10 +738,122 @@ impl RecordBatchStore {
}
}
+/// Pushdown of dynamic fitlers from TopK operators is
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2016457111
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -644,10 +738,122 @@ impl RecordBatchStore {
}
}
+/// Pushdown of dynamic fitlers from TopK operators is
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2016457111
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -644,10 +738,122 @@ impl RecordBatchStore {
}
}
+/// Pushdown of dynamic fitlers from TopK operators is
AdamGS commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2016437453
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -163,26 +187,32 @@ impl TopK {
// TODO make this algorithmically better?:
// Idea: filter out r
AdamGS commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2016437453
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -163,26 +187,32 @@ impl TopK {
// TODO make this algorithmically better?:
// Idea: filter out r
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2016489779
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -644,10 +737,159 @@ impl RecordBatchStore {
}
}
+/// Pushdown of dynamic fitlers from TopK operators is
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2016457111
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -644,10 +738,122 @@ impl RecordBatchStore {
}
}
+/// Pushdown of dynamic fitlers from TopK operators is
YjyJeff commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015964312
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -644,10 +737,159 @@ impl RecordBatchStore {
}
}
+/// Pushdown of dynamic fitlers from TopK operators is
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015100172
##
datafusion/datasource-parquet/src/source.rs:
##
@@ -259,6 +261,8 @@ pub struct ParquetSource {
pub(crate) metrics: ExecutionPlanMetricsSet,
/// Optio
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015111827
##
datafusion/datasource-parquet/src/source.rs:
##
@@ -587,4 +578,17 @@ impl FileSource for ParquetSource {
}
}
}
+
+fn supports_dy
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015103237
##
datafusion/core/src/datasource/physical_plan/parquet.rs:
##
@@ -1847,6 +1848,28 @@ mod tests {
writer.close().unwrap();
}
+fn write_file_nu
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015098643
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -644,10 +738,122 @@ impl RecordBatchStore {
}
}
+/// Pushdown of dynamic fitlers from TopK operators is
alamb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015083952
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -644,10 +738,122 @@ impl RecordBatchStore {
}
}
+/// Pushdown of dynamic fitlers from TopK operators is us
adriangb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2754554854
> Thank you @adriangb, this looks very exciting. I'd also like to review
this in detail, especially from the perspective of API applicability for join
filter pushdowns(like starburs
alamb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2752636475
Overall benchmark results show basically no change
Details
```
Benchmark clickbench_partitioned.json
┏
1 - 100 of 124 matches
Mail list logo