Re: [PR] [experiment] Run Comet tests in Docker [datafusion-comet]

2025-05-25 Thread via GitHub
andygrove commented on PR #1790: URL: https://github.com/apache/datafusion-comet/pull/1790#issuecomment-2908051424 This approach seems to work, and the Spark 4 tests ran a little faster than usual. ``` Run completed in 1 hour, 10 minutes, 53 seconds. Total number of tests run:

Re: [I] Spark-compatible CAST operation [datafusion]

2025-05-25 Thread via GitHub
logan-keede commented on issue #11201: URL: https://github.com/apache/datafusion/issues/11201#issuecomment-2908018288 do we just need to port [cast](https://github.com/apache/datafusion-comet/blob/main/native/spark-expr/src/conversion_funcs/cast.rs) here from comet? -- This is an automa

Re: [PR] Chore: Moved strings expressions to separate file [datafusion-comet]

2025-05-25 Thread via GitHub
codecov-commenter commented on PR #1792: URL: https://github.com/apache/datafusion-comet/pull/1792#issuecomment-2908024355 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1792?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Add test for ordering of predicate pushdown into parquet [datafusion]

2025-05-25 Thread via GitHub
kosiew commented on PR #16169: URL: https://github.com/apache/datafusion/pull/16169#issuecomment-2908327917 Hi @adriangb, thanks for the ping. I recorded some observations and thoughts in #16188.

[I] Simplify Filter Pushdown APIs for Better Maintainability and Developer Experience [datafusion]

2025-05-25 Thread via GitHub
kosiew opened a new issue, #16188: URL: https://github.com/apache/datafusion/issues/16188 ### Is your feature request related to a problem or challenge? The current filter pushdown APIs in DataFusion (FilterPushdownPropagation, PredicateSupports, etc.) have grown organically but now a

Re: [PR] Implement schema adapter support for FileSource and add integration tests [datafusion]

2025-05-25 Thread via GitHub
kosiew commented on code in PR #16148: URL: https://github.com/apache/datafusion/pull/16148#discussion_r2106428855 ## datafusion/core/tests/integration_tests/schema_adapter_integration_tests.rs: ## @@ -0,0 +1,197 @@ +// Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relation [datafusion]

2025-05-25 Thread via GitHub
duongcongtoai commented on code in PR #16186: URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106233488 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -1089,7 +1089,11 @@ impl OptimizerRule for PushDownFilter { let (volatile_filters, no

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relation [datafusion]

2025-05-25 Thread via GitHub
logan-keede commented on code in PR #16186: URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106240915 ## datafusion/sql/src/planner.rs: ## @@ -235,18 +235,27 @@ impl PlannerContext { } // Return a reference to the outer query's schema -pub fn ou

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relation [datafusion]

2025-05-25 Thread via GitHub
logan-keede commented on code in PR #16186: URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106240774 ## datafusion/sql/src/planner.rs: ## @@ -235,18 +235,27 @@ impl PlannerContext { } // Return a reference to the outer query's schema -pub fn ou

[PR] chore: Moved strings expressions to separate file [datafusion-comet]

2025-05-25 Thread via GitHub
kazantsev-maksim opened a new pull request, #1792: URL: https://github.com/apache/datafusion-comet/pull/1792 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/1330 Closes #. ## Rationale for this change See https://github.

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-25 Thread via GitHub
irenjj commented on code in PR #16186: URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106333691 ## datafusion/sqllogictest/test_files/subquery.slt: ## @@ -1482,3 +1482,85 @@ logical_plan statement count 0 drop table person; + +# correlated_recursive_scala

Re: [PR] Implement schema adapter support for FileSource and add integration tests [datafusion]

2025-05-25 Thread via GitHub
kosiew commented on code in PR #16148: URL: https://github.com/apache/datafusion/pull/16148#discussion_r2106445116 ## datafusion/core/tests/test_source_adapter_tests.rs: ## @@ -0,0 +1,233 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [PR] Implement schema adapter support for FileSource and add integration tests [datafusion]

2025-05-25 Thread via GitHub
kosiew commented on code in PR #16148: URL: https://github.com/apache/datafusion/pull/16148#discussion_r2106439691 ## datafusion/core/tests/test_adapter_updated.rs: ## @@ -0,0 +1,201 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-25 Thread via GitHub
duongcongtoai commented on code in PR #16186: URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106485055 ## datafusion/sqllogictest/test_files/subquery.slt: ## @@ -1482,3 +1482,85 @@ logical_plan statement count 0 drop table person; + +# correlated_recursiv

[PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]

2025-05-25 Thread via GitHub
hendrikmakait opened a new pull request, #1860: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1860 Part of #1758 This PR adds support for the `|> TABLESAMPLE ...` pipe operator. Open question: [BigQuery's `TABLESAMPLE` operator](https://cloud.google.com/big

Re: [I] Return an error instead of panic in `ByteGroupValueBuilder::do_append_val_inner` [datafusion]

2025-05-25 Thread via GitHub
liamzwbao commented on issue #15969: URL: https://github.com/apache/datafusion/issues/15969#issuecomment-2907974116 take

Re: [PR] chore: Upload hprof files on failure [datafusion-comet]

2025-05-25 Thread via GitHub
codecov-commenter commented on PR #1791: URL: https://github.com/apache/datafusion-comet/pull/1791#issuecomment-2907982759 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1791?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relation [datafusion]

2025-05-25 Thread via GitHub
duongcongtoai commented on code in PR #16186: URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106246132 ## datafusion/sql/src/planner.rs: ## @@ -235,18 +235,27 @@ impl PlannerContext { } // Return a reference to the outer query's schema -pub fn

Re: [PR] migrate `logical_plan` tests to insta [datafusion]

2025-05-25 Thread via GitHub
lifan-ake commented on code in PR #16184: URL: https://github.com/apache/datafusion/pull/16184#discussion_r2106399333 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -4059,120 +4060,118 @@ mod tests { .project(vec![col("id"), exists(plan1).alias("exists")])?

Re: [PR] migrate `logical_plan` tests to insta [datafusion]

2025-05-25 Thread via GitHub
lifan-ake commented on code in PR #16184: URL: https://github.com/apache/datafusion/pull/16184#discussion_r2106399160 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -2269,11 +2270,11 @@ mod tests { .project(vec![col("id")])? .build()?;

Re: [PR] migrate `logical_plan` tests to insta [datafusion]

2025-05-25 Thread via GitHub
lifan-ake commented on code in PR #16184: URL: https://github.com/apache/datafusion/pull/16184#discussion_r2106399499 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -2498,19 +2495,8 @@ mod tests { .project(vec![col("id"), col("first_name").alias("id")]);

Re: [PR] feat: metadata columns [datafusion]

2025-05-25 Thread via GitHub
Curricane commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2908303408 We look forward to supporting this feature as soon as possible, and perhaps enrich the optimization strategy by hiding columns

Re: [PR] refactor: do ambiguous_distinct_check in select [datafusion]

2025-05-25 Thread via GitHub
github-actions[bot] closed pull request #14180: refactor: do ambiguous_distinct_check in select URL: https://github.com/apache/datafusion/pull/14180

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-25 Thread via GitHub
irenjj commented on code in PR #16016: URL: https://github.com/apache/datafusion/pull/16016#discussion_r2106217418 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -287,6 +287,63 @@ pub enum LogicalPlan { Unnest(Unnest), /// A variadic query (e.g. "Recursive CTEs")

Re: [PR] doc: add diagram to describe how DataSource, FileSource, and DataSourceExec are related [datafusion]

2025-05-25 Thread via GitHub
onlyjackfrost commented on PR #16181: URL: https://github.com/apache/datafusion/pull/16181#issuecomment-2907881472 @alamb thanks for the review and comments! I adjusted the diagram, used `cargo doc --open`, and looked. https://github.com/user-attachments/assets/55d2a9f5-b90c-4e6c-9148-

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-25 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2907882330 > I wonder what the plan is for this PR? > > From what I understand, it currently improves performance for aggregates with large numbers of groups, but (slightly) slows down

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-25 Thread via GitHub
duongcongtoai commented on PR #16016: URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2907891074 https://github.com/apache/datafusion/pull/16186/files I added this dummy fix to support multi-level outer reference columns; it is enough for us to continue with this story.

Re: [I] Intermittent CI failures [datafusion-comet]

2025-05-25 Thread via GitHub
andygrove commented on issue #1786: URL: https://github.com/apache/datafusion-comet/issues/1786#issuecomment-2907932180 > Here is an example CI Failure: > > * https://github.com/apache/datafusion-comet/actions/runs/15201820724/job/42757252219 > > > [@andygrove](https:/

[PR] [experiment] Run Comet tests in Docker [datafusion-comet]

2025-05-25 Thread via GitHub
andygrove opened a new pull request, #1790: URL: https://github.com/apache/datafusion-comet/pull/1790 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/1786 ## Rationale for this change Exploring the idea of runni

Re: [I] Intermittent CI failures [datafusion-comet]

2025-05-25 Thread via GitHub
andygrove commented on issue #1786: URL: https://github.com/apache/datafusion-comet/issues/1786#issuecomment-2908074722 The failure happened at a different point in this run: https://github.com/apache/datafusion-comet/actions/runs/15240457481/job/42860102625?pr=1792 The failur

Re: [PR] Chore: Moved strings expressions to separate file [datafusion-comet]

2025-05-25 Thread via GitHub
andygrove commented on PR #1792: URL: https://github.com/apache/datafusion-comet/pull/1792#issuecomment-2908075868 CI failure is unrelated - https://github.com/apache/datafusion-comet/issues/1786

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-25 Thread via GitHub
duongcongtoai commented on code in PR #16186: URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106305831 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -1089,7 +1089,13 @@ impl OptimizerRule for PushDownFilter { let (volatile_filters, no

Re: [I] Default to collecting statistics when creating LIstingTables [datafusion]

2025-05-25 Thread via GitHub
davisp commented on issue #16158: URL: https://github.com/apache/datafusion/issues/16158#issuecomment-2908080182 Registering my official +1 to default to collecting statistics. For reference, I was working on the TPC-H benchmarks with a scale factor of 20 which generates roughly 20GiB

Re: [PR] fix: Remove trailing whitespace in `Display` for `LogicalPlan::Projection` [datafusion]

2025-05-25 Thread via GitHub
atahanyorganci commented on PR #16164: URL: https://github.com/apache/datafusion/pull/16164#issuecomment-2907862590 Thanks for the review, @alamb. CI passed; I think we can merge it at your discretion.

Re: [PR] [experiment] Run Comet tests in Docker [datafusion-comet]

2025-05-25 Thread via GitHub
andygrove closed pull request #1790: [experiment] Run Comet tests in Docker URL: https://github.com/apache/datafusion-comet/pull/1790

Re: [I] Intermittent CI failures [datafusion-comet]

2025-05-25 Thread via GitHub
andygrove commented on issue #1786: URL: https://github.com/apache/datafusion-comet/issues/1786#issuecomment-2907943915 @alamb more specifically, the failing build that you linked to: ``` CometShuffle4_0Suite: - Fallback to Spark when shuffling on struct with duplicate field nam

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-25 Thread via GitHub
irenjj commented on PR #16016: URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2908203622 We should also fix the compile error in `physical_planner.rs(map_logical_node_to_physical)`; it needs to handle `LogicalPlan::DependentJoin`.

[PR] feat: Support parsing subqueries with `OuterRefColumn` belongs to non-adjacent outer relation [datafusion]

2025-05-25 Thread via GitHub
duongcongtoai opened a new pull request, #16186: URL: https://github.com/apache/datafusion/pull/16186 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes t

[PR] chore: Upload hprof files on failure [datafusion-comet]

2025-05-25 Thread via GitHub
andygrove opened a new pull request, #1791: URL: https://github.com/apache/datafusion-comet/pull/1791 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

[I] Map functions crash on out of bounds cases [datafusion]

2025-05-25 Thread via GitHub
comphead opened a new issue, #16187: URL: https://github.com/apache/datafusion/issues/16187 ### Describe the bug The query below crashes ``` > select map_values(map([named_struct('a', 1, 'b', null)], [named_struct('a', 1, 'b', null)]))[0] as a; thread 'main' panicked at

Re: [PR] Implement schema adapter support for FileSource and add integration tests [datafusion]

2025-05-25 Thread via GitHub
kosiew commented on code in PR #16148: URL: https://github.com/apache/datafusion/pull/16148#discussion_r2106461041 ## datafusion/datasource-parquet/tests/apply_schema_adapter_tests.rs: ## @@ -0,0 +1,224 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mor

Re: [I] Support correlated scalar subquery without aggregation [datafusion]

2025-05-25 Thread via GitHub
logan-keede commented on issue #16137: URL: https://github.com/apache/datafusion/issues/16137#issuecomment-2907673352 https://github.com/apache/datafusion/blob/34f250a2b4800845b5c4e61bd928ddbbc4af7ba0/datafusion/expr/src/logical_plan/invariants.rs#L174-L201 DataFusion tries to pre

Re: [PR] fix: Remove trailing whitespace in `Display` for `LogicalPlan::Projection` [datafusion]

2025-05-25 Thread via GitHub
berkaysynnada merged PR #16164: URL: https://github.com/apache/datafusion/pull/16164

Re: [PR] Implement schema adapter support for FileSource and add integration tests [datafusion]

2025-05-25 Thread via GitHub
kosiew commented on code in PR #16148: URL: https://github.com/apache/datafusion/pull/16148#discussion_r2106508912 ## datafusion/datasource/src/test_util.rs: ## @@ -81,6 +83,8 @@ impl FileSource for MockSource { fn file_type(&self) -> &str { "mock" } + +im

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-25 Thread via GitHub
duongcongtoai commented on code in PR #16016: URL: https://github.com/apache/datafusion/pull/16016#discussion_r2106527079 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -287,6 +287,63 @@ pub enum LogicalPlan { Unnest(Unnest), /// A variadic query (e.g. "Recursive C

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-25 Thread via GitHub
duongcongtoai commented on code in PR #16016: URL: https://github.com/apache/datafusion/pull/16016#discussion_r2106138463 ## datafusion/optimizer/src/decorrelate_general.rs: ## @@ -0,0 +1,856 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribut

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-25 Thread via GitHub
irenjj commented on PR #16016: URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2907709927 Thanks @duongcongtoai , I'll review the other code later, but regarding the depth issue, I don't think it's likely to be handled in the optimizer. I'll organize some questions and We

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-25 Thread via GitHub
irenjj commented on code in PR #16016: URL: https://github.com/apache/datafusion/pull/16016#discussion_r2106147135 ## datafusion/optimizer/src/decorrelate_general.rs: ## @@ -0,0 +1,856 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lice

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-25 Thread via GitHub
irenjj commented on PR #16016: URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2907729186 I think a possible implementation approach is to construct the PlannerContext at the outer layer, and then carry the PlannerContext information into the optimizer/physical_planner.

Re: [PR] migrate `logical_plan` tests to insta [datafusion]

2025-05-25 Thread via GitHub
blaginin commented on code in PR #16184: URL: https://github.com/apache/datafusion/pull/16184#discussion_r2106160686 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -2759,10 +2749,24 @@ mod tests { let join = LogicalPlanBuilder::from(left).cross_join(right)?.bui

Re: [PR] feat: add `register_metadata` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-05-25 Thread via GitHub
alamb commented on code in PR #15022: URL: https://github.com/apache/datafusion/pull/15022#discussion_r2106160675 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs: ## @@ -299,6 +424,17 @@ impl GroupsAccumulatorAdapter { } impl GroupsAccumulator fo

Re: [PR] migrate `logical_plan` tests to insta [datafusion]

2025-05-25 Thread via GitHub
blaginin commented on PR #16184: URL: https://github.com/apache/datafusion/pull/16184#issuecomment-2907753148 Also, can you please check the CI? Some tests are failing.

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-25 Thread via GitHub
logan-keede commented on PR #16016: URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2907757622 > regarding the depth issue, I don't think it's likely to be handled in the optimizer. I agree with @irenjj , it seems like correlated subqueries with depth>1 does not rea

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-25 Thread via GitHub
irenjj commented on PR #16016: URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2907724823 The difference between DataFusion and DuckDB in constructing logical plans is: DataFusion directly assigns schema to `LogicalPlan`, while DuckDB saves metadata information in the `Bin

Re: [I] General framework to decorrelate the subqueries [datafusion]

2025-05-25 Thread via GitHub
duongcongtoai commented on issue #5492: URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2907708177 This [PR](https://github.com/apache/datafusion/pull/16016) is ready for review; let me know your opinions. I think after this it will unblock us to start implementing so

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-25 Thread via GitHub
irenjj commented on code in PR #16016: URL: https://github.com/apache/datafusion/pull/16016#discussion_r2106132282 ## datafusion/optimizer/src/decorrelate_general.rs: ## @@ -0,0 +1,856 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lice

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-25 Thread via GitHub
duongcongtoai commented on code in PR #16016: URL: https://github.com/apache/datafusion/pull/16016#discussion_r2106139236 ## datafusion/optimizer/src/decorrelate_general.rs: ## @@ -0,0 +1,856 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribut

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-05-25 Thread via GitHub
alamb commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2907743279 I am not sure if it is related, but we have also seen some intermittent failures in DataFusion CI - https://github.com/apache/datafusion/issues/16180

Re: [I] Intermittent CI failures [datafusion-comet]

2025-05-25 Thread via GitHub
alamb commented on issue #1786: URL: https://github.com/apache/datafusion-comet/issues/1786#issuecomment-2907743068 Here is an example CI Failure: - https://github.com/apache/datafusion-comet/actions/runs/15201820724/job/42757252219 @andygrove can you help me understand what the

[PR] fix: equivalence for union [datafusion]

2025-05-25 Thread via GitHub
chenkovsky opened a new pull request, #16185: URL: https://github.com/apache/datafusion/pull/16185 ## Which issue does this PR close? - Closes #16171. ## Rationale for this change equivalence is not set. ## What changes are included in this PR? compute i

Re: [PR] feat: array_length for fixed size list [datafusion]

2025-05-25 Thread via GitHub
alamb commented on code in PR #16167: URL: https://github.com/apache/datafusion/pull/16167#discussion_r2106168938 ## datafusion/functions-nested/src/length.rs: ## @@ -128,26 +148,20 @@ pub fn array_length_inner(args: &[ArrayRef]) -> Result { match &args[0].data_type() {

Re: [PR] feat: array_length for fixed size list [datafusion]

2025-05-25 Thread via GitHub
alamb commented on PR #16167: URL: https://github.com/apache/datafusion/pull/16167#issuecomment-2907765509 Thanks again @chenkovsky

Re: [I] Arraylength UDF not fully implemented or inconcistent [datafusion]

2025-05-25 Thread via GitHub
alamb closed issue #16163: Arraylength UDF not fully implemented or inconcistent URL: https://github.com/apache/datafusion/issues/16163

Re: [PR] chore: Reduce repetition in the parameter type inference tests [datafusion]

2025-05-25 Thread via GitHub
alamb merged PR #16079: URL: https://github.com/apache/datafusion/pull/16079

Re: [I] Reduce repetition in the parameter type inference tests [datafusion]

2025-05-25 Thread via GitHub
alamb closed issue #16057: Reduce repetition in the parameter type inference tests URL: https://github.com/apache/datafusion/issues/16057

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-25 Thread via GitHub
alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2907769450 I wonder what the plan is for this PR? From what I understand, it currently improves performance for aggregates with large numbers of groups, but (slightly) slows down aggregates

Re: [PR] fix: equivalence for union [datafusion]

2025-05-25 Thread via GitHub
alamb commented on code in PR #16185: URL: https://github.com/apache/datafusion/pull/16185#discussion_r2106171762 ## datafusion/physical-expr/src/equivalence/class.rs: ## @@ -422,6 +423,32 @@ impl EquivalenceGroup { self.bridge_classes() } +#[allow(clippy::ty

Re: [PR] Set aggregation hash seed [datafusion]

2025-05-25 Thread via GitHub
alamb commented on PR #16165: URL: https://github.com/apache/datafusion/pull/16165#issuecomment-2907770923 I am surprised this shows any performance difference. I will rerun and see if I can reproduce

Re: [PR] Set aggregation hash seed [datafusion]

2025-05-25 Thread via GitHub
alamb commented on PR #16165: URL: https://github.com/apache/datafusion/pull/16165#issuecomment-2907771160 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-25 Thread via GitHub
duongcongtoai commented on code in PR #16016: URL: https://github.com/apache/datafusion/pull/16016#discussion_r2106175049 ## datafusion/optimizer/src/decorrelate_general.rs: ## @@ -0,0 +1,856 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribut

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-25 Thread via GitHub
duongcongtoai commented on PR #16016: URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2907779111 > It's thrown in planning phase, the error is thrown because in planning phase, planner can only get the schema info from upper query block. Ahah, confirmed, given this q

Re: [PR] Set aggregation hash seed [datafusion]

2025-05-25 Thread via GitHub
alamb commented on PR #16165: URL: https://github.com/apache/datafusion/pull/16165#issuecomment-290856 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Set aggregation hash seed [datafusion]

2025-05-25 Thread via GitHub
alamb commented on PR #16165: URL: https://github.com/apache/datafusion/pull/16165#issuecomment-2907785532 🤖: Benchmark completed Details ``` Comparing HEAD and fix_aggregation-seed Benchmark clickbench_extended.json

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-25 Thread via GitHub
logan-keede commented on PR #16016: URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2907793456 > The optimizer is actually invoked, but with the plan of `EmptyRelation` for some reason, we better do something in the planning! How did you confirm that? I tried by p

Re: [PR] doc: add diagram to describe how DataSource, FileSource, and DataSourceExec are related [datafusion]

2025-05-25 Thread via GitHub
alamb commented on PR #16181: URL: https://github.com/apache/datafusion/pull/16181#issuecomment-2907762892 Thank you @onlyjackfrost 🙏 I have a few comments on this diagram here: 1. I think the names in the diagram should match as much as possible the names in the code "file so

Re: [PR] feat: add `register_metadata` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-05-25 Thread via GitHub
rluvaton commented on code in PR #15022: URL: https://github.com/apache/datafusion/pull/15022#discussion_r2106174038 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs: ## @@ -299,6 +424,17 @@ impl GroupsAccumulatorAdapter { } impl GroupsAccumulator

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-25 Thread via GitHub
irenjj commented on PR #16016: URL: https://github.com/apache/datafusion/pull/16016#issuecomment-290338 > Actually this error is thrown after all optimizers are executed, the error is thrown because no existing optimizers are capable of handling nested subqueries. It's thrown in pl

Re: [PR] Set aggregation hash seed [datafusion]

2025-05-25 Thread via GitHub
alamb commented on PR #16165: URL: https://github.com/apache/datafusion/pull/16165#issuecomment-290827 🤖: Benchmark completed Details ``` Comparing HEAD and fix_aggregation-seed Benchmark clickbench_extended.json

Re: [I] Make it easier to run TPCH queries with datafusion-cli [datafusion]

2025-05-25 Thread via GitHub
alamb commented on issue #14608: URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2907767246 Thanks @clflushopt, I'll try to check this out tomorrow or Tuesday.

Re: [PR] feat: array_length for fixed size list [datafusion]

2025-05-25 Thread via GitHub
alamb merged PR #16167: URL: https://github.com/apache/datafusion/pull/16167

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-25 Thread via GitHub
duongcongtoai commented on PR #16016: URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2907772662 > it seems like correlated subqueries with depth>1 does not reach optimizer as they report Schema Error: No field named xyz.col Actually this error is thrown after all o

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-25 Thread via GitHub
duongcongtoai commented on PR #16016: URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2907798826 > How did you confirm that? I tried by putting a debug statement here- my bad, the EmptyRelation is actually invoked for the queries to create table :disappointed:

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-25 Thread via GitHub
duongcongtoai commented on PR #16016: URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2907805553 Regarding providing the outer schema to deeply nested subqueries, can we do something like this: ``` pub(super) fn parse_scalar_subquery( &self,