Re: [PR] Minor: Fix links in substrait readme [datafusion]

2025-05-24 Thread via GitHub
alamb merged PR #16156: URL: https://github.com/apache/datafusion/pull/16156 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Minor: Fix name() for FilterPushdown physical optimizer rule [datafusion]

2025-05-24 Thread via GitHub
alamb merged PR #16175: URL: https://github.com/apache/datafusion/pull/16175

Re: [PR] Reuse last projection layer when renaming columns [datafusion]

2025-05-24 Thread via GitHub
blaginin commented on PR #14684: URL: https://github.com/apache/datafusion/pull/14684#issuecomment-2906801096 > @blaginin I see that https://github.com/apache/datafusion/pull/14781 is still draft, this PR still ok otherwise? yes, thanks for the reminder - there was some work happening

Re: [PR] feat: optimize and unparse grouping [datafusion]

2025-05-24 Thread via GitHub
chenkovsky commented on PR #16161: URL: https://github.com/apache/datafusion/pull/16161#issuecomment-2906801006 @eejbyfeldt could you please help me review this PR?

Re: [PR] Transform scalar correlated subqueries in Where to DependentJoin [datafusion]

2025-05-24 Thread via GitHub
duongcongtoai commented on code in PR #16174: URL: https://github.com/apache/datafusion/pull/16174#discussion_r2105798200 ## datafusion/optimizer/src/create_dependent_join.rs: ## @@ -0,0 +1,163 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [PR] Transform scalar correlated subqueries in Where to DependentJoin [datafusion]

2025-05-24 Thread via GitHub
logan-keede commented on code in PR #16174: URL: https://github.com/apache/datafusion/pull/16174#discussion_r2105831888 ## datafusion/optimizer/src/eliminate_cross_join.rs: ## @@ -351,6 +353,8 @@ fn find_inner_join( join_type: JoinType::Inner, join_constraint:

Re: [PR] build(deps): bump tokio from 1.44.2 to 1.45.0 [datafusion-python]

2025-05-24 Thread via GitHub
dependabot[bot] commented on PR #1125: URL: https://github.com/apache/datafusion-python/pull/1125#issuecomment-2906979041 Superseded by #1134.

[PR] build(deps): bump uuid from 1.16.0 to 1.17.0 [datafusion-python]

2025-05-24 Thread via GitHub
dependabot[bot] opened a new pull request, #1135: URL: https://github.com/apache/datafusion-python/pull/1135 Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.16.0 to 1.17.0. Release notes Sourced from [uuid's releases](https://github.com/uuid-rs/uuid/releases). v1.17.0

[PR] build(deps): bump tokio from 1.44.2 to 1.45.1 [datafusion-python]

2025-05-24 Thread via GitHub
dependabot[bot] opened a new pull request, #1134: URL: https://github.com/apache/datafusion-python/pull/1134 Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.44.2 to 1.45.1. Release notes Sourced from [tokio's releases](https://github.com/tokio-rs/tokio/releases). Tokio

Re: [PR] fix: support `map_values`, `map_keys` [datafusion-comet]

2025-05-24 Thread via GitHub
codecov-commenter commented on PR #1788: URL: https://github.com/apache/datafusion-comet/pull/1788#issuecomment-2906975530 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1788?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] build(deps): bump tokio from 1.44.2 to 1.45.0 [datafusion-python]

2025-05-24 Thread via GitHub
dependabot[bot] closed pull request #1125: build(deps): bump tokio from 1.44.2 to 1.45.0 URL: https://github.com/apache/datafusion-python/pull/1125

[I] Support `map_values` [datafusion-comet]

2025-05-24 Thread via GitHub
comphead opened a new issue, #1789: URL: https://github.com/apache/datafusion-comet/issues/1789 `map_values` will be addressed in following PR. It got a correctness issue ``` - read map[struct, struct] from parquet *** FAILED *** (1 second, 478 milliseconds) Res

Re: [PR] fix: support `map_keys` [datafusion-comet]

2025-05-24 Thread via GitHub
comphead commented on PR #1788: URL: https://github.com/apache/datafusion-comet/pull/1788#issuecomment-2907578088 `map_values` will be addressed in following PR. It got a correctness issue ``` - read map[struct, struct] from parquet *** FAILED *** (1 second, 478 milliseconds)

[PR] migrate `logical_plan` tests to insta [datafusion]

2025-05-24 Thread via GitHub
lifan-ake opened a new pull request, #16184: URL: https://github.com/apache/datafusion/pull/16184 ## Which issue does this PR close? - Closes #15792. ## Rationale for this change ## What changes are included in this PR? migrate `logical_plan` t

Re: [PR] migrate `logical_plan` tests to insta [datafusion]

2025-05-24 Thread via GitHub
lifan-ake commented on PR #16184: URL: https://github.com/apache/datafusion/pull/16184#issuecomment-2907587385 Hi @alamb and @blaginin, I'm trying to solve issue #15792. This PR is ready for review, please take a look once you have time. In my opinion, the doc should be e

[I] Intermittent failures in CI in `test_files/limit.slt` [datafusion]

2025-05-24 Thread via GitHub
alamb opened a new issue, #16180: URL: https://github.com/apache/datafusion/issues/16180 ### Describe the bug The extended tests are failing intermittently on main (mostly pass, but sometimes fail) Here is an example failure: https://github.com/apache/datafusion/actions/

Re: [PR] feat: array_length for fixed size list [datafusion]

2025-05-24 Thread via GitHub
alamb commented on code in PR #16167: URL: https://github.com/apache/datafusion/pull/16167#discussion_r2105792781 ## datafusion/functions-nested/src/length.rs: ## @@ -128,26 +148,20 @@ pub fn array_length_inner(args: &[ArrayRef]) -> Result { match &args[0].data_type() {

Re: [PR] feat: optimize and unparse grouping [datafusion]

2025-05-24 Thread via GitHub
alamb commented on PR #16161: URL: https://github.com/apache/datafusion/pull/16161#issuecomment-2906786682 Thanks @chenkovsky -- can you find the original PR that added this `GROUPING` function and perhaps @ mention the author to see if they have any feedback / could help with review?

Re: [PR] refactor(optimizer): Add support for dynamically adding test tables [datafusion]

2025-05-24 Thread via GitHub
alamb merged PR #16138: URL: https://github.com/apache/datafusion/pull/16138

[I] Postgres: `DROP COLUMN c` reformatted to `DROP c` [datafusion-sqlparser-rs]

2025-05-24 Thread via GitHub
agis opened a new issue, #1859: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1859 Given the following input: ```sql ALTER TABLE logs DROP COLUMN details ``` the following code outputs the statement as: ```sql ALTER TABLE logs DROP details ```

Re: [PR] Set aggregation hash seed [datafusion]

2025-05-24 Thread via GitHub
alamb commented on code in PR #16165: URL: https://github.com/apache/datafusion/pull/16165#discussion_r2105793257 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -57,6 +57,10 @@ mod row_hash; mod topk; mod topk_stream; +/// Hard-coded seed for aggregations to ensure

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-05-24 Thread via GitHub
alamb commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-2906787792 @2010YOUY01 and @ding-young I wonder if you can review this PR again to help @rluvaton get it merged? Specifically if it needs more tests perhaps you can help identify which are

Re: [PR] refactor(optimizer): Add support for dynamically adding test tables [datafusion]

2025-05-24 Thread via GitHub
alamb commented on PR #16138: URL: https://github.com/apache/datafusion/pull/16138#issuecomment-2906783551 Thanks again @atahanyorganci

Re: [PR] Set aggregation hash seed [datafusion]

2025-05-24 Thread via GitHub
alamb commented on PR #16165: URL: https://github.com/apache/datafusion/pull/16165#issuecomment-2906808852 🤖: Benchmark completed Details ``` Comparing HEAD and fix_aggregation-seed Benchmark clickbench_extended.json

Re: [PR] chore: Bump arrow to 18.3.0 [datafusion-comet]

2025-05-24 Thread via GitHub
andygrove commented on PR #1773: URL: https://github.com/apache/datafusion-comet/pull/1773#issuecomment-2906822147 > I have rebased this branch onto the latest main. The [ubuntu-latest/java 17-spark-4.0/java](https://github.com/apache/datafusion-comet/actions/runs/15222051464/job/4281914033

Re: [PR] doc: add diagram to describe how DataSource, FileSource, and DataSourceExec are related [datafusion]

2025-05-24 Thread via GitHub
onlyjackfrost commented on PR #16181: URL: https://github.com/apache/datafusion/pull/16181#issuecomment-2906858277 @alamb, could you help review this PR? I have a general understanding of it, but I haven't fully grasped all the details yet. So I only added the diagram and provided

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-05-24 Thread via GitHub
rluvaton commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-2906947109 > > @2010YOUY01 and @ding-young I wonder if you can review this PR again to help @rluvaton get it merged? > > > > > > Specifically if it needs more tests perhaps you c

Re: [PR] Fast path for joins with distinct values in build side [datafusion]

2025-05-24 Thread via GitHub
Dandandan merged PR #16153: URL: https://github.com/apache/datafusion/pull/16153

[PR] chore: Specify JVM heap size [datafusion-comet]

2025-05-24 Thread via GitHub
andygrove opened a new pull request, #1787: URL: https://github.com/apache/datafusion-comet/pull/1787 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

[PR] Add `--substrait-round-trip` option in sqllogictests [datafusion]

2025-05-24 Thread via GitHub
gabotechs opened a new pull request, #16183: URL: https://github.com/apache/datafusion/pull/16183 ## Which issue does this PR close? It probably does not fully close it, but it partially addresses: - https://github.com/apache/datafusion/issues/15069 ## Rationale for t

Re: [PR] doc: add diagram to describe how DataSource, FileSource, and DataSourceExec are related [datafusion]

2025-05-24 Thread via GitHub
onlyjackfrost commented on PR #16181: URL: https://github.com/apache/datafusion/pull/16181#issuecomment-2907602002 @comphead, I'm not sure what you mean by overlap. The diagram should look like this: https://github.com/user-attachments/assets/4cff6c5d-892b-47a0-8c36-9bee8ae8cca3

Re: [PR] chore: Bump arrow to 18.3.0 [datafusion-comet]

2025-05-24 Thread via GitHub
andygrove merged PR #1773: URL: https://github.com/apache/datafusion-comet/pull/1773

Re: [PR] implement `AggregateExec.partition_statistics` [datafusion]

2025-05-24 Thread via GitHub
UBarney commented on code in PR #15954: URL: https://github.com/apache/datafusion/pull/15954#discussion_r2105771450 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -733,13 +733,33 @@ impl AggregateExec { &self.input_order_mode } -fn statistics_inner(

Re: [PR] Fast path for joins with distinct values in build side [datafusion]

2025-05-24 Thread via GitHub
Dandandan commented on PR #16153: URL: https://github.com/apache/datafusion/pull/16153#issuecomment-2906697087 > > This optimization is neat and already covers the common case of joins on primary keys. I think we can further optimize the join hash table - even for cases where _some_ keys mi

Re: [PR] Transform scalar correlated subqueries in Where to DependentJoin [datafusion]

2025-05-24 Thread via GitHub
irenjj commented on code in PR #16174: URL: https://github.com/apache/datafusion/pull/16174#discussion_r2105849608 ## datafusion/optimizer/src/create_dependent_join.rs: ## @@ -0,0 +1,163 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-05-24 Thread via GitHub
andygrove commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2906874047 Since updating Comet to use latest DataFusion (pinned dependency), we have been seeing regular but intermittent CI failures that we are still trying to debug. It may or

Re: [I] Q23 fails when running TPC-DS SF=1 because of invalid offset buffer being exported for empty StringArray. [datafusion-comet]

2025-05-24 Thread via GitHub
andygrove closed issue #1615: Q23 fails when running TPC-DS SF=1 because of invalid offset buffer being exported for empty StringArray. URL: https://github.com/apache/datafusion-comet/issues/1615

Re: [PR] chore: Reduce repetition in the parameter type inference tests [datafusion]

2025-05-24 Thread via GitHub
jsai28 commented on PR #16079: URL: https://github.com/apache/datafusion/pull/16079#issuecomment-2907011742 @alamb I think this is ready for review. I converted all of the tests in `params.rs` that included an sql statement, expected types, and param values.

Re: [PR] Transform scalar correlated subqueries in Where to DependentJoin [datafusion]

2025-05-24 Thread via GitHub
duongcongtoai commented on code in PR #16174: URL: https://github.com/apache/datafusion/pull/16174#discussion_r2105871923 ## datafusion/optimizer/src/create_dependent_join.rs: ## @@ -0,0 +1,163 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

[PR] Update tpch, clickbench, sort_tpch to mark failed queries [datafusion]

2025-05-24 Thread via GitHub
ding-young opened a new pull request, #16182: URL: https://github.com/apache/datafusion/pull/16182 ## Which issue does this PR close? - Closes #16160. ## TODO - [ ] print the ids of failed queries in tpch, sort_tpch. - [ ] run locally and check the output - [

Re: [PR] Transform scalar correlated subqueries in Where to DependentJoin [datafusion]

2025-05-24 Thread via GitHub
irenjj commented on code in PR #16174: URL: https://github.com/apache/datafusion/pull/16174#discussion_r2105841172 ## datafusion/optimizer/src/create_dependent_join.rs: ## @@ -0,0 +1,163 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] Transform scalar correlated subqueries in Where to DependentJoin [datafusion]

2025-05-24 Thread via GitHub
duongcongtoai commented on code in PR #16174: URL: https://github.com/apache/datafusion/pull/16174#discussion_r2105839452 ## datafusion/optimizer/src/create_dependent_join.rs: ## @@ -0,0 +1,163 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [PR] Transform scalar correlated subqueries in Where to DependentJoin [datafusion]

2025-05-24 Thread via GitHub
irenjj commented on code in PR #16174: URL: https://github.com/apache/datafusion/pull/16174#discussion_r2105845507 ## datafusion/optimizer/src/eliminate_cross_join.rs: ## @@ -351,6 +353,8 @@ fn find_inner_join( join_type: JoinType::Inner, join_constraint: JoinC

[I] Intermittent CI failures [datafusion-comet]

2025-05-24 Thread via GitHub
andygrove opened a new issue, #1786: URL: https://github.com/apache/datafusion-comet/issues/1786 ### Describe the bug Since changing the DataFusion dependency to a git dependency on a pinned revision of DataFusion in https://github.com/apache/datafusion-comet/pull/1710 we have been e

Re: [I] Support Custom Function Registration with Catalog and Schema [datafusion]

2025-05-24 Thread via GitHub
leoyvens commented on issue #15363: URL: https://github.com/apache/datafusion/issues/15363#issuecomment-2906881168 A backwards compatibility issue to consider is that `.` is currently valid in udf identifiers. I currently abuse this to emulate schema namespacing for UDFs even though DataFus

Re: [PR] feat: add `register_metadata` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-05-24 Thread via GitHub
alamb commented on PR #15022: URL: https://github.com/apache/datafusion/pull/15022#issuecomment-2906788114 Thanks @rluvaton - I will try and find time to review this over the next day or two

Re: [PR] Set aggregation hash seed [datafusion]

2025-05-24 Thread via GitHub
alamb commented on PR #16165: URL: https://github.com/apache/datafusion/pull/16165#issuecomment-2906787528 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [I] General framework to decorrelate the subqueries [datafusion]

2025-05-24 Thread via GitHub
alamb commented on issue #5492: URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2906789714 Please let me know when you have PRs ready for review or other items that I can try and help with

Re: [I] `CollectLeft` / "right deep tree" optimization not triggered for join between 3 or more delta tables [datafusion]

2025-05-24 Thread via GitHub
alamb commented on issue #16106: URL: https://github.com/apache/datafusion/issues/16106#issuecomment-2906792842 Thanks @aditanase -- in general I would classify this under the category of the desire for a more sophisticated join reordering algorithm. I am pretty skeptical that we will be a

Re: [I] commit 304488d3... (2025-02-05) broke JOIN ... USING("UPPERCASE_FIELD_NAME") [datafusion]

2025-05-24 Thread via GitHub
jfahne commented on issue #16120: URL: https://github.com/apache/datafusion/issues/16120#issuecomment-2906960544 Okay so I dug through it and found the error is coming from the following call chain: - The `LogicalPlanBuilder` returned by the `parse_join` calls `join_using` on the bui

Re: [PR] chore: Specify JVM heap size when running scalatest [datafusion-comet]

2025-05-24 Thread via GitHub
andygrove closed pull request #1787: chore: Specify JVM heap size when running scalatest URL: https://github.com/apache/datafusion-comet/pull/1787

Re: [I] Support u32 indices in HashJoinExec [datafusion]

2025-05-24 Thread via GitHub
jonathanc-n commented on issue #16179: URL: https://github.com/apache/datafusion/issues/16179#issuecomment-2906971884 take

Re: [PR] Add macro for creating DataFrame (#16090) [datafusion]

2025-05-24 Thread via GitHub
cj-zhukov commented on code in PR #16104: URL: https://github.com/apache/datafusion/pull/16104#discussion_r210602 ## datafusion/common/src/array_conversion.rs: ## @@ -0,0 +1,145 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-05-24 Thread via GitHub
ding-young commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-2906803554 @alamb Sure! I may not be able to provide a detailed review right away, but I can definitely help by running the tests added in the PR locally and looking into memory accounting f

Re: [PR] feat: array_length for fixed size list [datafusion]

2025-05-24 Thread via GitHub
chenkovsky commented on code in PR #16167: URL: https://github.com/apache/datafusion/pull/16167#discussion_r2105801259 ## datafusion/functions-nested/src/length.rs: ## @@ -128,26 +148,20 @@ pub fn array_length_inner(args: &[ArrayRef]) -> Result { match &args[0].data_type()

Re: [PR] feat: optimize and unparse grouping [datafusion]

2025-05-24 Thread via GitHub
chenkovsky commented on PR #16161: URL: https://github.com/apache/datafusion/pull/16161#issuecomment-2906804315 it's related to #12704

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-05-24 Thread via GitHub
2010YOUY01 commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-2906805485 > @2010YOUY01 and @ding-young I wonder if you can review this PR again to help @rluvaton get it merged? > > Specifically if it needs more tests perhaps you can help identify

Re: [PR] chore: Bump arrow to 18.3.0 [datafusion-comet]

2025-05-24 Thread via GitHub
Kontinuation commented on PR #1773: URL: https://github.com/apache/datafusion-comet/pull/1773#issuecomment-2906791709 I have rebased this branch onto the latest main. The [ubuntu-latest/java 17-spark-4.0/java](https://github.com/apache/datafusion-comet/actions/runs/15222051464/job/428191403

Re: [PR] [Minor] Speedup TPC-H benchmark run with memtable option [datafusion]

2025-05-24 Thread via GitHub
Dandandan merged PR #16159: URL: https://github.com/apache/datafusion/pull/16159

[PR] doc: add diagram to describe how DataSource, FileSource, and DataSourceExec are related [datafusion]

2025-05-24 Thread via GitHub
onlyjackfrost opened a new pull request, #16181: URL: https://github.com/apache/datafusion/pull/16181 ## Which issue does this PR close? - Closes #15887 ## Rationale for this change Add docs for better understanding how DataSource, FileSource, and DataSourceExec are related

Re: [I] union all +aggregate function in the recursive cte results an infinite loop [datafusion-python]

2025-05-24 Thread via GitHub
l1t1 commented on issue #1131: URL: https://github.com/apache/datafusion-python/issues/1131#issuecomment-2906767514 @kosiew thank you, I learned it.

Re: [PR] Clarify docs and names in parquet predicate pushdown tests [datafusion]

2025-05-24 Thread via GitHub
alamb commented on PR #16155: URL: https://github.com/apache/datafusion/pull/16155#issuecomment-2906767977 Thank you again for the review @xudong963 and @adriangb

Re: [PR] Remove `Filter::having` field [datafusion]

2025-05-24 Thread via GitHub
alamb merged PR #16154: URL: https://github.com/apache/datafusion/pull/16154

Re: [PR] Clarify docs and names in parquet predicate pushdown tests [datafusion]

2025-05-24 Thread via GitHub
alamb merged PR #16155: URL: https://github.com/apache/datafusion/pull/16155

Re: [PR] Remove `Filter::having` field [datafusion]

2025-05-24 Thread via GitHub
alamb commented on PR #16154: URL: https://github.com/apache/datafusion/pull/16154#issuecomment-2906767875 Thanks again @findepi

Re: [PR] migrate tests in `pool.rs` to use insta [datafusion]

2025-05-24 Thread via GitHub
alamb commented on PR #16145: URL: https://github.com/apache/datafusion/pull/16145#issuecomment-2906775628 🚀 let's keep the code moving

Re: [I] Migrate memory_pool/pool tests to insta [datafusion]

2025-05-24 Thread via GitHub
alamb closed issue #16099: Migrate memory_pool/pool tests to insta URL: https://github.com/apache/datafusion/issues/16099

Re: [PR] migrate tests in `pool.rs` to use insta [datafusion]

2025-05-24 Thread via GitHub
alamb merged PR #16145: URL: https://github.com/apache/datafusion/pull/16145

Re: [PR] Transform scalar correlated subqueries in Where to DependentJoin [datafusion]

2025-05-24 Thread via GitHub
irenjj commented on code in PR #16174: URL: https://github.com/apache/datafusion/pull/16174#discussion_r2105792100 ## datafusion/optimizer/src/create_dependent_join.rs: ## @@ -0,0 +1,163 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] Transform scalar correlated subqueries in Where to DependentJoin [datafusion]

2025-05-24 Thread via GitHub
logan-keede commented on code in PR #16174: URL: https://github.com/apache/datafusion/pull/16174#discussion_r2105889448 ## datafusion/optimizer/src/eliminate_cross_join.rs: ## @@ -351,6 +353,8 @@ fn find_inner_join( join_type: JoinType::Inner, join_constraint:

Re: [PR] Add macro for creating DataFrame (#16090) [datafusion]

2025-05-24 Thread via GitHub
comphead commented on code in PR #16104: URL: https://github.com/apache/datafusion/pull/16104#discussion_r2105889739 ## datafusion/core/src/macros.rs: ## @@ -0,0 +1,66 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

[PR] fix: support map_values, map_keys [datafusion-comet]

2025-05-24 Thread via GitHub
comphead opened a new pull request, #1788: URL: https://github.com/apache/datafusion-comet/pull/1788 ## Which issue does this PR close? Closes #1781. ## Rationale for this change ## What changes are included in this PR? ## How are these chan

Re: [PR] Fast path for joins with distinct values in build side [datafusion]

2025-05-24 Thread via GitHub
ctsk commented on PR #16153: URL: https://github.com/apache/datafusion/pull/16153#issuecomment-2906567750 This optimization is neat and already covers the common case of joins on primary keys. I think we can further optimize the join hash table - even for cases where *some* keys might have

[I] Tuning Guide for Joins in SQL Queries [datafusion]

2025-05-24 Thread via GitHub
2010YOUY01 opened a new issue, #16176: URL: https://github.com/apache/datafusion/issues/16176 ### Is your feature request related to a problem or challenge? It would be great to add an example under `datafusion-examples` to illustrate the following: 1. Default Planning and Opti

[I] Tuning Guide for Larger-than-memory Queries [datafusion]

2025-05-24 Thread via GitHub
2010YOUY01 opened a new issue, #16177: URL: https://github.com/apache/datafusion/issues/16177 ### Is your feature request related to a problem or challenge? It would be great to include an example (under `datafusion-examples`) to illustrate: 1. Show how to configure the DataFus

[I] [EPIC(doc)]: Tuning guide for DataFusion [datafusion]

2025-05-24 Thread via GitHub
2010YOUY01 opened a new issue, #16178: URL: https://github.com/apache/datafusion/issues/16178 ### Is your feature request related to a problem or challenge? DataFusion currently has around 100 configuration settings: https://datafusion.apache.org/user-guide/configs.html — and the numb

Re: [I] Tuning Guide for Larger-than-memory Queries [datafusion]

2025-05-24 Thread via GitHub
2010YOUY01 commented on issue #16177: URL: https://github.com/apache/datafusion/issues/16177#issuecomment-2906609382 I think it's something we can do to wrap up the GSoC project @ding-young

Re: [PR] Fast path for joins with distinct values in build side [datafusion]

2025-05-24 Thread via GitHub
Dandandan commented on PR #16153: URL: https://github.com/apache/datafusion/pull/16153#issuecomment-2906647265 > This optimization is neat and already covers the common case of joins on primary keys. I think we can further optimize the join hash table - even for cases where _some_ keys migh

[I] Support u32 indices in HashJoinExec [datafusion]

2025-05-24 Thread via GitHub
Dandandan opened a new issue, #16179: URL: https://github.com/apache/datafusion/issues/16179 ### Is your feature request related to a problem or challenge? Currently we save indices to the batch always as `u64` in the `HashTable` and in the `next` `Vec`. If we have less than `u32:M
