Re: [PR] Add support for mysql's drop index (`ALTER TABLE table_a DROP INDEX idx_a`) [datafusion-sqlparser-rs]

2025-06-05 Thread via GitHub
iffyio commented on code in PR #1865: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1865#discussion_r2131627748 ## tests/sqlparser_common.rs: ## @@ -9132,7 +9132,9 @@ fn test_create_index_with_with_clause() { #[test] fn parse_drop_index() { let sql = "DROP

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-05 Thread via GitHub
pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2948268910 @zhuqi-lucas @alamb I wanted to work on measuring the performance impact of this PR today, but looking at https://github.com/apache/datafusion/pull/16262#pullrequestreview-290313953

Re: [PR] [branch-48] update changelog [datafusion]

2025-06-05 Thread via GitHub
xudong963 merged PR #16267: URL: https://github.com/apache/datafusion/pull/16267 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [PR] Extend benchmark comparison script with more detailed statistics [datafusion]

2025-06-05 Thread via GitHub
pepijnve commented on code in PR #16262: URL: https://github.com/apache/datafusion/pull/16262#discussion_r2131600952 ## benchmarks/compare.py: ## @@ -148,10 +174,23 @@ def compare( ) continue -total_baseline_time += baseline_result.execution_t

Re: [PR] Extend benchmark comparison script with more detailed statistics [datafusion]

2025-06-05 Thread via GitHub
pepijnve commented on code in PR #16262: URL: https://github.com/apache/datafusion/pull/16262#discussion_r2131585926 ## benchmarks/bench.sh: ## @@ -66,10 +67,11 @@ DATAFUSION_DIR=/source/datafusion ./bench.sh run tpch ** * Commands ** -data: Generates

[PR] Add compression option to SpillManager [datafusion]

2025-06-05 Thread via GitHub
ding-young opened a new pull request, #16268: URL: https://github.com/apache/datafusion/pull/16268 ## Which issue does this PR close? - Closes #16130 . ## TODO - [ ] add test for compression in spill file - [ ] refine arg names - [ ] check config docs

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-05 Thread via GitHub
xudong963 commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2948082010 https://github.com/apache/datafusion/pull/16267 After it's merged, I'll push the 48.0.0-rc2 and start vote

[PR] [branch-48] update changelog [datafusion]

2025-06-05 Thread via GitHub
xudong963 opened a new pull request, #16267: URL: https://github.com/apache/datafusion/pull/16267 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes teste

Re: [I] Make it easier to run TPCH queries with datafusion-cli [datafusion]

2025-06-05 Thread via GitHub
clflushopt commented on issue #14608: URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2948048431 Hey @alamb following suggestions from @kevinjqliu I am happy to say that https://github.com/clflushopt/datafusion-tpch provides a ux on par with duckdb and what we discussed

Re: [PR] Fix intermittent SQL logic test failure in limit.slt by adding ORDER BY clause [datafusion]

2025-06-05 Thread via GitHub
kosiew commented on code in PR #16257: URL: https://github.com/apache/datafusion/pull/16257#discussion_r2131457267 ## datafusion/sqllogictest/test_files/limit.slt: ## @@ -860,6 +860,7 @@ query I with selection as ( select * from test_limit_with_partitions +order b

[I] Update Fuzz tests to include Dict with null values [datafusion]

2025-06-05 Thread via GitHub
kosiew opened a new issue, #16266: URL: https://github.com/apache/datafusion/issues/16266 #16258 closes #16228 We should [update the fuzz tests to include Dict with null values](https://github.com/apache/datafusion/pull/16258#issuecomment-2944973123).

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-05 Thread via GitHub
xudong963 commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2948015656 > It looks like these changes all went into the `main` branch. I've been testing off of the `48.0.0-rc1` tag, should I switch to test off of `main`? I'll update branch46

Re: [PR] Fix distinct count for DictionaryArray to correctly account for nulls in values array [datafusion]

2025-06-05 Thread via GitHub
kosiew commented on code in PR #16258: URL: https://github.com/apache/datafusion/pull/16258#discussion_r2131413138 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -5030,6 +5030,20 @@ select count(distinct column1), count(distinct column2) from dict_test group by sta

Re: [I] Unaligned memory access in `SparkUnsafeRow` [datafusion-comet]

2025-06-05 Thread via GitHub
parthchandra commented on issue #1849: URL: https://github.com/apache/datafusion-comet/issues/1849#issuecomment-2947748623 I wonder how likely it is that a Comet user would be on an architecture that does not support unaligned memory access.

Re: [I] Spark Test fails `vectorized reader: missing all struct fields` [datafusion-comet]

2025-06-05 Thread via GitHub
parthchandra commented on issue #1843: URL: https://github.com/apache/datafusion-comet/issues/1843#issuecomment-2947362362 Why would the expected result be `[null], [null], [null]` ? This means that all the structs are null but that is not the actual data. Only in the third row, is the str

Re: [PR] Extend benchmark comparison script with more detailed statistics [datafusion]

2025-06-05 Thread via GitHub
Copilot commented on code in PR #16262: URL: https://github.com/apache/datafusion/pull/16262#discussion_r2131026859 ## benchmarks/compare.py: ## @@ -148,10 +174,23 @@ def compare( ) continue -total_baseline_time += baseline_result.execution_ti

Re: [PR] Extend benchmark comparison script with more detailed statistics [datafusion]

2025-06-05 Thread via GitHub
2010YOUY01 commented on code in PR #16262: URL: https://github.com/apache/datafusion/pull/16262#discussion_r2131020155 ## benchmarks/bench.sh: ## @@ -66,10 +67,11 @@ DATAFUSION_DIR=/source/datafusion ./bench.sh run tpch ** * Commands ** -data: Generate

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-05 Thread via GitHub
shehabgamin commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2947035398 > I think we have merged all the desired PRs now: > > * [fix: NaN semantics in GROUP BY #16256](https://github.com/apache/datafusion/pull/16256) > > * [

Re: [D] DISCUSSION: DataFusion Meetup in New York, NY, USA [datafusion]

2025-06-05 Thread via GitHub
GitHub user lwwmanning added a comment to the discussion: DISCUSSION: DataFusion Meetup in New York, NY, USA I’d love to give a talk on DataFusion/Vortex stuff! Specifically, how DataFusion’s extensibility was hugely useful for bootstrapping, building/testing, & benchmarking a new file format

Re: [I] Iceberg integration - parquet-column version conflicts [datafusion-comet]

2025-06-05 Thread via GitHub
huaxingao commented on issue #1833: URL: https://github.com/apache/datafusion-comet/issues/1833#issuecomment-2946713655 To work around the shading issues, I'm working on higher-level abstractions so that we don't need to pass any Parquet objects.

Re: [PR] Fix distinct count for DictionaryArray to correctly account for nulls in values array [datafusion]

2025-06-05 Thread via GitHub
jonathanc-n commented on code in PR #16258: URL: https://github.com/apache/datafusion/pull/16258#discussion_r2130695219 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -5030,6 +5030,20 @@ select count(distinct column1), count(distinct column2) from dict_test group by

Re: [PR] Add ICEBERG keyword support to ALTER TABLE statement [datafusion-sqlparser-rs]

2025-06-05 Thread via GitHub
alamb commented on PR #1869: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1869#issuecomment-2946189427 ICEBERG!

Re: [PR] fix: Remove `COMET_SHUFFLE_FALLBACK_TO_COLUMNAR` hack [datafusion-comet]

2025-06-05 Thread via GitHub
andygrove commented on PR #1736: URL: https://github.com/apache/datafusion-comet/pull/1736#issuecomment-2946200491 The Spark SQL tests for 3.5.5 are now all passing, and there is just the TPC-DS issue left

Re: [PR] feat(small): Add `BaselineMetrics` to `generate_series()` table function [datafusion]

2025-06-05 Thread via GitHub
alamb commented on code in PR #16255: URL: https://github.com/apache/datafusion/pull/16255#discussion_r2130289823 ## datafusion/physical-plan/src/metrics/baseline.rs: ## @@ -117,9 +117,10 @@ impl BaselineMetrics { } } -/// Process a poll result of a stream pr

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-05 Thread via GitHub
alamb commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2946169144 I think we have merged all the desired PRs now: - https://github.com/apache/datafusion/pull/16256 - https://github.com/apache/datafusion/pull/16261 - https://github.com/apa

Re: [PR] Fix intermittent SQL logic test failure in limit.slt by adding ORDER BY clause [datafusion]

2025-06-05 Thread via GitHub
alamb commented on code in PR #16257: URL: https://github.com/apache/datafusion/pull/16257#discussion_r2130296313 ## datafusion/sqllogictest/test_files/limit.slt: ## @@ -860,6 +860,7 @@ query I with selection as ( select * from test_limit_with_partitions +order by

Re: [I] Expose user defined functions in the FFI [datafusion]

2025-06-05 Thread via GitHub
alamb closed issue #14562: Expose user defined functions in the FFI URL: https://github.com/apache/datafusion/issues/14562

Re: [PR] feat: Add Window UDFs to FFI Crate [datafusion]

2025-06-05 Thread via GitHub
alamb merged PR #16261: URL: https://github.com/apache/datafusion/pull/16261

Re: [PR] feat(small): Add `BaselineMetrics` to `generate_series()` table function [datafusion]

2025-06-05 Thread via GitHub
alamb merged PR #16255: URL: https://github.com/apache/datafusion/pull/16255

Re: [D] DISCUSSION: DataFusion Meetup in New York, NY, USA [datafusion]

2025-06-05 Thread via GitHub
GitHub user leoDYL added a comment to the discussion: DISCUSSION: DataFusion Meetup in New York, NY, USA We're looking for a variety of topics with the theme of reflecting on the 50 releases of DataFusion! Seems [VegaFusion](https://vegafusion.io/) has been using DataFusion for a while so it

Re: [PR] feat: Add Window UDFs to FFI Crate [datafusion]

2025-06-05 Thread via GitHub
alamb commented on PR #16261: URL: https://github.com/apache/datafusion/pull/16261#issuecomment-2945974874 If the tests pass I'll merge it in

Re: [PR] feat: Add Window UDFs to FFI Crate [datafusion]

2025-06-05 Thread via GitHub
alamb commented on PR #16261: URL: https://github.com/apache/datafusion/pull/16261#issuecomment-2945974005 I rebased this PR on main so it was ready to go

Re: [PR] feat: Add Aggregate UDF to FFI crate [datafusion]

2025-06-05 Thread via GitHub
alamb commented on PR #14775: URL: https://github.com/apache/datafusion/pull/14775#issuecomment-2945964478 gogogo

Re: [PR] feat: Add Aggregate UDF to FFI crate [datafusion]

2025-06-05 Thread via GitHub
alamb merged PR #14775: URL: https://github.com/apache/datafusion/pull/14775

Re: [PR] Minor: fix upgrade papercut `pub use PruningStatistics` [datafusion]

2025-06-05 Thread via GitHub
alamb commented on PR #16264: URL: https://github.com/apache/datafusion/pull/16264#issuecomment-2945947663 WOOHOO 🚀

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-05 Thread via GitHub
jonathanc-n commented on PR #16210: URL: https://github.com/apache/datafusion/pull/16210#issuecomment-2945916745 There's an interesting implementation of index joins [here](https://github.com/duckdb/duckdb/pull/1008), but this would involve creating indexes. What are the thoughts on support

Re: [PR] Fix distinct count for DictionaryArray to correctly account for nulls in values array [datafusion]

2025-06-05 Thread via GitHub
alamb commented on code in PR #16258: URL: https://github.com/apache/datafusion/pull/16258#discussion_r2130020509 ## datafusion/functions-aggregate/src/count.rs: ## @@ -711,8 +711,8 @@ impl Accumulator for DistinctCountAccumulator { } (0..arr.len()).try_for_e

Re: [I] Enable more DPP Spark SQL tests [datafusion-comet]

2025-06-05 Thread via GitHub
andygrove closed issue #1739: Enable more DPP Spark SQL tests URL: https://github.com/apache/datafusion-comet/issues/1739

Re: [I] Enable more DPP Spark SQL tests [datafusion-comet]

2025-06-05 Thread via GitHub
andygrove commented on issue #1739: URL: https://github.com/apache/datafusion-comet/issues/1739#issuecomment-2945837570 Fixed in https://github.com/apache/datafusion-comet/issues/1831 and https://github.com/apache/datafusion-comet/pull/1838

Re: [PR] feat: Add Aggregate UDF to FFI crate [datafusion]

2025-06-05 Thread via GitHub
alamb commented on PR #14775: URL: https://github.com/apache/datafusion/pull/14775#issuecomment-2945762124 It seems github is experiencing issues. I will close/reopen this PR to restart the checks https://www.githubstatus.com/ https://github.com/user-attachments/assets/2bc627ef

[PR] feat: Add Aggregate UDF to FFI crate [datafusion]

2025-06-05 Thread via GitHub
timsaucer opened a new pull request, #14775: URL: https://github.com/apache/datafusion/pull/14775 ## Which issue does this PR close? This PR addresses part of #14562 ## Rationale for this change This change allows for using user defined **aggregate** functions across FFI

Re: [PR] feat: Add Aggregate UDF to FFI crate [datafusion]

2025-06-05 Thread via GitHub
alamb closed pull request #14775: feat: Add Aggregate UDF to FFI crate URL: https://github.com/apache/datafusion/pull/14775

Re: [PR] chore(deps): bump substrait from 0.56.0 to 0.57.0 [datafusion]

2025-06-05 Thread via GitHub
dependabot[bot] commented on PR #16143: URL: https://github.com/apache/datafusion/pull/16143#issuecomment-2945761029 Dependabot tried to update this pull request, but something went wrong. We're looking into it, but in the meantime you can retry the update by commenting `@dependabot rebase`

Re: [D] DISCUSSION: DataFusion Meetup in New York, NY, USA [datafusion]

2025-06-05 Thread via GitHub
GitHub user jonmmease added a comment to the discussion: DISCUSSION: DataFusion Meetup in New York, NY, USA What kinds of talks are you looking for? I may be available to give one on how [VegaFusion](https://vegafusion.io/) uses DataFusion. GitHub link: https://github.com/apache/datafusion/d

Re: [PR] Add late pruning of file based on file level statistics [datafusion]

2025-06-05 Thread via GitHub
adriangb commented on PR #16014: URL: https://github.com/apache/datafusion/pull/16014#issuecomment-2945656786 I've rebased this and it's looking nice now. I think the main open question is the concern about performance / overhead: https://github.com/apache/datafusion/pull/16014/file

Re: [PR] Add late pruning of file based on file level statistics [datafusion]

2025-06-05 Thread via GitHub
adriangb commented on code in PR #16014: URL: https://github.com/apache/datafusion/pull/16014#discussion_r2129798435 ## datafusion/proto/src/physical_plan/to_proto.rs: ## @@ -506,7 +506,7 @@ pub fn serialize_file_scan_config( .iter() .cloned() .collect

Re: [PR] Feat: Support Spark 4.0.0 part1 [datafusion-comet]

2025-06-05 Thread via GitHub
andygrove commented on PR #1830: URL: https://github.com/apache/datafusion-comet/pull/1830#issuecomment-2945620138 The Spark version will also need to be updated in `.github/workflows/spark_sql_test_ansi.yml`

Re: [PR] Minor: fix upgrade papercut `pub use PruningStatistics` [datafusion]

2025-06-05 Thread via GitHub
adriangb commented on PR #16264: URL: https://github.com/apache/datafusion/pull/16264#issuecomment-2945617720 Worked!! Sweet. Thank you Andrew (for the fix and nudge).

Re: [PR] Minor: fix upgrade papercut `pub use PruningStatistics` [datafusion]

2025-06-05 Thread via GitHub
adriangb merged PR #16264: URL: https://github.com/apache/datafusion/pull/16264

Re: [PR] Minor: fix upgrade papercut `pub use PruningStatistics` [datafusion]

2025-06-05 Thread via GitHub
adriangb commented on PR #16264: URL: https://github.com/apache/datafusion/pull/16264#issuecomment-2945616545 Ah I was waiting for the email but in reality I just missed it 😄

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-05 Thread via GitHub
alamb commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2945615819 > I would also like to include https://github.com/apache/datafusion/pull/16256 Merged!

Re: [PR] Add `--substrait-round-trip` option in sqllogictests [datafusion]

2025-06-05 Thread via GitHub
alamb merged PR #16183: URL: https://github.com/apache/datafusion/pull/16183

Re: [PR] Add `--substrait-round-trip` option in sqllogictests [datafusion]

2025-06-05 Thread via GitHub
alamb commented on PR #16183: URL: https://github.com/apache/datafusion/pull/16183#issuecomment-2945612836 Thank you for the review @2010YOUY01

Re: [D] DISCUSSION: DataFusion Meetup in New York, NY, USA [datafusion]

2025-06-05 Thread via GitHub
GitHub user timsaucer added a comment to the discussion: DISCUSSION: DataFusion Meetup in New York, NY, USA I am interested in attending and there are a few topics I could present on, depending on what time we have available. GitHub link: https://github.com/apache/datafusion/discussions/1626

Re: [PR] Add `--substrait-round-trip` option in sqllogictests [datafusion]

2025-06-05 Thread via GitHub
alamb commented on code in PR #16183: URL: https://github.com/apache/datafusion/pull/16183#discussion_r2129704959 ## .github/workflows/rust.yml: ## @@ -476,6 +476,28 @@ jobs: POSTGRES_HOST: postgres POSTGRES_PORT: ${{ job.services.postgres.ports[5432] }}

Re: [PR] feat: Support defining custom MetricValues in PhysicalPlans [datafusion]

2025-06-05 Thread via GitHub
alamb commented on PR #16195: URL: https://github.com/apache/datafusion/pull/16195#issuecomment-2945581697 Let's wait to merge this PR until we ship DataFusion 48 to limit the breaking changes - #15771 I think we'll be able to merge this in the next few days

Re: [PR] feat: Add Aggregate UDF to FFI crate [datafusion]

2025-06-05 Thread via GitHub
alamb commented on PR #14775: URL: https://github.com/apache/datafusion/pull/14775#issuecomment-2945599542 I merged up one more time to make sure the CI tests pass but assuming they do I plan to merge this one in

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-05 Thread via GitHub
jonathanc-n commented on PR #16210: URL: https://github.com/apache/datafusion/pull/16210#issuecomment-2945598547 @2010YOUY01 I was doing some benchmarks on NLJs vs. HJs and it looked bad even for cases where one table is very small which is what NLJs should excel at. One thing is to note th

Re: [PR] fix: Fall back to Spark for `RANGE BETWEEN` window expressions [datafusion-comet]

2025-06-05 Thread via GitHub
andygrove merged PR #1848: URL: https://github.com/apache/datafusion-comet/pull/1848

Re: [PR] Minor: fix upgrade papercut `pub use PruningStatistics` [datafusion]

2025-06-05 Thread via GitHub
alamb commented on PR #16264: URL: https://github.com/apache/datafusion/pull/16264#issuecomment-2945558949 https://github.com/user-attachments/assets/ddaa3822-df51-44b0-ad00-6cd2503d894b 🤔 your github account doesn't seem to be linked to your apache account yet so the checkmark

Re: [PR] fix: Fall back to Spark for `RANGE BETWEEN` window expressions [datafusion-comet]

2025-06-05 Thread via GitHub
andygrove commented on PR #1848: URL: https://github.com/apache/datafusion-comet/pull/1848#issuecomment-2945589324 Thanks for the review @huaxingao

Re: [PR] feat: Add Aggregate UDF to FFI crate [datafusion]

2025-06-05 Thread via GitHub
alamb commented on PR #14775: URL: https://github.com/apache/datafusion/pull/14775#issuecomment-2945576944 Thank you very much @m09526 for the review

Re: [PR] Minor: fix upgrade papercut `pub use PruningStatistics` [datafusion]

2025-06-05 Thread via GitHub
alamb commented on PR #16264: URL: https://github.com/apache/datafusion/pull/16264#issuecomment-2945549922 Thank you @adriangb

Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-06-05 Thread via GitHub
andygrove commented on issue #1820: URL: https://github.com/apache/datafusion-comet/issues/1820#issuecomment-2945503093 Thanks @rishvin. I assigned the issue to you. Let me know if you have any questions.

Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-06-05 Thread via GitHub
rishvin commented on issue #1820: URL: https://github.com/apache/datafusion-comet/issues/1820#issuecomment-2945492404 @andygrove I can work on this.

Re: [PR] [WIP] Remove `COMET_SHUFFLE_FALLBACK_TO_COLUMNAR` config [datafusion-comet]

2025-06-05 Thread via GitHub
andygrove commented on PR #1736: URL: https://github.com/apache/datafusion-comet/pull/1736#issuecomment-2945396822 Current failures: core1: ``` 2025-06-05T16:40:13.0938574Z [info] - avoid reordering broadcast join keys to match input hash partitioning *** FAILED *** (2 seco

Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]

2025-06-05 Thread via GitHub
Dandandan commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2945391975 thanks @Rachelint and congratulations!

[I] RFC: What 3 level naming system should we use for catalog providers? [datafusion-python]

2025-06-05 Thread via GitHub
timsaucer opened a new issue, #1142: URL: https://github.com/apache/datafusion-python/issues/1142 ## Background Right now in the python interface the catalog provider is mostly not useful. We have a PR incoming that will change that. The issue I see is that in datafusion core reposit

Re: [PR] Move PruningStatistics into datafusion::common [datafusion]

2025-06-05 Thread via GitHub
alamb commented on PR #16069: URL: https://github.com/apache/datafusion/pull/16069#issuecomment-2945338342 This PR caused a minor issue when testing an upgrade. Here is a proposed fix: - https://github.com/apache/datafusion/pull/16264

[PR] Minor: fix upgrade papercut `pub use PruningStatistics` [datafusion]

2025-06-05 Thread via GitHub
alamb opened a new pull request, #16264: URL: https://github.com/apache/datafusion/pull/16264 ## Which issue does this PR close? - Found while testing https://github.com/delta-io/delta-rs/pull/3520 - Related to #15771 ## Rationale for this change While testing the Dat

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-05 Thread via GitHub
alamb commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2945310225 I started testing with delta.rs: https://github.com/delta-io/delta-rs/pull/3520

[I] AS Keyword added by default even if not specified on original SQL statement [datafusion-sqlparser-rs]

2025-06-05 Thread via GitHub
Luigi6821 opened a new issue, #1875: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1875 Hi all, I am writing after @TylerBrinks suggested that I open a discussion on the issue mentioned in the title. Basically I am using the C# port of the Rust one (thanks Tyler)

[PR] Improve DataFusion subcrate readme files [datafusion]

2025-06-05 Thread via GitHub
alamb opened a new pull request, #16263: URL: https://github.com/apache/datafusion/pull/16263 ## Which issue does this PR close? - Related to https://github.com/delta-io/delta-rs/pull/3521 ## Rationale for this change While testing a DataFusion upgrade in delta.rs I found

Re: [PR] minor: Replace many instances of `checkSparkAnswer` with `checkSparkAnswerAndOperator` [datafusion-comet]

2025-06-05 Thread via GitHub
andygrove commented on code in PR #1851: URL: https://github.com/apache/datafusion-comet/pull/1851#discussion_r2129233751 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1394,10 +1394,10 @@ class CometExpressionSuite extends CometTestBase with Adapti

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-05 Thread via GitHub
ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2945161909 I will be traveling tomorrow, but @berkaysynnada and I will help drive this to completion early next week. I made some progress on sketching out a good API and will circle bac

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-05 Thread via GitHub
pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2945139509 Changing hats to DataFusion user mode where I need to make sure that the end users of our system can press 'cancel' at any time and that works as expected. From that perspecti

Re: [PR] Handle dicts for distinct count [datafusion]

2025-06-05 Thread via GitHub
blaginin merged PR #15871: URL: https://github.com/apache/datafusion/pull/15871

Re: [I] Improve performance of COUNT (distinct x) for dictionary columns [datafusion]

2025-06-05 Thread via GitHub
blaginin closed issue #258: Improve performance of COUNT (distinct x) for dictionary columns URL: https://github.com/apache/datafusion/issues/258

Re: [PR] Adjust slttest to pass without RUST_BACKTRACE enabled [datafusion]

2025-06-05 Thread via GitHub
alamb commented on PR #16251: URL: https://github.com/apache/datafusion/pull/16251#issuecomment-2945038255 > Thank you @alamb, is it possible for --complete to also generate a substring that matches in CI? Yes, the question is what is the most important substring 🤔 Maybe just the

Re: [PR] Fix distinct count for DictionaryArray to correctly account for nulls in values array [datafusion]

2025-06-05 Thread via GitHub
blaginin commented on code in PR #16258: URL: https://github.com/apache/datafusion/pull/16258#discussion_r2129123740 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -5030,6 +5030,20 @@ select count(distinct column1), count(distinct column2) from dict_test group by s

[PR] Extend benchmark comparison script with more detailed statistics [datafusion]

2025-06-05 Thread via GitHub
pepijnve opened a new pull request, #16262: URL: https://github.com/apache/datafusion/pull/16262 ## Which issue does this PR close? - No issue created yet, related to PR #16196. ## Rationale for this change The current benchmark comparison script compares

[I] Update or ignore tests in Spark SQL WholeStageCodegenSuite [datafusion-comet]

2025-06-05 Thread via GitHub
andygrove opened a new issue, #1852: URL: https://github.com/apache/datafusion-comet/issues/1852 ### What is the problem the feature request solves? The following tests in WholeStageCodegenSuite currently pass because we are falling back to Spark, but they fail when they run natively

Re: [I] I would like to be able to use PyDataFrame from other projects [datafusion-python]

2025-06-05 Thread via GitHub
timsaucer commented on issue #581: URL: https://github.com/apache/datafusion-python/issues/581#issuecomment-2944981399 Can you give me a little more context about what you're trying to accomplish? There is some discussion on this page about the difficulties with sharing data between python

Re: [PR] Fix distinct count for DictionaryArray to correctly account for nulls in values array [datafusion]

2025-06-05 Thread via GitHub
blaginin commented on PR #16258: URL: https://github.com/apache/datafusion/pull/16258#issuecomment-2944973123 Thank you @kosiew! Do you mind also changing https://github.com/apache/datafusion/pull/16232/files#diff-08d7a1f4d6a968c393a2a0f2a2f54118f38d6a29009ce31b261f3ca27a2d3396R733 and maki

Re: [PR] Adjust slttest to pass without RUST_BACKTRACE enabled [datafusion]

2025-06-05 Thread via GitHub
zhuqi-lucas commented on PR #16251: URL: https://github.com/apache/datafusion/pull/16251#issuecomment-2944937313 Thank you @alamb, is it possible for --complete to also generate a substring that matches in CI?

Re: [PR] minor: Replace many instances of `checkSparkAnswer` with `checkSparkAnswerAndOperator` [datafusion-comet]

2025-06-05 Thread via GitHub
codecov-commenter commented on PR #1851: URL: https://github.com/apache/datafusion-comet/pull/1851#issuecomment-2944850645 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1851?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] minor: Replace many instances of `checkSparkAnswer` with `checkSparkAnswerAndOperator` [datafusion-comet]

2025-06-05 Thread via GitHub
andygrove commented on PR #1851: URL: https://github.com/apache/datafusion-comet/pull/1851#issuecomment-2944836969 > oh yeah, here we go! lgtm thanks @andygrove Thanks for the review @comphead. This was partly inspired by our conversation yesterday. We typically use `checkSpar

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-05 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2944834501 @ozankabak @pepijnve Interesting, I have now added an interleave corner test case, **test_infinite_interleave_agg_cancel**, which tries to reproduce the corner case, but it works

Re: [PR] minor: Replace many instances of `checkSparkAnswer` with `checkSparkAnswerAndOperator` [datafusion-comet]

2025-06-05 Thread via GitHub
comphead commented on code in PR #1851: URL: https://github.com/apache/datafusion-comet/pull/1851#discussion_r2129022559 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1394,10 +1394,10 @@ class CometExpressionSuite extends CometTestBase with Adaptiv

Re: [I] I would like to be able to use PyDataFrame from other projects [datafusion-python]

2025-06-05 Thread via GitHub
andygrove commented on issue #581: URL: https://github.com/apache/datafusion-python/issues/581#issuecomment-2944796949 Perhaps @timsaucer can provide some guidance

Re: [I] I would like to be able to use PyDataFrame from other projects [datafusion-python]

2025-06-05 Thread via GitHub
mara-schulke commented on issue #581: URL: https://github.com/apache/datafusion-python/issues/581#issuecomment-2944785439 Hi @andygrove, we are currently using `PyDataFrame` and would like to convert it back to a `datafusion::DataFrame`. Do you have any information or guidance on how

Re: [I] Support columns having the same alias [datafusion]

2025-06-05 Thread via GitHub
alamb commented on issue #6543: URL: https://github.com/apache/datafusion/issues/6543#issuecomment-2944658229 > would it be correct to use statement visitors here to add unique aliases? I think that is actually a pretty neat idea -- specifically add the aliases in the SQL planner

Re: [PR] feat: Support defining custom MetricValues in PhysicalPlans [datafusion]

2025-06-05 Thread via GitHub
alamb commented on PR #16195: URL: https://github.com/apache/datafusion/pull/16195#issuecomment-2944619110 > > Should it target main or the 47 branch? > > The `main` branch is the right one (I don't think branch-47 is the most recent release branch anyway) Yeah, let's targe

Re: [PR] minor: Replace many instances of `checkSparkAnswer` with `checkSparkAnswerAndOperator` [datafusion-comet]

2025-06-05 Thread via GitHub
andygrove commented on code in PR #1851: URL: https://github.com/apache/datafusion-comet/pull/1851#discussion_r2128955528 ## spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala: ## @@ -399,12 +399,7 @@ class CometArrayExpressionSuite extends CometTestBase with

[PR] minor: Replace many instances of `checkSparkAnswer` with `checkSparkAnswerAndOperator` [datafusion-comet]

2025-06-05 Thread via GitHub
andygrove opened a new pull request, #1851: URL: https://github.com/apache/datafusion-comet/pull/1851 ## Which issue does this PR close? N/A ## Rationale for this change Improve testing and help prevent regressions ## What changes are included in th

Re: [PR] Adjust slttest to pass without RUST_BACKTRACE enabled [datafusion]

2025-06-05 Thread via GitHub
alamb commented on PR #16251: URL: https://github.com/apache/datafusion/pull/16251#issuecomment-2944591691 > I am confused about why CI does not fail in this case. I remember some cases that I ran locally with -- --complete, but the CI failed, so I added RUST_BACKTRACE to generate. I think sqllogi

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-05 Thread via GitHub
pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2944505070 > I feel like we are getting close to a point where we start having not-so-fruitful discussions. I think I have made a good effort to make my arguments and reasoning clear. @

Re: [I] Inconsistency with count distinct on NaN values [datafusion]

2025-06-05 Thread via GitHub
andygrove closed issue #16254: Inconsistency with count distinct on NaN values URL: https://github.com/apache/datafusion/issues/16254

Re: [I] Question on: `visit_expressions_mut` for alias expr [datafusion-sqlparser-rs]

2025-06-05 Thread via GitHub
HuyNguyen7994 commented on issue #1475: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1475#issuecomment-2944295472 @cisaacson I'm having the same problem. It turns out you want to visit `SelectItem`: https://github.com/HuyNguyen7994/datafusion-sqlparser-rs/tree/add-select-item-
