Re: [PR] implement `AggregateExec.partition_statistics` [datafusion]

2025-05-16 Thread via GitHub
UBarney commented on code in PR #15954: URL: https://github.com/apache/datafusion/pull/15954#discussion_r2093873182 ## datafusion/core/tests/physical_optimizer/partition_statistics.rs: ## @@ -488,4 +520,155 @@ mod test { assert_eq!(statistics[0], expected_statistic_part

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-16 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2888023128 > 🤖: Benchmark completed > Details > > ``` > Comparing HEAD and intermeidate-result-blocked-approach > > Benchmark clickbench_extended.json

Re: [PR] Draft: Parse literal to different types [datafusion]

2025-05-16 Thread via GitHub
github-actions[bot] commented on PR #15202: URL: https://github.com/apache/datafusion/pull/15202#issuecomment-2887964733 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Improve SQL syntax error messages in parser (#14437) [datafusion]

2025-05-16 Thread via GitHub
github-actions[bot] closed pull request #14986: Improve SQL syntax error messages in parser (#14437) URL: https://github.com/apache/datafusion/pull/14986 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] Fix logo in rust API docs [datafusion]

2025-05-16 Thread via GitHub
github-actions[bot] commented on PR #14989: URL: https://github.com/apache/datafusion/pull/14989#issuecomment-2887964831 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] WIP: User defined sorting [datafusion]

2025-05-16 Thread via GitHub
github-actions[bot] commented on PR #15106: URL: https://github.com/apache/datafusion/pull/15106#issuecomment-2887964752 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] datafusion-cli: add streaming state struct [datafusion]

2025-05-16 Thread via GitHub
github-actions[bot] commented on PR #15234: URL: https://github.com/apache/datafusion/pull/15234#issuecomment-2887964712 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Always install correct version of rust in CI [datafusion]

2025-05-16 Thread via GitHub
github-actions[bot] closed pull request #14992: Always install correct version of rust in CI URL: https://github.com/apache/datafusion/pull/14992

Re: [PR] Reject `RESPECT NULLS` and `IGNORE NULLS` for aggregate functions [datafusion]

2025-05-16 Thread via GitHub
github-actions[bot] commented on PR #15014: URL: https://github.com/apache/datafusion/pull/15014#issuecomment-2887964786 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] [WIP] Attempt piping through field metadata in as many places as possible [datafusion]

2025-05-16 Thread via GitHub
github-actions[bot] closed pull request #15036: [WIP] Attempt piping through field metadata in as many places as possible URL: https://github.com/apache/datafusion/pull/15036

[PR] chore: Remove SMJ experimental status [datafusion]

2025-05-16 Thread via GitHub
comphead opened a new pull request, #16072: URL: https://github.com/apache/datafusion/pull/16072 ## Which issue does this PR close? It took a year or so to polish SMJ and fix bugs - Closes #9846 . ## Rationale for this change ## What changes are included in

Re: [PR] fix: get_struct field is incorrect when struct in array [datafusion-comet]

2025-05-16 Thread via GitHub
comphead merged PR #1687: URL: https://github.com/apache/datafusion-comet/pull/1687

Re: [I] [Experimental scans] schema adapter does not apply required schema for structs within lists [datafusion-comet]

2025-05-16 Thread via GitHub
comphead closed issue #1681: [Experimental scans] schema adapter does not apply required schema for structs within lists URL: https://github.com/apache/datafusion-comet/issues/1681

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-05-16 Thread via GitHub
TheBuilderJR commented on PR #15295: URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2887827231 Ok I verified it works. Can you update the PR to remove all the extra logging and put it up for review? This is amazing work. Thank you! cc @alamb any chance you can take

Re: [I] `describe` does not handle mixed case or dots in column names [datafusion]

2025-05-16 Thread via GitHub
jfahne commented on issue #16017: URL: https://github.com/apache/datafusion/issues/16017#issuecomment-2887737240 I figured it out! The error is coming from here in the call to the `describe(...)` method ([link to line 937 of mod.rs](https://github.com/apache/datafusion/blob/main/datafusion/

Re: [PR] fix: get_struct field is incorrect when struct in array [datafusion-comet]

2025-05-16 Thread via GitHub
comphead commented on code in PR #1687: URL: https://github.com/apache/datafusion-comet/pull/1687#discussion_r2093701858 ## native/core/src/parquet/parquet_support.rs: ## @@ -171,7 +178,36 @@ fn cast_array( .with_timezone(Arc::clone(tz)), ))

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-16 Thread via GitHub
alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2887720153 🤖: Benchmark completed Details ``` Comparing HEAD and intermeidate-result-blocked-approach Benchmark clickbench_extended.json -

Re: [PR] Per file filter evaluation [datafusion]

2025-05-16 Thread via GitHub
adriangb commented on code in PR #15057: URL: https://github.com/apache/datafusion/pull/15057#discussion_r2093694833 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -55,8 +57,8 @@ pub(super) struct ParquetOpener { pub limit: Option, /// Optional predicate to appl

Re: [PR] fix: get_struct field is incorrect when struct in array [datafusion-comet]

2025-05-16 Thread via GitHub
andygrove commented on code in PR #1687: URL: https://github.com/apache/datafusion-comet/pull/1687#discussion_r2093691011 ## native/core/src/parquet/parquet_support.rs: ## @@ -171,7 +178,36 @@ fn cast_array( .with_timezone(Arc::clone(tz)), ))

Re: [PR] fix: get_struct field is incorrect when struct in array [datafusion-comet]

2025-05-16 Thread via GitHub
andygrove commented on code in PR #1687: URL: https://github.com/apache/datafusion-comet/pull/1687#discussion_r2093690008 ## native/core/src/parquet/parquet_support.rs: ## @@ -171,7 +178,36 @@ fn cast_array( .with_timezone(Arc::clone(tz)), ))

Re: [PR] fix: get_struct field is incorrect when struct in array [datafusion-comet]

2025-05-16 Thread via GitHub
andygrove commented on code in PR #1687: URL: https://github.com/apache/datafusion-comet/pull/1687#discussion_r2093686474 ## native/core/src/parquet/parquet_support.rs: ## @@ -171,7 +178,36 @@ fn cast_array( .with_timezone(Arc::clone(tz)), ))

Re: [PR] fix: get_struct field is incorrect when struct in array [datafusion-comet]

2025-05-16 Thread via GitHub
andygrove commented on code in PR #1687: URL: https://github.com/apache/datafusion-comet/pull/1687#discussion_r2093686930 ## native/core/src/parquet/parquet_support.rs: ## @@ -171,7 +178,36 @@ fn cast_array( .with_timezone(Arc::clone(tz)), ))

Re: [PR] fix: get_struct field is incorrect when struct in array [datafusion-comet]

2025-05-16 Thread via GitHub
andygrove commented on code in PR #1687: URL: https://github.com/apache/datafusion-comet/pull/1687#discussion_r2093684993 ## native/core/src/parquet/parquet_support.rs: ## @@ -171,7 +178,36 @@ fn cast_array( .with_timezone(Arc::clone(tz)), ))

Re: [PR] fix: get_struct field is incorrect when struct in array [datafusion-comet]

2025-05-16 Thread via GitHub
comphead commented on PR #1687: URL: https://github.com/apache/datafusion-comet/pull/1687#issuecomment-2887686100 @andygrove @parthchandra @mbutrovich if you have time to look into it

[I] Write documentation for working with Comet's spark-4.0 profile in IntelliJ [datafusion-comet]

2025-05-16 Thread via GitHub
andygrove opened a new issue, #1745: URL: https://github.com/apache/datafusion-comet/issues/1745 ### What is the problem the feature request solves? When switching from the default profile to the spark-4.0 profile (and the jdk-17) profile, I ran into various issues with building and r

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-16 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2887661391 🤖: Benchmark completed Details ``` Comparing HEAD and concat_batches_for_sort Benchmark clickbench_extended.json --

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-16 Thread via GitHub
alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2887661504 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

[PR] feat: expose cluster state notifications [datafusion-ballista]

2025-05-16 Thread via GitHub
milenkovicm opened a new pull request, #1263: URL: https://github.com/apache/datafusion-ballista/pull/1263 # Which issue does this PR close? Closes #. # Rationale for this change there are cases when external observers need to react on cluster state changes. # W

[PR] fix new rust 1.87 cargo clippy warnings [datafusion-sqlparser-rs]

2025-05-16 Thread via GitHub
lovasoa opened a new pull request, #1856: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1856 all PRs are currently red because of the new rust release

Re: [PR] Fix: common_sub_expression_eliminate optimizer rule failed [datafusion]

2025-05-16 Thread via GitHub
Col-Waltz commented on code in PR #16066: URL: https://github.com/apache/datafusion/pull/16066#discussion_r2093603729 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -316,6 +316,19 @@ impl CommonSubexprEliminate { } => {

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-16 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2887612346 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-16 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2887612251 🤖: Benchmark completed Details ``` Comparing HEAD and concat_batches_for_sort Benchmark sort_tpch.json ┏━━━

Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-16 Thread via GitHub
alamb commented on PR #14366: URL: https://github.com/apache/datafusion/pull/14366#issuecomment-2887593105 🤖: Benchmark completed Details ``` group alamb_make_expr_smaller main -

[PR] pretty-print CREATE VIEW statements [datafusion-sqlparser-rs]

2025-05-16 Thread via GitHub
lovasoa opened a new pull request, #1855: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1855 (no comment)

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-16 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2887593204 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-16 Thread via GitHub
alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2887592362 Thank you so much for this PR btw @Rachelint -- it is really really nice. I can't wait to see how the performance looks

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-16 Thread via GitHub
alamb commented on code in PR #15591: URL: https://github.com/apache/datafusion/pull/15591#discussion_r2093618262 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -339,6 +344,35 @@ impl SkipAggregationProbe { /// │ 2 │ 2 │ 3.0 ││ 2 │ 2 │ 3.0 │

[PR] pretty-print CREATE TABLE statements [datafusion-sqlparser-rs]

2025-05-16 Thread via GitHub
lovasoa opened a new pull request, #1854: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1854 see https://github.com/apache/datafusion-sqlparser-rs/issues/1850

Re: [PR] [datafusion-spark] Add Spark-compatible `char` expression [datafusion]

2025-05-16 Thread via GitHub
alamb commented on PR #15994: URL: https://github.com/apache/datafusion/pull/15994#issuecomment-2887574752 Makes sense to me 👍

Re: [I] Add Extended Clickbench benchmark for avg(duration) [datafusion]

2025-05-16 Thread via GitHub
alamb commented on issue #15949: URL: https://github.com/apache/datafusion/issues/15949#issuecomment-2887567057 @logan-keede perhaps this is a ticket you have interest in helping finish (and then we can get https://github.com/apache/datafusion/pull/15748 over the line)

Re: [I] Remove Deprecated `ParquetExec`, `AvroExec`, `CsvExec`, `JsonExec` early (before deprecation deadline) [datafusion]

2025-05-16 Thread via GitHub
alamb commented on issue #15950: URL: https://github.com/apache/datafusion/issues/15950#issuecomment-2887555249 I agree -- thanks @logan-keede

Re: [PR] [datafusion-spark] Add Spark-compatible `char` expression [datafusion]

2025-05-16 Thread via GitHub
andygrove commented on PR #15994: URL: https://github.com/apache/datafusion/pull/15994#issuecomment-2887549152 > Thanks @andygrove and @shehabgamin -- how close do you think we are to being able to open up the `datafusion-spark` floodgates to contributors? Ideally, I would like to wai

Re: [I] Remove Deprecated `ParquetExec`, `AvroExec`, `CsvExec`, `JsonExec` early (before deprecation deadline) [datafusion]

2025-05-16 Thread via GitHub
logan-keede commented on issue #15950: URL: https://github.com/apache/datafusion/issues/15950#issuecomment-2887546453 I think this issue can be closed with #16034, #16007.

Re: [PR] [datafusion-spark] Add Spark-compatible `char` expression [datafusion]

2025-05-16 Thread via GitHub
alamb commented on PR #15994: URL: https://github.com/apache/datafusion/pull/15994#issuecomment-2887545046 Thanks @andygrove and @shehabgamin -- how close do you think we are to being able to open up the `datafusion-spark` floodgates to contributors?

Re: [PR] [datafusion-spark] Add Spark-compatible `char` expression [datafusion]

2025-05-16 Thread via GitHub
alamb merged PR #15994: URL: https://github.com/apache/datafusion/pull/15994

Re: [PR] feat: add macros for DataFusionError variants [datafusion]

2025-05-16 Thread via GitHub
alamb commented on PR #15946: URL: https://github.com/apache/datafusion/pull/15946#issuecomment-2887542551 I am a little worried about changing the DataFusionError variant as it will be a disruptive downstream change (users have to update all match statements that currently match on `

Re: [PR] feat: add macros for DataFusionError variants [datafusion]

2025-05-16 Thread via GitHub
alamb commented on code in PR #15946: URL: https://github.com/apache/datafusion/pull/15946#discussion_r2093584071 ## datafusion-cli/src/print_options.rs: ## @@ -143,9 +143,7 @@ impl PrintOptions { format_options: &FormatOptions, ) -> Result<()> { if self.f

[PR] adding support for Min/Max over LargeList and FixedSizeList [datafusion]

2025-05-16 Thread via GitHub
logan-keede opened a new pull request, #16071: URL: https://github.com/apache/datafusion/pull/16071 ## Which issue does this PR close? - Closes #16032 . ## Rationale for this change Using #16025 as reference, adding support and relevant tests for `LargeList` and `Fix

[PR] Update a bunch of dependencies [datafusion]

2025-05-16 Thread via GitHub
alamb opened a new pull request, #16070: URL: https://github.com/apache/datafusion/pull/16070 ## Which issue does this PR close? - Closes #. ## Rationale for this change Let's try and keep our dependencies up to date ## What changes are included in this PR?

Re: [PR] Add late pruning of file based on file level statistics [datafusion]

2025-05-16 Thread via GitHub
adriangb commented on PR #16014: URL: https://github.com/apache/datafusion/pull/16014#issuecomment-2887515831 Let's start with https://github.com/apache/datafusion/pull/16069

[PR] Move PruningStatistics into datafusion::common [datafusion]

2025-05-16 Thread via GitHub
adriangb opened a new pull request, #16069: URL: https://github.com/apache/datafusion/pull/16069 (no comment)

Re: [PR] Fix: common_sub_expression_eliminate optimizer rule failed [datafusion]

2025-05-16 Thread via GitHub
alamb commented on code in PR #16066: URL: https://github.com/apache/datafusion/pull/16066#discussion_r2093568326 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -316,6 +316,19 @@ impl CommonSubexprEliminate { } => { l

Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-16 Thread via GitHub
alamb commented on PR #14366: URL: https://github.com/apache/datafusion/pull/14366#issuecomment-2887495892 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~

Re: [PR] chore: More refactoring of type checking logic [datafusion-comet]

2025-05-16 Thread via GitHub
andygrove commented on code in PR #1744: URL: https://github.com/apache/datafusion-comet/pull/1744#discussion_r2093551804 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2758,16 +2758,14 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] Add late pruning of file based on file level statistics [datafusion]

2025-05-16 Thread via GitHub
alamb commented on code in PR #16014: URL: https://github.com/apache/datafusion/pull/16014#discussion_r2093518770 ## datafusion/common/src/pruning.rs: ## @@ -0,0 +1,490 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

[PR] fix: fix clippy issue after rust update to 1.87 [datafusion-ballista]

2025-05-16 Thread via GitHub
milenkovicm opened a new pull request, #1262: URL: https://github.com/apache/datafusion-ballista/pull/1262 # Which issue does this PR close? Closes #. # Rationale for this change rust `1.87` clippy has changed `large-error-threshold` clippy option making noise in ba

Re: [I] Update workspace / CI to Rust 1.87 [datafusion]

2025-05-16 Thread via GitHub
alamb commented on issue #16061: URL: https://github.com/apache/datafusion/issues/16061#issuecomment-2887394990 Thanks @kadai0308 -- I left some comments on the PR

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-16 Thread via GitHub
alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2887367750 > @alamb @Dandandan Hi, this pr is ready again now. I will do so as soon as possible (hopefully later today but probably more like Sunday morning as I am away Saturday / tomorrow

Re: [PR] chore: More refactoring of type checking logic [datafusion-comet]

2025-05-16 Thread via GitHub
andygrove commented on PR #1744: URL: https://github.com/apache/datafusion-comet/pull/1744#issuecomment-2887350208 I have a Spark SQL test failure to resolve: ``` SPARK-47430 Support GROUP BY MapType *** FAILED *** (124 milliseconds) org.apache.spark.SparkException: Job aborted

Re: [PR] fix: coerce int96 resolution inside of list, struct, and map types [datafusion]

2025-05-16 Thread via GitHub
andygrove commented on PR #16058: URL: https://github.com/apache/datafusion/pull/16058#issuecomment-2887345559 > It is working that way at the moment, but the type transformation just isn't digging into nested types to find nested int96 fields. My understanding is that PR is extending

Re: [PR] chore: More refactoring of type checking logic [datafusion-comet]

2025-05-16 Thread via GitHub
codecov-commenter commented on PR #1744: URL: https://github.com/apache/datafusion-comet/pull/1744#issuecomment-2887342403 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1744?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] add unit tests for expression functions [datafusion-python]

2025-05-16 Thread via GitHub
timsaucer commented on PR #1121: URL: https://github.com/apache/datafusion-python/pull/1121#issuecomment-2887291492 Thank you all! Great suggestions.

Re: [PR] feat: add user defined table function support [datafusion-python]

2025-05-16 Thread via GitHub
timsaucer commented on code in PR #1113: URL: https://github.com/apache/datafusion-python/pull/1113#discussion_r2093421692 ## python/datafusion/udf.py: ## @@ -760,8 +760,74 @@ def wrapper(*args: Any, **kwargs: Any) -> Expr: return decorator +class TableFunction: +

Re: [I] Generate GroupByHash output in multiple `RecordBatch`es rather than one large one [datafusion]

2025-05-16 Thread via GitHub
alamb commented on issue #9562: URL: https://github.com/apache/datafusion/issues/9562#issuecomment-2887254045 > Well the hacky solution is just `EmitTo::First(batch_size)` until done 😄 I guess the concern is that it's not efficient because it's using `Vec::split_off` under the hood in most

Re: [I] Generate GroupByHash output in multiple `RecordBatch`es rather than one large one [datafusion]

2025-05-16 Thread via GitHub
joroKr21 commented on issue #9562: URL: https://github.com/apache/datafusion/issues/9562#issuecomment-2887243716 Well the hacky solution is just `EmitTo::First(batch_size)` until done 😄 I guess the concern is that it's not efficient because it's using `Vec::split_off` under the hood in m
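
Note on the comment above: the efficiency concern about `Vec::split_off` can be illustrated with a minimal sketch. This is not DataFusion's code; `take_first` and `emit_in_batches` are hypothetical names, and a plain `Vec` stands in for the aggregation state. It only shows that taking the first `n` items via `split_off` copies the remaining tail on every call, so emitting the whole state in small batches repeats that copy once per batch.

```rust
/// Hypothetical helper (not a DataFusion API): remove and return the
/// first `n` items of `v`, keeping the remainder in `v`.
fn take_first<T>(v: &mut Vec<T>, n: usize) -> Vec<T> {
    let n = n.min(v.len());
    // `split_off(n)` allocates a new Vec holding the tail `[n, len)`
    // and leaves the head `[0, n)` in `v`.
    let tail = v.split_off(n);
    // Swap them so `v` keeps the tail and the head is returned.
    std::mem::replace(v, tail)
}

/// Emit all of `state` in chunks of at most `batch_size` items,
/// mimicking "emit the first `batch_size` groups until done".
fn emit_in_batches<T>(mut state: Vec<T>, batch_size: usize) -> Vec<Vec<T>> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    let mut batches = Vec::new();
    while !state.is_empty() {
        // Each iteration moves the whole remaining tail into a new Vec,
        // which is the repeated cost the comment worries about.
        batches.push(take_first(&mut state, batch_size));
    }
    batches
}
```

Calling `emit_in_batches(state, batch_size)` on a state of `k` items performs roughly `k / batch_size` tail copies, each proportional to what is still left, which is why a blocked layout that hands out whole chunks without copying is attractive.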

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-16 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2887235113 > I am confused why my benchmark for local Mac no regression for sort-tpch Q3, but the generated benchmark for linux we can reproduce the regression. It may be that the tests are

Re: [I] Generate GroupByHash output in multiple `RecordBatch`es rather than one large one [datafusion]

2025-05-16 Thread via GitHub
alamb commented on issue #9562: URL: https://github.com/apache/datafusion/issues/9562#issuecomment-2887224796 > @alamb why are you talking about accumulator APIs? ExecutionState::ProducingOutput is purely local to GroupedHashAggregateStream. I already tried implementing this change and it w

Re: [I] Generate GroupByHash output in multiple `RecordBatch`es rather than one large one [datafusion]

2025-05-16 Thread via GitHub
joroKr21 commented on issue #9562: URL: https://github.com/apache/datafusion/issues/9562#issuecomment-2887207261 I reached the same conclusion. Had a hash join after aggregation and it wanted to allocate > 16GB memory for a couple hundred MB of aggregation state 😢 > I think this appro

Re: [PR] chore: More refactoring of type checking logic [datafusion-comet]

2025-05-16 Thread via GitHub
andygrove commented on code in PR #1744: URL: https://github.com/apache/datafusion-comet/pull/1744#discussion_r2093365991 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2758,16 +2758,14 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] fix: coerce int96 resolution inside of list, struct, and map types [datafusion]

2025-05-16 Thread via GitHub
mbutrovich commented on code in PR #16058: URL: https://github.com/apache/datafusion/pull/16058#discussion_r2093363005 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -1576,3 +1715,164 @@ fn create_max_min_accs( .collect(); (max_values, min_values) } +

Re: [PR] chore: More refactoring of type checking logic [datafusion-comet]

2025-05-16 Thread via GitHub
andygrove commented on code in PR #1744: URL: https://github.com/apache/datafusion-comet/pull/1744#discussion_r2093358984 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -242,15 +240,6 @@ object CometSparkSessionExtensions extends Logging {

Re: [PR] fix: coerce int96 resolution inside of list, struct, and map types [datafusion]

2025-05-16 Thread via GitHub
comphead commented on code in PR #16058: URL: https://github.com/apache/datafusion/pull/16058#discussion_r2093350094 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -1576,3 +1715,164 @@ fn create_max_min_accs( .collect(); (max_values, min_values) } + +#

Re: [PR] fix: Support Schema Evolution in iceberg [datafusion-comet]

2025-05-16 Thread via GitHub
huaxingao commented on PR #1723: URL: https://github.com/apache/datafusion-comet/pull/1723#issuecomment-2887151838 We actually don't need this PR any more.

Re: [PR] fix: Support Schema Evolution in iceberg [datafusion-comet]

2025-05-16 Thread via GitHub
huaxingao closed pull request #1723: fix: Support Schema Evolution in iceberg URL: https://github.com/apache/datafusion-comet/pull/1723

Re: [PR] Use qualified names on DELETE selections [datafusion]

2025-05-16 Thread via GitHub
comphead commented on PR #16033: URL: https://github.com/apache/datafusion/pull/16033#issuecomment-2887140210 Agree, I was mostly thinking about an in-memory dataset or Iceberg, as it also supports DML; let's wait for a real use case to come

Re: [I] `describe` does not handle mixed case or dots in column names [datafusion]

2025-05-16 Thread via GitHub
johnkerl commented on issue #16017: URL: https://github.com/apache/datafusion/issues/16017#issuecomment-2887125561 Thanks @jfahne ! :)

Re: [I] Add imdb 10 rows slt test [datafusion]

2025-05-16 Thread via GitHub
kumarlokesh commented on issue #15934: URL: https://github.com/apache/datafusion/issues/15934#issuecomment-2887121888 @jayzhan211 made an attempt to address this requirement here: https://github.com/apache/datafusion/pull/16067. Please have a look.

Re: [I] Update workspace / CI to Rust 1.87 [datafusion]

2025-05-16 Thread via GitHub
kadai0308 commented on issue #16061: URL: https://github.com/apache/datafusion/issues/16061#issuecomment-2887120800 After updating to Rust 1.87, I encountered several warnings such as: ``` warning: large size difference between variants --> datafusion/optimizer/src/simplify_expres

[PR] chore(CI) Upgrade toolchain to Rust-1.87 [datafusion]

2025-05-16 Thread via GitHub
kadai0308 opened a new pull request, #16068: URL: https://github.com/apache/datafusion/pull/16068 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/16061#issuecomment-2886309321 ## Rationale for this change Improve build times.

Re: [PR] add unit tests for expression functions [datafusion-python]

2025-05-16 Thread via GitHub
kevinjqliu commented on code in PR #1121: URL: https://github.com/apache/datafusion-python/pull/1121#discussion_r2093257339 ## python/tests/test_expr.py: ## @@ -275,3 +277,465 @@ def test_col_getattr(): def test_alias_with_metadata(df): df = df.select(col("a").alias("b",

Re: [PR] Use qualified names on DELETE selections [datafusion]

2025-05-16 Thread via GitHub
alamb commented on PR #16033: URL: https://github.com/apache/datafusion/pull/16033#issuecomment-2887100083 > Btw @alamb this is prob a good question is there a place in roadmap to support UPDATE/DELETE DDLs? for a single machine engine it might be beneficial. DuckDB supports it TLDR

Re: [PR] fix: Support Schema Evolution in iceberg [datafusion-comet]

2025-05-16 Thread via GitHub
comphead commented on PR #1723: URL: https://github.com/apache/datafusion-comet/pull/1723#issuecomment-2887069661 are we good to merge this PR?

[PR] Added SLT tests for IMDB benchmark queries [datafusion]

2025-05-16 Thread via GitHub
kumarlokesh opened a new pull request, #16067: URL: https://github.com/apache/datafusion/pull/16067 ## Which issue does this PR close? - Closes #15934. ## Rationale for this change ## What changes are included in this PR? ## Are these change

Re: [PR] chore: More refactoring of type checking logic [datafusion-comet]

2025-05-16 Thread via GitHub
andygrove commented on code in PR #1744: URL: https://github.com/apache/datafusion-comet/pull/1744#discussion_r2093234206 ## spark/src/main/scala/org/apache/spark/sql/comet/CometNativeScanExec.scala: ## @@ -237,8 +235,9 @@ object CometNativeScanExec extends DataTypeSupport {

[PR] Fix: common_sub_expression_eliminate optimizer rule failed [datafusion]

2025-05-16 Thread via GitHub
Col-Waltz opened a new pull request, #16066: URL: https://github.com/apache/datafusion/pull/16066 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/15291. ## Rationale for this change Common_sub_expression_eliminate rule failed with e

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-16 Thread via GitHub
zhuqi-lucas commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2886952492 I am confused about why my local Mac benchmark shows no regression for sort-tpch Q3, while the generated Linux benchmark reproduces the regression.

Re: [PR] chore: More refactoring of type checking logic [datafusion-comet]

2025-05-16 Thread via GitHub
Copilot commented on code in PR #1744: URL: https://github.com/apache/datafusion-comet/pull/1744#discussion_r2093225482 ## spark/src/main/scala/org/apache/spark/sql/comet/CometNativeScanExec.scala: ## @@ -237,8 +235,9 @@ object CometNativeScanExec extends DataTypeSupport {

Re: [I] `describe` does not handle mixed case or dots in column names [datafusion]

2025-05-16 Thread via GitHub
jfahne commented on issue #16017: URL: https://github.com/apache/datafusion/issues/16017#issuecomment-2886965734 Just an update. I was able to reproduce the error with the following goofy test added to the [dataframe tests](https://github.com/apache/datafusion/blob/main/datafusion/core/test

Re: [I] Update workspace / CI to Rust 1.87 [datafusion]

2025-05-16 Thread via GitHub
kadai0308 commented on issue #16061: URL: https://github.com/apache/datafusion/issues/16061#issuecomment-2886064977 Hi @alamb, I'm new here and excited to contribute to the project. Could you please assign the ticket to me? Looking forward to getting started!

Re: [I] Update workspace / CI to Rust 1.87 [datafusion]

2025-05-16 Thread via GitHub
kadai0308 commented on issue #16061: URL: https://github.com/apache/datafusion/issues/16061#issuecomment-2886085084 take

Re: [I] Update workspace / CI to Rust 1.87 [datafusion]

2025-05-16 Thread via GitHub
DanielFrantes commented on issue #16061: URL: https://github.com/apache/datafusion/issues/16061#issuecomment-2886076214 @kadai0308 Hi, just write a comment with only the word "take" here.

Re: [PR] Add support for INCLUDE/EXCLUDE NULLS for UNPIVOT [datafusion-sqlparser-rs]

2025-05-16 Thread via GitHub
iffyio commented on code in PR #1849: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1849#discussion_r208830 ## src/ast/query.rs: ## @@ -1257,6 +1257,7 @@ pub enum TableFactor { value: Ident, name: Ident, columns: Vec, +includ

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-05-16 Thread via GitHub
iffyio commented on PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#issuecomment-2879231551 > I think it'd be useful to read a whole file into a string, then pass it to this library for parsing, then process the ASTs Not sure I necessarily agree with this sen

Re: [PR] pretty print improvements [datafusion-sqlparser-rs]

2025-05-16 Thread via GitHub
iffyio commented on code in PR #1851: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1851#discussion_r2088311461 ## src/display_utils.rs: ## @@ -71,12 +68,17 @@ impl Display for SpaceOrNewline { /// A value that displays a comma-separated list of values. /// Wh
