Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
zhuqi-lucas commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2128032920 ## datafusion/physical-optimizer/src/wrap_leaves_cancellation.rs: ## @@ -0,0 +1,138 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mo

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
zhuqi-lucas commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2128017451 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -4702,7 +4702,8 @@ physical_plan 01)CrossJoinExec 02)--DataSourceExec: partitions=1, partition_sizes=[

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
zhuqi-lucas commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2128014148 ## datafusion/physical-plan/src/memory.rs: ## @@ -139,13 +140,18 @@ pub trait LazyBatchGenerator: Send + Sync + fmt::Debug + fmt::Display { /// /// This pla

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2942862579 > > I am onboard with the approach in this PR, and it looks good to me overall. Just needs some finishing touches: > > > > * To avoid such a large diff (which is mostly pla

[PR] fix: NaN semantics in GROUP BY [datafusion]

2025-06-04 Thread via GitHub
chenkovsky opened a new pull request, #16256: URL: https://github.com/apache/datafusion/pull/16256 ## Which issue does this PR close? - Closes #16254. ## Rationale for this change NaN != NaN ## What changes are included in this PR? Use arrow is_eq to test eq

Re: [I] Inconsistency with count distinct on NaN values [datafusion]

2025-06-04 Thread via GitHub
chenkovsky commented on issue #16254: URL: https://github.com/apache/datafusion/issues/16254#issuecomment-2942717364 In spark, it seems that it's hardcoded for NaN https://github.com/apache/spark/blob/4c69d0df96cb379cdf3f63331eae6ebfece008bd/docs/sql-ref-datatypes.md?plain=1#L297 -- This

Re: [PR] Adjust slttest to pass without RUST_BACKTRACE enabled [datafusion]

2025-06-04 Thread via GitHub
2010YOUY01 merged PR #16251: URL: https://github.com/apache/datafusion/pull/16251 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [I] Inconsistency with count distinct on NaN values [datafusion]

2025-06-04 Thread via GitHub
chenkovsky commented on issue #16254: URL: https://github.com/apache/datafusion/issues/16254#issuecomment-2942703718 after debugging, I found that the root cause is here https://github.com/apache/datafusion/blob/448c985ebbfbb24b0fdba5b9f18a701a6275188a/datafusion/physical-plan/src/aggregates

Re: [I] sqllogictest tests fail when run locally [datafusion]

2025-06-04 Thread via GitHub
2010YOUY01 closed issue #16250: sqllogictest tests fail when run locally URL: https://github.com/apache/datafusion/issues/16250

[PR] feat(small): Add `BaselineMetrics` to `generate_series()` table function [datafusion]

2025-06-04 Thread via GitHub
2010YOUY01 opened a new pull request, #16255: URL: https://github.com/apache/datafusion/pull/16255 ## Which issue does this PR close? - Closes #. ## Rationale for this change Currently `generate_series()` does not track metrics, see the result in `datafusion-

Re: [PR] Feat: Support Spark 4.0.0 part1 [datafusion-comet]

2025-06-04 Thread via GitHub
huaxingao commented on code in PR #1830: URL: https://github.com/apache/datafusion-comet/pull/1830#discussion_r2127844183 ## common/src/main/java/org/apache/comet/parquet/TypeUtil.java: ## @@ -74,7 +74,7 @@ public static ColumnDescriptor convertToParquet(StructField field) {

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2942579857 > Oh boy, it's going to take some time to grok how that actually gets evaluated. It surprised me a bit that this gets planned as one linear chain. Is there an explanation somewhe

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2942575262 > > What do you mean built-in yielding? If it means we add some leaf nodes to support built-in yielding support? Thanks! > > Here is rough sketch of what I meant, let's dis

Re: [I] Update tpch, clickbench, sort_tpch to mark failed queries [datafusion]

2025-06-04 Thread via GitHub
2010YOUY01 closed issue #16160: Update tpch, clickbench, sort_tpch to mark failed queries URL: https://github.com/apache/datafusion/issues/16160

Re: [PR] Update tpch, clickbench, sort_tpch to mark failed queries [datafusion]

2025-06-04 Thread via GitHub
2010YOUY01 merged PR #16182: URL: https://github.com/apache/datafusion/pull/16182

Re: [I] Unaligned memory access in `SparkUnsafeRow` [datafusion-comet]

2025-06-04 Thread via GitHub
Kontinuation closed issue #1850: Unaligned memory access in `SparkUnsafeRow` URL: https://github.com/apache/datafusion-comet/issues/1850

Re: [PR] fix: Fix shuffle writing rows containing null struct fields [datafusion-comet]

2025-06-04 Thread via GitHub
Kontinuation commented on code in PR #1845: URL: https://github.com/apache/datafusion-comet/pull/1845#discussion_r2127825909 ## native/core/src/execution/shuffle/row.rs: ## @@ -3319,6 +3319,7 @@ mod test { } #[test] +#[cfg_attr(miri, ignore)] // Unaligned memory

[I] Unaligned memory access in `SparkUnsafeRow` [datafusion-comet]

2025-06-04 Thread via GitHub
Kontinuation opened a new issue, #1850: URL: https://github.com/apache/datafusion-comet/issues/1850 This was found when working on https://github.com/apache/datafusion-comet/pull/1845. A newly added Rust test that calls into `SparkUnsafeRow.is_null_at(idx)` was identified to have undefined

[I] Unaligned memory access in `SparkUnsafeRow` [datafusion-comet]

2025-06-04 Thread via GitHub
Kontinuation opened a new issue, #1849: URL: https://github.com/apache/datafusion-comet/issues/1849 This was found when working on https://github.com/apache/datafusion-comet/pull/1845. A newly added Rust test that calls into `SparkUnsafeRow.is_null_at(idx)` was identified to have undefined
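To illustrate the kind of problem the two issues above describe, here is a minimal sketch of reading a null bit from a row buffer without assuming the buffer is 8-byte aligned. It is not Comet's code: `read_word` and `is_null_at` are hypothetical helpers, and the layout (a null bitset stored as little-endian 64-bit words at the start of the row) is an assumption made for the example.

```rust
/// Hypothetical helper: read a little-endian u64 word at `offset`.
/// Copying the bytes out avoids dereferencing a possibly misaligned `*const u64`.
fn read_word(buf: &[u8], offset: usize) -> u64 {
    let mut word = [0u8; 8];
    word.copy_from_slice(&buf[offset..offset + 8]);
    u64::from_le_bytes(word)
}

/// Hypothetical null check: bit `idx` of a bitset stored as 64-bit words
/// at the start of `row` (assumed layout, for illustration only).
fn is_null_at(row: &[u8], idx: usize) -> bool {
    let word = read_word(row, (idx / 64) * 8);
    (word >> (idx % 64)) & 1 == 1
}

fn main() {
    // One leading padding byte, so the row slice starts at an odd offset
    // within the backing buffer and cannot be assumed to be 8-byte aligned.
    let mut backing = vec![0u8; 17];
    backing[1] = 0b0000_0101; // bits 0 and 2 set
    let row = &backing[1..];
    println!("{} {} {}", is_null_at(row, 0), is_null_at(row, 1), is_null_at(row, 2));
}
```

Casting the byte pointer to `*const u64` and dereferencing it is undefined behavior in Rust when the address is not suitably aligned, which is the class of error Miri reports. Copying into a local array (or using `std::ptr::read_unaligned`) sidesteps the alignment requirement.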

Re: [PR] fix: Fall back to Spark for some window ranges [datafusion-comet]

2025-06-04 Thread via GitHub
codecov-commenter commented on PR #1848: URL: https://github.com/apache/datafusion-comet/pull/1848#issuecomment-2942392101 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1848?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Schema adapter helper [datafusion]

2025-06-04 Thread via GitHub
kosiew commented on PR #16108: URL: https://github.com/apache/datafusion/pull/16108#issuecomment-2942385599 You're welcome @alamb Thank you for the review.

Re: [I] Invalid argument error: Invalid arithmetic operation: Int32 - Int64 [datafusion-comet]

2025-06-04 Thread via GitHub
andygrove commented on issue #1246: URL: https://github.com/apache/datafusion-comet/issues/1246#issuecomment-2942272255 Making notes as I learn more about this functionality. We currently only support Int and Long for ranges (Spark supports other numeric and temporal types). We also conver

Re: [PR] fix: Fix shuffle writing rows containing null struct fields [datafusion-comet]

2025-06-04 Thread via GitHub
andygrove merged PR #1845: URL: https://github.com/apache/datafusion-comet/pull/1845

Re: [I] Panic in Comet shuffle when building structs [datafusion-comet]

2025-06-04 Thread via GitHub
andygrove closed issue #1823: Panic in Comet shuffle when building structs URL: https://github.com/apache/datafusion-comet/issues/1823

Re: [I] CometColumnarExchange throws exception when reading Delta table [datafusion-comet]

2025-06-04 Thread via GitHub
andygrove closed issue #1844: CometColumnarExchange throws exception when reading Delta table URL: https://github.com/apache/datafusion-comet/issues/1844

[PR] fix: Fall back to Spark for some window ranges [datafusion-comet]

2025-06-04 Thread via GitHub
andygrove opened a new pull request, #1848: URL: https://github.com/apache/datafusion-comet/pull/1848 ## Which issue does this PR close? Workaround for https://github.com/apache/datafusion-comet/issues/1246 ## Rationale for this change We do not currently

Re: [I] [EPIC] Complete `datafusion-spark` Spark Compatible Functions [datafusion]

2025-06-04 Thread via GitHub
alamb commented on issue #15914: URL: https://github.com/apache/datafusion/issues/15914#issuecomment-2942192886 Using pyspark to generate expected input/output that gets checked in sounds like a great idea to me. BTW, I hope to devote some time next week to helping organize this effort

Re: [I] [EPIC] Complete `datafusion-spark` Spark Compatible Functions [datafusion]

2025-06-04 Thread via GitHub
andygrove commented on issue #15914: URL: https://github.com/apache/datafusion/issues/15914#issuecomment-2942204554 This sounds like a great idea. Thanks @linhr.

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-04 Thread via GitHub
shehabgamin commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2942182941 This is a WIP: https://github.com/lakehq/sail/pull/518

Re: [PR] chore: Update documentation and ignore Spark SQL tests for known issue with count distinct on NaN in aggregate [datafusion-comet]

2025-06-04 Thread via GitHub
codecov-commenter commented on PR #1847: URL: https://github.com/apache/datafusion-comet/pull/1847#issuecomment-2941879439 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1847?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] join statement with regex causes panic [datafusion]

2025-06-04 Thread via GitHub
chenkovsky commented on issue #16252: URL: https://github.com/apache/datafusion/issues/16252#issuecomment-2941848939 take

Re: [I] Inconsistency with count distinct on NaN values [datafusion]

2025-06-04 Thread via GitHub
chenkovsky commented on issue #16254: URL: https://github.com/apache/datafusion/issues/16254#issuecomment-2941849596 take

[PR] chore: Update documentation and ignore Spark SQL tests for known issue [datafusion-comet]

2025-06-04 Thread via GitHub
andygrove opened a new pull request, #1847: URL: https://github.com/apache/datafusion-comet/pull/1847 ## Which issue does this PR close? Part of #1824 ## Rationale for this change Comet is not compatible with Spark for aggregate queries that use the aggr

Re: [PR] chore: Update documentation and ignore Spark SQL tests for known issue [datafusion-comet]

2025-06-04 Thread via GitHub
andygrove commented on code in PR #1847: URL: https://github.com/apache/datafusion-comet/pull/1847#discussion_r2127546868 ## docs/source/user-guide/compatibility.md: ## @@ -29,12 +29,6 @@ Comet aims to provide consistent results with the version of Apache Spark that i This g

Re: [I] Update 4.0.0.diff to reflect recent improvements in 3.5.5.diff [datafusion-comet]

2025-06-04 Thread via GitHub
kazuyukitanimura commented on issue #1846: URL: https://github.com/apache/datafusion-comet/issues/1846#issuecomment-2941770876 Can we also make the same change for 3.4.3?

[I] Inconsistency with count distinct on NaN values [datafusion]

2025-06-04 Thread via GitHub
andygrove opened a new issue, #16254: URL: https://github.com/apache/datafusion/issues/16254

### Describe the bug

I have this csv file:

```
a,b
x,NaN
x,NaN
x,NaN
```

With a simple select query, DF says there is only 1 distinct value for column b (which, I
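As background for why a count of distinct NaN values can come out differently depending on the comparison used, here is a minimal sketch in plain Rust. It is not DataFusion's implementation; `count_distinct_ieee` and `count_distinct_bits` are hypothetical helpers written only to contrast IEEE 754 equality (under which NaN is never equal to NaN) with equality on the raw bit pattern.

```rust
use std::collections::HashSet;

/// Count distinct values using IEEE 754 `==`, under which NaN != NaN,
/// so every NaN is treated as a new value.
fn count_distinct_ieee(values: &[f64]) -> usize {
    let mut seen: Vec<f64> = Vec::new();
    for &v in values {
        if !seen.iter().any(|&s| s == v) {
            seen.push(v);
        }
    }
    seen.len()
}

/// Count distinct values by bit pattern, so identical NaN encodings
/// collapse into a single value.
fn count_distinct_bits(values: &[f64]) -> usize {
    values
        .iter()
        .map(|v| v.to_bits())
        .collect::<HashSet<u64>>()
        .len()
}

fn main() {
    // Column b from the CSV above: three NaN values.
    let b = [f64::NAN, f64::NAN, f64::NAN];
    println!("ieee equality: {}", count_distinct_ieee(&b));
    println!("bit equality:  {}", count_distinct_bits(&b));
}
```

With three copies of `f64::NAN`, the first helper reports 3 and the second reports 1. Note that bit-pattern equality also distinguishes `0.0` from `-0.0` and different NaN payloads, so an engine that wants SQL-style grouping typically has to normalize those cases explicitly.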

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2941721735 Oh boy, it's going to take some time to grok how that actually gets evaluated. It surprised me a bit that this gets planned as one linear chain. Is there an explanation somewhere ab

Re: [I] Incorrect results with JVM shuffle: Spark SQL `- SPARK-32038: NormalizeFloatingNumbers should work on distinct aggregate` [datafusion-comet]

2025-06-04 Thread via GitHub
andygrove commented on issue #1824: URL: https://github.com/apache/datafusion-comet/issues/1824#issuecomment-2941721710 The root cause is https://github.com/apache/datafusion/issues/16254

Re: [PR] feat: Add Aggregate UDF to FFI crate [datafusion]

2025-06-04 Thread via GitHub
timsaucer commented on PR #14775: URL: https://github.com/apache/datafusion/pull/14775#issuecomment-2941554545 @alamb I believe this one is ready.

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on PR #16217: URL: https://github.com/apache/datafusion/pull/16217#issuecomment-2941555626 I removed the conflicts, this is in a good state now.

[PR] chore(deps-dev): bump webpack-dev-server from 4.15.1 to 5.2.1 in /datafusion/wasmtest/datafusion-wasm-app [datafusion]

2025-06-04 Thread via GitHub
dependabot[bot] opened a new pull request, #16253: URL: https://github.com/apache/datafusion/pull/16253 Bumps [webpack-dev-server](https://github.com/webpack/webpack-dev-server) from 4.15.1 to 5.2.1. Release notes Sourced from https://github.com/webpack/webpack-dev-server/releases

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2127435736 ## datafusion/physical-expr/src/equivalence/properties/mod.rs: ## @@ -579,302 +611,289 @@ impl EquivalenceProperties { // From the analysis above, w

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2941415843 I think queries of the following form (not this particular one, as it works on a small dataset, but of similar form) could be interesting test vehicles for you: https://git

Re: [PR] Schema adapter helper [datafusion]

2025-06-04 Thread via GitHub
alamb merged PR #16108: URL: https://github.com/apache/datafusion/pull/16108

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-04 Thread via GitHub
alamb commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2941353112 I am sorry I ran out of time today to try and test an upgrade with delta rs -- I will work on it tomorrow morning

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-04 Thread via GitHub
paleolimbot commented on code in PR #16170: URL: https://github.com/apache/datafusion/pull/16170#discussion_r2127379738 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -6061,7 +6061,7 @@ physical_plan 04)--AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))]

Re: [PR] Schema adapter helper [datafusion]

2025-06-04 Thread via GitHub
alamb commented on PR #16108: URL: https://github.com/apache/datafusion/pull/16108#issuecomment-2941357353 Thanks again @kosiew and @adriangb

Re: [PR] WIP: Test DataFusion with experimental IncrementalRecordBatchBuilder [datafusion]

2025-06-04 Thread via GitHub
alamb commented on PR #16208: URL: https://github.com/apache/datafusion/pull/16208#issuecomment-2941355374 Closing in favor of https://github.com/apache/datafusion/pull/16249

Re: [PR] Support compound identifier when parsing tuples [datafusion]

2025-06-04 Thread via GitHub
alamb merged PR #16225: URL: https://github.com/apache/datafusion/pull/16225

Re: [PR] WIP: Test DataFusion with experimental IncrementalRecordBatchBuilder [datafusion]

2025-06-04 Thread via GitHub
alamb closed pull request #16208: WIP: Test DataFusion with experimental IncrementalRecordBatchBuilder URL: https://github.com/apache/datafusion/pull/16208

Re: [I] Add support for compound identifiers in tuple parsing [datafusion]

2025-06-04 Thread via GitHub
alamb closed issue #16224: Add support for compound identifiers in tuple parsing URL: https://github.com/apache/datafusion/issues/16224

Re: [PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-04 Thread via GitHub
alamb commented on PR #16234: URL: https://github.com/apache/datafusion/pull/16234#issuecomment-2941353920 🚀 thank you

Re: [PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-04 Thread via GitHub
alamb merged PR #16234: URL: https://github.com/apache/datafusion/pull/16234

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-04 Thread via GitHub
timsaucer commented on code in PR #16170: URL: https://github.com/apache/datafusion/pull/16170#discussion_r2127366777 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -6061,7 +6061,7 @@ physical_plan 04)--AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))] 05

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-04 Thread via GitHub
timsaucer commented on code in PR #16170: URL: https://github.com/apache/datafusion/pull/16170#discussion_r2127368484 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -6061,7 +6061,7 @@ physical_plan 04)--AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))] 05

Re: [PR] Draft: Use upstream arrow `coalesce` kernel in DataFusion [datafusion]

2025-06-04 Thread via GitHub
alamb commented on code in PR #16249: URL: https://github.com/apache/datafusion/pull/16249#discussion_r2127362835 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -2107,9 +2107,9 @@ RIGHT JOIN (select t2_id from join_t2 where join_t2.t2_id > 11) as join_t2 ON joi

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-04 Thread via GitHub
paleolimbot commented on code in PR #16170: URL: https://github.com/apache/datafusion/pull/16170#discussion_r2127342505 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -6061,7 +6061,7 @@ physical_plan 04)--AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))]

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-04 Thread via GitHub
timsaucer commented on code in PR #16170: URL: https://github.com/apache/datafusion/pull/16170#discussion_r2127359160 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -6061,7 +6061,7 @@ physical_plan 04)--AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))] 05

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-04 Thread via GitHub
timsaucer commented on code in PR #16170: URL: https://github.com/apache/datafusion/pull/16170#discussion_r2127334822 ## datafusion/expr/src/expr_schema.rs: ## @@ -420,11 +420,18 @@ impl ExprSchemable for Expr { Expr::ScalarVariable(ty, _) => { Ok(A

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-04 Thread via GitHub
paleolimbot commented on code in PR #16170: URL: https://github.com/apache/datafusion/pull/16170#discussion_r2127348670 ## datafusion/expr/src/expr.rs: ## @@ -274,16 +275,16 @@ use sqlparser::ast::{ /// assert!(rewritten.transformed); /// // to 42 = 5 AND b = 6 /// assert_eq!

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-04 Thread via GitHub
timsaucer commented on code in PR #16170: URL: https://github.com/apache/datafusion/pull/16170#discussion_r2127334339 ## datafusion/expr/src/expr.rs: ## @@ -274,16 +275,16 @@ use sqlparser::ast::{ /// assert!(rewritten.transformed); /// // to 42 = 5 AND b = 6 /// assert_eq!(r

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-04 Thread via GitHub
timsaucer commented on code in PR #16170: URL: https://github.com/apache/datafusion/pull/16170#discussion_r2127331511 ## datafusion/physical-expr/src/planner.rs: ## @@ -111,14 +112,42 @@ pub fn create_physical_expr( let input_schema: &Schema = &input_dfschema.into();

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-04 Thread via GitHub
timsaucer commented on code in PR #16170: URL: https://github.com/apache/datafusion/pull/16170#discussion_r2127327722 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -6061,7 +6061,7 @@ physical_plan 04)--AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))] 05

Re: [PR] feat: Add Aggregate UDF to FFI crate [datafusion]

2025-06-04 Thread via GitHub
timsaucer commented on code in PR #14775: URL: https://github.com/apache/datafusion/pull/14775#discussion_r2127109672 ## datafusion/ffi/src/arrow_wrappers.rs: ## @@ -31,30 +32,37 @@ use log::error; #[derive(Debug, StableAbi)] pub struct WrappedSchema(#[sabi(unsafe_opaque_field

Re: [PR] feat: Add Aggregate UDF to FFI crate [datafusion]

2025-06-04 Thread via GitHub
timsaucer commented on code in PR #14775: URL: https://github.com/apache/datafusion/pull/14775#discussion_r2127291948 ## datafusion/ffi/src/arrow_wrappers.rs: ## @@ -31,30 +32,37 @@ use log::error; #[derive(Debug, StableAbi)] pub struct WrappedSchema(#[sabi(unsafe_opaque_field

Re: [PR] chore: Upgrade to DataFusion 48.0.0-rc1 [datafusion-comet]

2025-06-04 Thread via GitHub
andygrove commented on PR #1842: URL: https://github.com/apache/datafusion-comet/pull/1842#issuecomment-2941181520 Moving to draft since there will be an rc2 soon

Re: [PR] chore: Enable tests in RemoveRedundantProjectsSuite.scala related to issue #242 [datafusion-comet]

2025-06-04 Thread via GitHub
andygrove merged PR #1838: URL: https://github.com/apache/datafusion-comet/pull/1838

Re: [PR] fix: Fix shuffle writing rows containing null struct fields [datafusion-comet]

2025-06-04 Thread via GitHub
andygrove commented on PR #1845: URL: https://github.com/apache/datafusion-comet/pull/1845#issuecomment-2941046157 I confirmed that this does also close https://github.com/apache/datafusion-comet/issues/1823

Re: [PR] Feat: Support Spark 4.0.0 part1 [datafusion-comet]

2025-06-04 Thread via GitHub
andygrove commented on code in PR #1830: URL: https://github.com/apache/datafusion-comet/pull/1830#discussion_r2127200413 ## common/src/main/java/org/apache/comet/parquet/TypeUtil.java: ## @@ -74,7 +74,7 @@ public static ColumnDescriptor convertToParquet(StructField field) {

Re: [PR] chore: Enable tests in RemoveRedundantProjectsSuite.scala related to issue #242 [datafusion-comet]

2025-06-04 Thread via GitHub
andygrove commented on PR #1838: URL: https://github.com/apache/datafusion-comet/pull/1838#issuecomment-2940981412 We'll need to make the same change in 4.0.0 later https://github.com/apache/datafusion-comet/issues/1846

[I] Update 4.0.0.diff to reflect recent improvements in 3.5.5.diff [datafusion-comet]

2025-06-04 Thread via GitHub
andygrove opened a new issue, #1846: URL: https://github.com/apache/datafusion-comet/issues/1846 ### What is the problem the feature request solves? Once https://github.com/apache/datafusion-comet/pull/1830 is merged, we should update the 4.0.0 diff file to reflect the recent changes

Re: [PR] Feat: Support Spark 4.0.0 part1 [datafusion-comet]

2025-06-04 Thread via GitHub
andygrove commented on PR #1830: URL: https://github.com/apache/datafusion-comet/pull/1830#issuecomment-2940951576 I ran into compilation issues locally when building for Spark 3.4. I think these can be resolved simply by moving the new `spark-3.5` shims into the `spark-3.x` shim folder in

Re: [PR] Add ICEBERG keyword support to ALTER TABLE statement [datafusion-sqlparser-rs]

2025-06-04 Thread via GitHub
iffyio merged PR #1869: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1869

Re: [I] Comet Internal Error: Output column count mismatch: expected 0, got 1 [datafusion-comet]

2025-06-04 Thread via GitHub
andygrove closed issue #1251: Comet Internal Error: Output column count mismatch: expected 0, got 1 URL: https://github.com/apache/datafusion-comet/issues/1251

Re: [PR] fix: Handle case where num_cols == 0 in native execution [datafusion-comet]

2025-06-04 Thread via GitHub
andygrove merged PR #1840: URL: https://github.com/apache/datafusion-comet/pull/1840

[I] join statement with regex causes panic [datafusion]

2025-06-04 Thread via GitHub
PPL143 opened a new issue, #16252: URL: https://github.com/apache/datafusion/issues/16252 ### Describe the bug Using regex functions or regex logical operators results in an overflow. ### To Reproduce > CREATE EXTERNAL TABLE bigdata STORED AS CSV LOCATION 'bigdata.csv' O

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-04 Thread via GitHub
timsaucer commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2940780703 They're linked in the above comment. I'll try to get to them first thing tomorrow.

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2940698205 @ozankabak I patched the gh_compare script to run the benchmark suite locally and it's running now. I'll report back when it completes. Any examples of specific queries I could tr

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2940683168 @pepijnve I would suggest testing with deep plans to see what happens. I am also curious to see the results.

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on PR #16217: URL: https://github.com/apache/datafusion/pull/16217#issuecomment-2940673323 Thanks for all the reviews. I went through each and sent a commit that addresses all the suggestions, I also tried to answer any questions I saw. I will later resolve the con

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2127003566 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -845,7 +840,7 @@ pub struct SortExec { /// Fetch highest/lowest n results fetch: Option, ///

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-04 Thread via GitHub
alamb commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2940668740 FYI @xudong963 I think @timsaucer has some PRs he would like to try and get into DataFusion 48 -- I'll let him comment here

Re: [I] sqllogictest tests fail when run locally [datafusion]

2025-06-04 Thread via GitHub
alamb commented on issue #16250: URL: https://github.com/apache/datafusion/issues/16250#issuecomment-2940666489 Here is a proposed fix: - https://github.com/apache/datafusion/pull/16251

[PR] Adjust slttest to pass without RUST_BACKTRACE enabled [datafusion]

2025-06-04 Thread via GitHub
alamb opened a new pull request, #16251: URL: https://github.com/apache/datafusion/pull/16251 ## Which issue does this PR close? - closes https://github.com/apache/datafusion/issues/16250 ## Rationale for this change Let's have sqllogictest pass locally without having to

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2127000873 ## datafusion/physical-expr/src/equivalence/properties/mod.rs: ## @@ -579,302 +611,289 @@ impl EquivalenceProperties { // From the analysis above, w

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126999507 ## datafusion/catalog/src/listing_schema.rs: ## @@ -143,7 +141,7 @@ impl ListingSchemaProvider { order_exprs: vec![],

Re: [PR] fix: Fix shuffle writing rows containing null struct fields [datafusion-comet]

2025-06-04 Thread via GitHub
codecov-commenter commented on PR #1845: URL: https://github.com/apache/datafusion-comet/pull/1845#issuecomment-2940607579 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1845?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126995806 ## datafusion/physical-expr/src/equivalence/class.rs: ## @@ -175,307 +135,398 @@ impl ConstExpr { } } +impl PartialEq for ConstExpr { +fn eq(&self, o

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126992798 ## datafusion/physical-expr/src/equivalence/class.rs: ## @@ -175,307 +135,398 @@ impl ConstExpr { } } +impl PartialEq for ConstExpr { +fn eq(&self, o

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-04 Thread via GitHub
alamb commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2940640461 > We need to pass backtrace env for running new sqllogictest, it can be passed by locally testing: I will try and fix this locally

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-04 Thread via GitHub
alamb commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2940639917 @mbutrovich FWIW I think that means datafusion requires arrow-flight 55.1 but could run with arrow 55.1 or 55.0 > Are there reasons not to sync on 55.1.0 for a DF 48?

Re: [I] sqllogictest tests fail when run locally [datafusion]

2025-06-04 Thread via GitHub
alamb commented on issue #16250: URL: https://github.com/apache/datafusion/issues/16250#issuecomment-2940637164 I think we should fix the expected result so it doesn't require RUST_BACKTRACE. I'll play around with it

Re: [PR] fix: Fix shuffle writing rows containing null struct fields [datafusion-comet]

2025-06-04 Thread via GitHub
andygrove commented on code in PR #1845: URL: https://github.com/apache/datafusion-comet/pull/1845#discussion_r2126985383 ## native/core/src/execution/shuffle/row.rs: ## @@ -3319,6 +3319,7 @@ mod test { } #[test] +#[cfg_attr(miri, ignore)] // Unaligned memory acc

Re: [PR] Draft: Use upstream arrow `coalesce` kernel in DataFusion [datafusion]

2025-06-04 Thread via GitHub
alamb commented on PR #16249: URL: https://github.com/apache/datafusion/pull/16249#issuecomment-2940620817 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_test_upstream_coalesce Benchmark clickbench_extended.json -

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2940614830 > What do you mean built-in yielding? If it means we add some leaf nodes to support built-in yielding support? Thanks! Here is a rough sketch of what I meant, let's discuss and

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2940587625 Just FYI, I've completed (I think at least) the exercise of adapting the operators in https://github.com/pepijnve/datafusion/tree/cancel_safety. I wanted to finish that if only for

Re: [PR] Fix: Map functions crash on out of bounds cases [datafusion]

2025-06-04 Thread via GitHub
comphead commented on PR #16203: URL: https://github.com/apache/datafusion/pull/16203#issuecomment-2940556210 Thanks @krishvishal let me quickly double check if we can do it with less data massaging. Since this will happen on execution layer it would be called for every batch of data possib

Re: [I] sqllogictest tests fail when run locally [datafusion]

2025-06-04 Thread via GitHub
zhuqi-lucas commented on issue #16250: URL: https://github.com/apache/datafusion/issues/16250#issuecomment-2940542765 Because CI testing automatically sets RUST_BACKTRACE=1, we need to pass the backtrace env for running the new sqllogictest; it can be passed by locally testing
