Re: [PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-02 Thread via GitHub
suibianwanwank commented on code in PR #16234: URL: https://github.com/apache/datafusion/pull/16234#discussion_r2122871124 ## datafusion/physical-expr/src/window/aggregate.rs: ## @@ -85,6 +88,18 @@ impl PlainAggregateWindowExpr { ); } } + +fn is_wi

Re: [PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-02 Thread via GitHub
suibianwanwank commented on code in PR #16234: URL: https://github.com/apache/datafusion/pull/16234#discussion_r2122842898 ## datafusion/physical-expr/src/window/aggregate.rs: ## @@ -85,6 +88,18 @@ impl PlainAggregateWindowExpr { ); } } + +fn is_wi

Re: [PR] Update tpch, clickbench, sort_tpch to mark failed queries [datafusion]

2025-06-02 Thread via GitHub
ding-young commented on code in PR #16182: URL: https://github.com/apache/datafusion/pull/16182#discussion_r2122839920 ## benchmarks/src/util/run.rs: ## @@ -138,6 +144,28 @@ impl BenchmarkRun { } } +/// Print the names of failed queries, if any +pub fn ma

Re: [PR] feat: Translate Hadoop S3A configurations to object_store configurations [datafusion-comet]

2025-06-02 Thread via GitHub
andygrove merged PR #1817: URL: https://github.com/apache/datafusion-comet/pull/1817 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

Re: [I] Support metadata columns (`location`, `size`, `last_modified`) in `ListingTableProvider` [datafusion]

2025-06-02 Thread via GitHub
phillipleblanc closed issue #15173: Support metadata columns (`location`, `size`, `last_modified`) in `ListingTableProvider` URL: https://github.com/apache/datafusion/issues/15173

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-02 Thread via GitHub
viirya commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2122154995 ## datafusion/physical-expr/src/equivalence/properties/mod.rs: ## @@ -125,40 +120,85 @@ use itertools::Itertools; /// # let col_c = col("c", &schema).unwrap(); //

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-02 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2933589884 Besides, the solution needs to be updated. The remaining tests and examples still fail; I am trying to debug them.

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-02 Thread via GitHub
xudong963 commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2933415726 I plan to update version and changelog today and then test with MV tomorrow. I hope we can start the vote process on Friday, and release early next week.

Re: [PR] Fix: Map functions crash on out of bounds cases [datafusion]

2025-06-02 Thread via GitHub
krishvishal commented on PR #16203: URL: https://github.com/apache/datafusion/pull/16203#issuecomment-2933308216 @comphead I've changed the implementation a bit to handle nulls properly. Previous just outputs `NULL` for queries like `select [named_struct('a', 1, 'b', null)][1];` instead of

Re: [I] TPC-H queries used in DataFusion are missing `limit` clause [datafusion]

2025-06-02 Thread via GitHub
xudong963 closed issue #16229: TPC-H queries used in DataFusion are missing `limit` clause URL: https://github.com/apache/datafusion/issues/16229

Re: [PR] Add example demonstrating how Parquet encryption could be configured with KMS integration [datafusion]

2025-06-02 Thread via GitHub
adamreeve commented on PR #16237: URL: https://github.com/apache/datafusion/pull/16237#issuecomment-2933377756 One thing this example doesn't cover is how users might directly set encryption keys without using a KMS. That could be achieved using the same API and implementing an `EncryptionF

[PR] Add example demonstrating how Parquet encryption could be configured with KMS integration [datafusion]

2025-06-02 Thread via GitHub
adamreeve opened a new pull request, #16237: URL: https://github.com/apache/datafusion/pull/16237 This is a draft PR with a non-working example demonstrating how reading and writing of Parquet files using modular encryption could be supported in DataFusion. Related to #15216

[PR] Minor: Print cargo command in bench script [datafusion]

2025-06-02 Thread via GitHub
2010YOUY01 opened a new pull request, #16236: URL: https://github.com/apache/datafusion/pull/16236 ## Which issue does this PR close? - Closes #. ## Rationale for this change Before, the debug `cargo` command would only be printed when running `tpch` benchmar

Re: [PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-02 Thread via GitHub
2010YOUY01 commented on code in PR #16234: URL: https://github.com/apache/datafusion/pull/16234#discussion_r2122568995 ## datafusion/physical-expr/src/window/aggregate.rs: ## @@ -85,6 +88,18 @@ impl PlainAggregateWindowExpr { ); } } + +fn is_window

Re: [I] Support reading data from S3 using native_datafusion Parquet scanner [datafusion-comet]

2025-06-02 Thread via GitHub
andygrove closed issue #1766: Support reading data from S3 using native_datafusion Parquet scanner URL: https://github.com/apache/datafusion-comet/issues/1766

Re: [PR] Add DataFrame API Documentation for DataFusion Python [datafusion-python]

2025-06-02 Thread via GitHub
kosiew commented on code in PR #1132: URL: https://github.com/apache/datafusion-python/pull/1132#discussion_r2122483146 ## docs/source/api/dataframe.rst: ## @@ -0,0 +1,374 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreemen

Re: [I] `sort_fuzz` testing DX improvements [datafusion]

2025-06-02 Thread via GitHub
jonathanc-n commented on issue #16233: URL: https://github.com/apache/datafusion/issues/16233#issuecomment-2933126311 I agree with this, I've encountered some of these, especially this: > Streamline seeds. There are some hardcoded seeds, but it's unclear how to call a function with argum

Re: [I] `sort_fuzz` testing DX improvements [datafusion]

2025-06-02 Thread via GitHub
jonathanc-n commented on issue #16233: URL: https://github.com/apache/datafusion/issues/16233#issuecomment-2933128589 @2010YOUY01 I think you'd be interested

Re: [PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-02 Thread via GitHub
jonathanc-n commented on PR #16234: URL: https://github.com/apache/datafusion/pull/16234#issuecomment-2933072660 I ran some of my own benchmarks and it looks significantly faster: ``` Benchmark h2o_window.json ┏━━┳━━━┳

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-02 Thread via GitHub
zhuqi-lucas commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2122427548 ## datafusion/physical-optimizer/src/wrap_leaves_cancellation.rs: ## @@ -0,0 +1,106 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mo

Re: [PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-02 Thread via GitHub
jonathanc-n commented on code in PR #16234: URL: https://github.com/apache/datafusion/pull/16234#discussion_r2122403231 ## datafusion/physical-expr/src/window/aggregate.rs: ## @@ -85,6 +88,18 @@ impl PlainAggregateWindowExpr { ); } } + +fn is_windo

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-02 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2933011699 > Agreed on the rule approach. > > Interleave will poll each of its children at most once per poll call. If none of the children returns a Ready it will return Pending itse

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-02 Thread via GitHub
zhuqi-lucas commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2122397197 ## datafusion/physical-plan/src/yield_stream.rs: ## @@ -0,0 +1,235 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-02 Thread via GitHub
zhuqi-lucas commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2122365680 ## datafusion/physical-plan/src/yield_stream.rs: ## @@ -0,0 +1,235 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [I] Enabling Test "Runtime bloom filter join: do not add bloom filter if dpp filter exists on the same column" fails with IllegalStateException in AdaptiveSparkPlanExec.newQueryStage [datafusion-c

2025-06-02 Thread via GitHub
andygrove commented on issue #1831: URL: https://github.com/apache/datafusion-comet/issues/1831#issuecomment-2932965338 Thanks for the detailed write-up @rishvin. I think this issue may be resolved by https://github.com/apache/datafusion-comet/pull/1811 (cc @coderfender) and we may be able

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-02 Thread via GitHub
zhuqi-lucas commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2122357369 ## datafusion/physical-optimizer/src/wrap_leaves_cancellation.rs: ## @@ -0,0 +1,106 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mo

[I] Access Data from S3 in DeltaLake format using Ballista on Kubernetes [datafusion-ballista]

2025-06-02 Thread via GitHub
janbraunsdorff opened a new issue, #1268: URL: https://github.com/apache/datafusion-ballista/issues/1268 **Describe the bug** Hello, I am trying to replace Spark with Ballista. The use case is to read a Delta table from S3, do some processing, and write it back to S3. Unfortunately I am stuck on t

[I] Enabling Test "Runtime bloom filter join: do not add bloom filter if dpp filter exists on the same column" fails with IllegalStateException in AdaptiveSparkPlanExec.newQueryStage [datafusion-comet

2025-06-02 Thread via GitHub
rishvin opened a new issue, #1831: URL: https://github.com/apache/datafusion-comet/issues/1831 ### Describe the bug The following test cases failed when enabling them (relates to #1739), - Runtime bloom filter join: do not add bloom filter if dpp filter exists on the same colu

Re: [PR] Flatten dependent join [datafusion]

2025-06-02 Thread via GitHub
irenjj closed pull request #16227: Flatten dependent join URL: https://github.com/apache/datafusion/pull/16227

Re: [PR] feat: Translate Hadoop S3A configurations to object_store configurations [datafusion-comet]

2025-06-02 Thread via GitHub
andygrove commented on code in PR #1817: URL: https://github.com/apache/datafusion-comet/pull/1817#discussion_r214172 ## spark/src/test/scala/org/apache/comet/parquet/ParquetReadFromS3Suite.scala: ## @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-02 Thread via GitHub
ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2931946687 Hmm, cool example! I want to understand exactly what is going on. So `InterleaveExec` doesn't return `Pending` to its parent (`AggregateExec`)?

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-02 Thread via GitHub
viirya commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2122097178 ## datafusion/physical-expr/src/equivalence/class.rs: ## @@ -175,307 +135,398 @@ impl ConstExpr { } } +impl PartialEq for ConstExpr { +fn eq(&self, othe

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-02 Thread via GitHub
viirya commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2122106227 ## datafusion/catalog/src/listing_schema.rs: ## @@ -143,7 +141,7 @@ impl ListingSchemaProvider { order_exprs: vec![],

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-02 Thread via GitHub
alamb commented on PR #16217: URL: https://github.com/apache/datafusion/pull/16217#issuecomment-2932356312 🤖: Benchmark completed Details ``` group main required-input-ordering -

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-02 Thread via GitHub
alamb commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2121812355 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -129,23 +138,55 @@ impl PhysicalSortExpr { to_str(&self.options) ) } -} -///

Re: [I] Join on pandas dataframe from python API fails due to schema metadata [datafusion]

2025-06-02 Thread via GitHub
comphead closed issue #15754: Join on pandas dataframe from python API fails due to schema metadata URL: https://github.com/apache/datafusion/issues/15754

Re: [I] Migrate `datafusion-cli` tests to `insta` [datafusion]

2025-06-02 Thread via GitHub
blaginin commented on issue #15795: URL: https://github.com/apache/datafusion/issues/15795#issuecomment-2931705222 Hey @Shreyaskr1409, just checking if you have any questions about the issue or need any help :)

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-02 Thread via GitHub
alamb commented on PR #16217: URL: https://github.com/apache/datafusion/pull/16217#issuecomment-2932142471 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-02 Thread via GitHub
alamb commented on PR #16217: URL: https://github.com/apache/datafusion/pull/16217#issuecomment-2932136822 There are some pretty nice looking results in the benchmark results: ``` group main required-input-orderi

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-02 Thread via GitHub
alamb commented on PR #16217: URL: https://github.com/apache/datafusion/pull/16217#issuecomment-2932076551 🤖: Benchmark completed Details ``` group main required-input-ordering -

Re: [PR] fix: metadata of join schema [datafusion]

2025-06-02 Thread via GitHub
comphead merged PR #16221: URL: https://github.com/apache/datafusion/pull/16221

Re: [PR] Fix: Map functions crash on out of bounds cases [datafusion]

2025-06-02 Thread via GitHub
comphead commented on PR #16203: URL: https://github.com/apache/datafusion/pull/16203#issuecomment-2932086115 @krishvishal are you still planning to wrap this PR up?

Re: [PR] build: Specify -Dsbt.log.noformat=true in sbt CI runs [datafusion-comet]

2025-06-02 Thread via GitHub
andygrove merged PR #1822: URL: https://github.com/apache/datafusion-comet/pull/1822

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-02 Thread via GitHub
pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2931986320 Agreed on the rule approach. Interleave will poll each of its children at most once per poll call. If none of the children returns a Ready it will return Pending itself. Each

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-02 Thread via GitHub
ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2932046007 Hmm, this may be similar in spirit to a related problem the sort-preserving merge operator has. I will discuss with @berkaysynnada and think about this a little bit, and circle back.

Re: [PR] chore: Use unique artifact names in Java test run [datafusion-comet]

2025-06-02 Thread via GitHub
andygrove merged PR #1818: URL: https://github.com/apache/datafusion-comet/pull/1818

[I] Release DataFusion `49.0.0` (July 2025) [datafusion]

2025-06-02 Thread via GitHub
alamb opened a new issue, #16235: URL: https://github.com/apache/datafusion/issues/16235 ### Is your feature request related to a problem or challenge? Tracking ticket for next release, also a place to track desired inclusions Previous release will be https://crates.io/crates/d

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-02 Thread via GitHub
alamb commented on PR #16217: URL: https://github.com/apache/datafusion/pull/16217#issuecomment-2931862394 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-02 Thread via GitHub
pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2931782511 > Working on it and it seems to get cancelled indeed. I'll work on understanding why and report back. @ozankabak I was able to get another non-exciting plan as follows. It's a

Re: [I] Scalars are too verbose in column name output [datafusion]

2025-06-02 Thread via GitHub
blaginin commented on issue #15395: URL: https://github.com/apache/datafusion/issues/15395#issuecomment-2931763906 I feel like with https://github.com/apache/datafusion/issues/15178 almost done, this ticket can finally be wrapped up ☺️ > The challenge is that it will change the schema

Re: [I] Migrate `core` tests to `insta` [datafusion]

2025-06-02 Thread via GitHub
blaginin commented on issue #15791: URL: https://github.com/apache/datafusion/issues/15791#issuecomment-2931703475 Hey @Chen-Yuan-Lai, just checking if you have any questions about the issue or need any help :)

Re: [PR] Fix: GROUPING SETS accept values without parenthesis [datafusion-sqlparser-rs]

2025-06-02 Thread via GitHub
iffyio merged PR #1867: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1867

Re: [PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-02 Thread via GitHub
suibianwanwank commented on PR #16234: URL: https://github.com/apache/datafusion/pull/16234#issuecomment-2931626531 FYI @alamb @Dandandan

Re: [PR] feat: Support RightMark join for NestedLoop and Hash join [datafusion]

2025-06-02 Thread via GitHub
jonathanc-n commented on PR #16083: URL: https://github.com/apache/datafusion/pull/16083#issuecomment-2931660916 This should be ready for another review, I've added fuzz tests and fixed up the suggestions cc @Dandandan @ctsk @comphead

Re: [PR] Feat: Support Spark 4.0.0 part1 [datafusion-comet]

2025-06-02 Thread via GitHub
andygrove commented on code in PR #1830: URL: https://github.com/apache/datafusion-comet/pull/1830#discussion_r2121737798 ## spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimCometTPCDSMicroBenchmark.scala: ## @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software

Re: [PR] Handle dicts for distinct count [datafusion]

2025-06-02 Thread via GitHub
blaginin commented on PR #15871: URL: https://github.com/apache/datafusion/pull/15871#issuecomment-2931658020 > Is there any way you can add some slt tests as well Found the existing test and extended it https://github.com/apache/datafusion/blob/8ed42598eabbf825f101e13e7610f24d8

[PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-02 Thread via GitHub
suibianwanwank opened a new pull request, #16234: URL: https://github.com/apache/datafusion/pull/16234 ## Which issue does this PR close? - Closes #. ## Rationale for this change For unbounded aggregate window functions, the result is the same for all rows within a parti

Re: [I] Improve decimal casting performance [datafusion-comet]

2025-06-02 Thread via GitHub
WordRotator commented on issue #1168: URL: https://github.com/apache/datafusion-comet/issues/1168#issuecomment-2931637233 I'll have a PR up for this shortly.

[PR] Feat: Support Spark 4.0.0 part1 [datafusion-comet]

2025-06-02 Thread via GitHub
huaxingao opened a new pull request, #1830: URL: https://github.com/apache/datafusion-comet/pull/1830 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] Handle dicts for distinct count [datafusion]

2025-06-02 Thread via GitHub
blaginin commented on code in PR #15871: URL: https://github.com/apache/datafusion/pull/15871#discussion_r2121726273 ## datafusion/functions-aggregate-common/src/aggregate/count_distinct/dict.rs: ## @@ -0,0 +1,70 @@ +// Licensed to the Apache Software Foundation (ASF) under one

Re: [I] Default to collecting statistics when creating LIstingTables [datafusion]

2025-06-02 Thread via GitHub
alamb commented on issue #16158: URL: https://github.com/apache/datafusion/issues/16158#issuecomment-2931622615 Thanks @brayanjuls -- hope all goes well at work

[I] `sort_fuzz` testing DX improvements [datafusion]

2025-06-02 Thread via GitHub
blaginin opened a new issue, #16233: URL: https://github.com/apache/datafusion/issues/16233 ### Is your feature request related to a problem or challenge? `datafusion/core/tests/fuzz_cases` is a very cool tool! It helped to spot https://github.com/apache/datafusion/issues/16228. Howeve

Re: [PR] Exposing FFI to python [datafusion-python]

2025-06-02 Thread via GitHub
renato2099 commented on PR #1137: URL: https://github.com/apache/datafusion-python/pull/1137#issuecomment-2931562071 > Do you think you could enhance this PR to handle python based catalogs and schemas as well? yes, I can try to do so, I will go and read what this is about and come

Re: [PR] [wip] chore: Move Spark version from 4.0.0-preview1 to 4.0.0 [datafusion-comet]

2025-06-02 Thread via GitHub
andygrove closed pull request #1828: [wip] chore: Move Spark version from 4.0.0-preview1 to 4.0.0 URL: https://github.com/apache/datafusion-comet/pull/1828

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-02 Thread via GitHub
pepijnve commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2121676435 ## datafusion/physical-plan/src/yield_stream.rs: ## @@ -0,0 +1,235 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [PR] Handle dicts for distinct count [datafusion]

2025-06-02 Thread via GitHub
blaginin commented on code in PR #15871: URL: https://github.com/apache/datafusion/pull/15871#discussion_r2120757493 ## datafusion/functions-aggregate/src/count.rs: ## @@ -764,4 +774,49 @@ mod tests { assert_eq!(accumulator.evaluate()?, ScalarValue::Int64(Some(0)));

Re: [PR] Reuse last projection layer when renaming columns [datafusion]

2025-06-02 Thread via GitHub
blaginin commented on code in PR #14684: URL: https://github.com/apache/datafusion/pull/14684#discussion_r2121659097 ## datafusion/core/src/dataframe/mod.rs: ## @@ -1972,41 +1972,85 @@ impl DataFrame { .config_options() .sql_parser .enable_

Re: [PR] Reuse last projection layer when renaming columns [datafusion]

2025-06-02 Thread via GitHub
blaginin closed pull request #14684: Reuse last projection layer when renaming columns URL: https://github.com/apache/datafusion/pull/14684

Re: [PR] Reuse last projection layer when renaming columns [datafusion]

2025-06-02 Thread via GitHub
blaginin commented on PR #14684: URL: https://github.com/apache/datafusion/pull/14684#issuecomment-2931525808 https://github.com/apache/datafusion/pull/14684#discussion_r2121659097

Re: [PR] feat: Translate Hadoop S3A configurations to object_store configurations [datafusion-comet]

2025-06-02 Thread via GitHub
parthchandra commented on PR #1817: URL: https://github.com/apache/datafusion-comet/pull/1817#issuecomment-2931477557 We can address some of the open items in a follow up. Logged: https://github.com/apache/datafusion-comet/issues/1829

[I] Improve integration of hadoop s3a and comet [datafusion-comet]

2025-06-02 Thread via GitHub
parthchandra opened a new issue, #1829: URL: https://github.com/apache/datafusion-comet/issues/1829 ### What is the problem the feature request solves? https://github.com/apache/datafusion-comet/pull/1817 introduces integration that allows usage of `hadoop-aws`'s `s3a` configuration w

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-02 Thread via GitHub
pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2931463403 > Agreed, let's add a test for this. I don't think we'll have a problem, but maybe I'm wrong and it is easy to verify. Working on it and it seems to get cancelled. I'll work o

Re: [PR] [wip] chore: Move Spark version from 4.0.0-preview1 to 4.0.0 [datafusion-comet]

2025-06-02 Thread via GitHub
codecov-commenter commented on PR #1828: URL: https://github.com/apache/datafusion-comet/pull/1828#issuecomment-2931443213 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1828?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-02 Thread via GitHub
ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2931417876 > I did not verify this yet, but it might be worth adding a test case for this type of situation as well. Agreed, let's add a test for this. I don't think there will be a pro

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-02 Thread via GitHub
pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2931402946 > Inserting it as a parent of leaf nodes, and only when necessary (first item in my message above), gives us a system where the least number of necessary `YieldExec`s are inserted,

Re: [PR] feat: Support defining custom MetricValues in PhysicalPlans [datafusion]

2025-06-02 Thread via GitHub
gabotechs commented on code in PR #16195: URL: https://github.com/apache/datafusion/pull/16195#discussion_r2120615931 ## datafusion/physical-plan/src/metrics/value.rs: ## @@ -401,6 +401,90 @@ pub enum MetricValue { StartTimestamp(Timestamp), /// The time at which execu

Re: [PR] feat: Support defining custom MetricValues in PhysicalPlans [datafusion]

2025-06-02 Thread via GitHub
gabotechs commented on code in PR #16195: URL: https://github.com/apache/datafusion/pull/16195#discussion_r2120632856 ## datafusion/physical-plan/src/metrics/value.rs: ## @@ -443,6 +528,9 @@ impl MetricValue { .and_then(|ts| ts.timestamp_nanos_opt())

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-02 Thread via GitHub
pepijnve commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2121550224 ## datafusion/physical-optimizer/src/wrap_leaves_cancellation.rs: ## @@ -0,0 +1,106 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[PR] chore: Move Spark version from 4.0.0-preview1 to 4.0.0 [datafusion-comet]

2025-06-02 Thread via GitHub
andygrove opened a new pull request, #1828: URL: https://github.com/apache/datafusion-comet/pull/1828 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] fix: Fix Spark SQL AQE exchange reuse test failures [datafusion-comet]

2025-06-02 Thread via GitHub
andygrove merged PR #1811: URL: https://github.com/apache/datafusion-comet/pull/1811

Re: [I] [INTERNAL_ERROR] Custom columnar rules cannot transform shuffle node to something else [datafusion-comet]

2025-06-02 Thread via GitHub
andygrove closed issue #1737: [INTERNAL_ERROR] Custom columnar rules cannot transform shuffle node to something else URL: https://github.com/apache/datafusion-comet/issues/1737

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-02 Thread via GitHub
zhuqi-lucas commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2121457878 ## datafusion/physical-plan/src/yield_stream.rs: ## @@ -0,0 +1,209 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-02 Thread via GitHub
zhuqi-lucas commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2121456655 ## datafusion/physical-plan/src/aggregates/no_grouping.rs: ## @@ -33,12 +33,12 @@ use std::borrow::Cow; use std::sync::Arc; use std::task::{Context, Poll};

[PR] Add dicts to aggregation fuzzy testing [datafusion]

2025-06-02 Thread via GitHub
blaginin opened a new pull request, #16232: URL: https://github.com/apache/datafusion/pull/16232 ## Which issue does this PR close? Related to https://github.com/apache/datafusion/pull/15871#discussion_r2112273596 ## Rationale for this change Adds dicts support for fuzzy

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-02 Thread via GitHub
zhuqi-lucas commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2121426764 ## datafusion/physical-plan/src/yield_stream.rs: ## @@ -0,0 +1,209 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-02 Thread via GitHub
zhuqi-lucas commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2121425029 ## datafusion/physical-plan/src/aggregates/no_grouping.rs: ## @@ -33,12 +33,12 @@ use std::borrow::Cow; use std::sync::Arc; use std::task::{Context, Poll};

Re: [PR] feat: Support defining custom MetricValues in PhysicalPlans [datafusion]

2025-06-02 Thread via GitHub
alamb commented on code in PR #16195: URL: https://github.com/apache/datafusion/pull/16195#discussion_r2121402980 ## datafusion/physical-plan/src/metrics/value.rs: ## @@ -344,7 +344,7 @@ impl Drop for ScopedTimerGuard<'_> { /// Among other differences, the metric types have dif

Re: [PR] Chore: implement bit_not as ScalarUDFImpl [datafusion-comet]

2025-06-02 Thread via GitHub
andygrove merged PR #1825: URL: https://github.com/apache/datafusion-comet/pull/1825

Re: [I] Add CatalogProvider API [datafusion-python]

2025-06-02 Thread via GitHub
timsaucer commented on issue #1103: URL: https://github.com/apache/datafusion-python/issues/1103#issuecomment-2931088383 @renato2099 has a very nice PR that addresses the FFI side of this: https://github.com/apache/datafusion-python/pull/1137 Hopefully we can use that as a starting point

Re: [I] Add CatalogProvider and SchemaProvider [datafusion-python]

2025-06-02 Thread via GitHub
timsaucer commented on issue #1091: URL: https://github.com/apache/datafusion-python/issues/1091#issuecomment-2931069743 I am going to close this issue so we aren't discussing it in two places. I think https://github.com/apache/datafusion-python/issues/1103 is the more relevant issue which

Re: [I] Add CatalogProvider and SchemaProvider [datafusion-python]

2025-06-02 Thread via GitHub
timsaucer closed issue #1091: Add CatalogProvider and SchemaProvider URL: https://github.com/apache/datafusion-python/issues/1091

Re: [PR] Exposing FFI to python [datafusion-python]

2025-06-02 Thread via GitHub
timsaucer commented on PR #1137: URL: https://github.com/apache/datafusion-python/pull/1137#issuecomment-2931061965 This is great. We had someone open a related issue https://github.com/apache/datafusion-python/issues/1103 and it would be *fantastic* if we could address both of them at onc

[I] Add CI check for documentation build [datafusion-python]

2025-06-02 Thread via GitHub
timsaucer opened a new issue, #1138: URL: https://github.com/apache/datafusion-python/issues/1138 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** There have been a few cases where we have added documentation to either the docs

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-02 Thread via GitHub
pepijnve commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2121323418 ## datafusion/physical-plan/src/aggregates/no_grouping.rs: ## @@ -33,12 +33,12 @@ use std::borrow::Cow; use std::sync::Arc; use std::task::{Context, Poll}; +u

Re: [PR] Add DataFrame API Documentation for DataFusion Python [datafusion-python]

2025-06-02 Thread via GitHub
timsaucer commented on code in PR #1132: URL: https://github.com/apache/datafusion-python/pull/1132#discussion_r2121322305 ## docs/source/api/dataframe.rst: ## @@ -0,0 +1,374 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agree

Re: [PR] Exposing FFI to python [datafusion-python]

2025-06-02 Thread via GitHub
renato2099 commented on PR #1137: URL: https://github.com/apache/datafusion-python/pull/1137#issuecomment-293085 Hi @timsaucer, if you could please take a pass on this PR, that would be great, thanks!
