Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-06-07 Thread via GitHub
rishvin commented on issue #1820: URL: https://github.com/apache/datafusion-comet/issues/1820#issuecomment-2953616585 Hi @andygrove, I tried the following approach and looks like there is some discrepancy in the Datafusion's `SparkSha2` output with Spark. **This is what I attempted**

[PR] Chore: implement predicate exprs as ScalarUDFImpl [datafusion-comet]

2025-06-07 Thread via GitHub
kazantsev-maksim opened a new pull request, #1864: URL: https://github.com/apache/datafusion-comet/pull/1864 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/1819 ## Rationale for this change See https://github.com/apache/datafu

Re: [I] Support `DESC ` statement [datafusion]

2025-06-07 Thread via GitHub
hsrahh commented on issue #16311: URL: https://github.com/apache/datafusion/issues/16311#issuecomment-2953633219 I think this feature would be really useful. Right now, when I try to use DESC t1; to see the table schema, it shows an error because DESC is not supported. Some other SQL system

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-07 Thread via GitHub
ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2952017708 I merged the latest from main, this is good to go -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-06-07 Thread via GitHub
irenjj commented on code in PR #16016: URL: https://github.com/apache/datafusion/pull/16016#discussion_r2133648910 ## datafusion/optimizer/src/rewrite_dependent_join.rs: ## @@ -0,0 +1,1901 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-06-07 Thread via GitHub
irenjj commented on code in PR #16016: URL: https://github.com/apache/datafusion/pull/16016#discussion_r2133652030 ## datafusion/optimizer/src/rewrite_dependent_join.rs: ## @@ -0,0 +1,1901 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-07 Thread via GitHub
ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2952147349 @zhuqi-lucas, I wanted to make a few final finishing touches as we gave a chance in case @alamb wants to take a final look. I changed the config terminology from "frequency" to "pe

Re: [I] Question about the `map_varchar_to_utf8view` config [datafusion]

2025-06-07 Thread via GitHub
xudong963 closed issue #16277: Question about the `map_varchar_to_utf8view` config URL: https://github.com/apache/datafusion/issues/16277

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-07 Thread via GitHub
alamb commented on PR #16170: URL: https://github.com/apache/datafusion/pull/16170#issuecomment-2952297536 Thanks @andygrove and @timsaucer -- this plan looks good to me

Re: [PR] Draft: Use upstream arrow `coalesce` kernel in DataFusion [datafusion]

2025-06-07 Thread via GitHub
alamb commented on PR #16249: URL: https://github.com/apache/datafusion/pull/16249#issuecomment-2952305234 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Draft: Use upstream arrow `coalesce` kernel in DataFusion [datafusion]

2025-06-07 Thread via GitHub
alamb commented on PR #16249: URL: https://github.com/apache/datafusion/pull/16249#issuecomment-2952308503 https://github.com/apache/datafusion/actions/runs/15506836203/job/43662848611?pr=16249 > Caused by: process didn't exit successfully: `/home/runner/work/datafusion/datafusi

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-07 Thread via GitHub
alamb commented on PR #16217: URL: https://github.com/apache/datafusion/pull/16217#issuecomment-2952309229 🚀 woohoo!

Re: [PR] Draft: Use upstream arrow `coalesce` kernel in DataFusion [datafusion]

2025-06-07 Thread via GitHub
Dandandan commented on PR #16249: URL: https://github.com/apache/datafusion/pull/16249#issuecomment-2952353060 I added a PR for reverting the changes in arrow-rs https://github.com/apache/arrow-rs/pull/7623 - probably something subtle with one of the fast paths that isn't tested in arrow-rs

Re: [PR] Draft: Use upstream arrow `coalesce` kernel in DataFusion [datafusion]

2025-06-07 Thread via GitHub
alamb commented on PR #16249: URL: https://github.com/apache/datafusion/pull/16249#issuecomment-2952354569 > I added a PR for reverting the changes in arrow-rs [apache/arrow-rs#7623](https://github.com/apache/arrow-rs/pull/7623) - probably something subtle with one of the fast paths that is

[PR] Alamb/maybe better metadata [datafusion]

2025-06-07 Thread via GitHub
alamb opened a new pull request, #16317: URL: https://github.com/apache/datafusion/pull/16317 ## Which issue does this PR close? - Related to https://github.com/apache/datafusion/issues/15797 - Follow on to https://github.com/apache/datafusion/pull/16170 - This is an updated

[I] Reduce busy polling when query contains pipeline blocking operators [datafusion]

2025-06-07 Thread via GitHub
pepijnve opened a new issue, #16318: URL: https://github.com/apache/datafusion/issues/16318 ### Is your feature request related to a problem or challenge? When a query pipeline contains one or more pipeline blockers, the query will spend an extended period of time in the blocking phas

Re: [I] Reduce busy polling when query contains pipeline blocking operators [datafusion]

2025-06-07 Thread via GitHub
pepijnve commented on issue #16318: URL: https://github.com/apache/datafusion/issues/16318#issuecomment-2952372827 Creating a PR with a proposed fix

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-07 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2952376120 > > I will investigate that if we can remove some internal yield logic, such as repartition? etc > > Good idea, I'm curious to see if you can. `RepartitionExec` is a little

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-07 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2952375579 > @zhuqi-lucas, I wanted to make a few final finishing touches as we gave a chance in case @alamb wants to take a final look. I changed the config terminology from "frequency" to

[PR] feat: use spawned tasks to reduce call stack depth and avoid busy waiting [datafusion]

2025-06-07 Thread via GitHub
pepijnve opened a new pull request, #16319: URL: https://github.com/apache/datafusion/pull/16319 ## Which issue does this PR close? - Closes #16318. - Relates to #16196 and/or #16301 ## Rationale for this change Yielding to the runtime in Tokio involves unwinding the c

[PR] Unify Metadata Handing: use `FieldMetadata` in `Expr::Alias` and `ExprSchemable` [datafusion]

2025-06-07 Thread via GitHub
alamb opened a new pull request, #16320: URL: https://github.com/apache/datafusion/pull/16320 - Draft until https://github.com/apache/datafusion/pull/16317 is merged ## Which issue does this PR close? - Follow on to https://github.com/apache/datafusion/pull/16317 ## Ratio

Re: [PR] Encapsulate metadata for literals on to a `FieldMetadata` structure [datafusion]

2025-06-07 Thread via GitHub
alamb commented on PR #16317: URL: https://github.com/apache/datafusion/pull/16317#issuecomment-2952383640 I also found we can further unify the metadata handling for Expr::Alias as well, see - https://github.com/apache/datafusion/pull/16320

Re: [PR] Encapsulate metadata for literals on to a `FieldMetadata` structure [datafusion]

2025-06-07 Thread via GitHub
alamb commented on code in PR #16317: URL: https://github.com/apache/datafusion/pull/16317#discussion_r2133768464 ## datafusion/expr/src/expr_rewriter/mod.rs: ## @@ -390,11 +390,7 @@ mod test { } else { utf8_val

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-07 Thread via GitHub
alamb commented on PR #16170: URL: https://github.com/apache/datafusion/pull/16170#issuecomment-2952385242 I made a PR to main here (no rush on review): - https://github.com/apache/datafusion/pull/16317

Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-06-07 Thread via GitHub
alamb commented on code in PR #16207: URL: https://github.com/apache/datafusion/pull/16207#discussion_r2133774454 ## datafusion/expr/src/expr.rs: ## @@ -330,7 +331,7 @@ pub enum Expr { /// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt AggregateFunction(Aggregate

Re: [PR] Draft: Use upstream arrow `coalesce` kernel in DataFusion [datafusion]

2025-06-07 Thread via GitHub
alamb commented on PR #16249: URL: https://github.com/apache/datafusion/pull/16249#issuecomment-2952394526 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Unify Metadata Handing: use `FieldMetadata` in `Expr::Alias` and `ExprSchemable` [datafusion]

2025-06-07 Thread via GitHub
alamb commented on code in PR #16320: URL: https://github.com/apache/datafusion/pull/16320#discussion_r2133774218 ## datafusion/expr/src/expr.rs: ## @@ -3657,7 +3834,7 @@ mod test { // If this test fails when you change `Expr`, please try // `Box`ing the fields

Re: [PR] Unify Metadata Handing: use `FieldMetadata` in `Expr::Alias` and `ExprSchemable` [datafusion]

2025-06-07 Thread via GitHub
alamb commented on PR #16320: URL: https://github.com/apache/datafusion/pull/16320#issuecomment-2952395132 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~

[PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-07 Thread via GitHub
mbutrovich opened a new pull request, #1862: URL: https://github.com/apache/datafusion-comet/pull/1862 ## Which issue does this PR close? Closes #458. ## Rationale for this change ## What changes are included in this PR? ## How are these cha

Re: [PR] feat: upgrade df48 dependency [datafusion-python]

2025-06-07 Thread via GitHub
timsaucer commented on PR #1143: URL: https://github.com/apache/datafusion-python/pull/1143#issuecomment-2952430291 TODO (tsaucer): update deprecated interface in rust side, update python bindings, mark deprecated in python as well

Re: [PR] Unify Metadata Handing: use `FieldMetadata` in `Expr::Alias` and `ExprSchemable` [datafusion]

2025-06-07 Thread via GitHub
alamb commented on PR #16320: URL: https://github.com/apache/datafusion/pull/16320#issuecomment-2952450760 🤖: Benchmark completed Details ``` group alamb_field_metadata2 main -

[PR] feat: upgrade df48 dependency [datafusion-python]

2025-06-07 Thread via GitHub
timsaucer opened a new pull request, #1143: URL: https://github.com/apache/datafusion-python/pull/1143 # Which issue does this PR close? Work in progress. Do not merge yet. # Rationale for this change # What changes are included in this PR? # Are there any

[I] Busy-waiting in SortPreservingMergeStream [datafusion]

2025-06-07 Thread via GitHub
pepijnve opened a new issue, #16321: URL: https://github.com/apache/datafusion/issues/16321 ### Describe the bug When running a query like `select a from annotated_data_infinite2 order by b desc limit 10`, a `SortPreservingMergeStream` is created that merge sorts the presorted partit

Re: [I] Busy-waiting in SortPreservingMergeStream [datafusion]

2025-06-07 Thread via GitHub
pepijnve commented on issue #16321: URL: https://github.com/apache/datafusion/issues/16321#issuecomment-2952504834 Investigating, but help appreciated.

Re: [I] Busy-waiting in SortPreservingMergeStream [datafusion]

2025-06-07 Thread via GitHub
pepijnve commented on issue #16321: URL: https://github.com/apache/datafusion/issues/16321#issuecomment-2952514406 A closer look shows I might be completely mistaken. Will close if irrelevant.

Re: [PR] Encapsulate metadata for literals on to a `FieldMetadata` structure [datafusion]

2025-06-07 Thread via GitHub
timsaucer commented on code in PR #16317: URL: https://github.com/apache/datafusion/pull/16317#discussion_r2133859411 ## datafusion/expr/src/expr.rs: ## @@ -413,6 +413,162 @@ impl<'a> TreeNodeContainer<'a, Self> for Expr { } } +/// Literal metadata +/// +/// Stores metad

[PR] bug: remove busy-wait while sort is ongoing [datafusion]

2025-06-07 Thread via GitHub
pepijnve opened a new pull request, #16322: URL: https://github.com/apache/datafusion/pull/16322 ## Which issue does this PR close? - Closes #16321. ## Rationale for this change `SortPreservingMergeStream` works in two phases. It first waits for each input stream to be r

Re: [PR] chore: Upgrade to DataFusion 48.0.0-rc2 [datafusion-comet]

2025-06-07 Thread via GitHub
andygrove closed pull request #1853: chore: Upgrade to DataFusion 48.0.0-rc2 URL: https://github.com/apache/datafusion-comet/pull/1853

Re: [PR] [ignore] Debug regression in 48.0.0-rc2 [datafusion-comet]

2025-06-07 Thread via GitHub
andygrove closed pull request #1855: [ignore] Debug regression in 48.0.0-rc2 URL: https://github.com/apache/datafusion-comet/pull/1855

[PR] chore: Upgrade to DataFusion 48.0.0-rc3 [datafusion-comet]

2025-06-07 Thread via GitHub
andygrove opened a new pull request, #1863: URL: https://github.com/apache/datafusion-comet/pull/1863 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] chore: Upgrade to DataFusion 48.0.0-rc3 [datafusion-comet]

2025-06-07 Thread via GitHub
codecov-commenter commented on PR #1863: URL: https://github.com/apache/datafusion-comet/pull/1863#issuecomment-2952624559 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1863?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[I] Request to update crates.io ownership [datafusion]

2025-06-07 Thread via GitHub
alamb opened a new issue, #16323: URL: https://github.com/apache/datafusion/issues/16323 The ownership of the main datafusion on crates.io https://crates.io/crates/datafusion Should match the ownership of all subcrates. I was trying to add @xudong963 as owner to the datafusion crates so

Re: [I] Request to update crates.io ownership [datafusion]

2025-06-07 Thread via GitHub
andygrove commented on issue #16323: URL: https://github.com/apache/datafusion/issues/16323#issuecomment-2952664869 @alamb I have now sent invites for these crates

Re: [I] Request to update crates.io ownership [datafusion]

2025-06-07 Thread via GitHub
alamb closed issue #16323: Request to update crates.io ownership URL: https://github.com/apache/datafusion/issues/16323

Re: [I] Request to update crates.io ownership [datafusion]

2025-06-07 Thread via GitHub
alamb commented on issue #16323: URL: https://github.com/apache/datafusion/issues/16323#issuecomment-2952666391 Thank you, I got them. 🙏

Re: [I] Update or ignore tests in Spark SQL WholeStageCodegenSuite [datafusion-comet]

2025-06-07 Thread via GitHub
andygrove closed issue #1852: Update or ignore tests in Spark SQL WholeStageCodegenSuite URL: https://github.com/apache/datafusion-comet/issues/1852

Re: [PR] chore: Ignore Spark SQL WholeStageCodegenSuite tests [datafusion-comet]

2025-06-07 Thread via GitHub
andygrove merged PR #1859: URL: https://github.com/apache/datafusion-comet/pull/1859

Re: [PR] fix: Update broadcast exchange logic to support reused exchanges [datafusion-comet]

2025-06-07 Thread via GitHub
andygrove commented on PR #1858: URL: https://github.com/apache/datafusion-comet/pull/1858#issuecomment-2952704690 This PR is no longer needed now that the diff in https://github.com/apache/datafusion-comet/pull/1736 is much smaller

Re: [PR] fix: Update broadcast exchange logic to support reused exchanges [datafusion-comet]

2025-06-07 Thread via GitHub
andygrove closed pull request #1858: fix: Update broadcast exchange logic to support reused exchanges URL: https://github.com/apache/datafusion-comet/pull/1858

Re: [I] Invalid argument error: Invalid arithmetic operation: Int32 - Int64 [datafusion-comet]

2025-06-07 Thread via GitHub
andygrove closed issue #1246: Invalid argument error: Invalid arithmetic operation: Int32 - Int64 URL: https://github.com/apache/datafusion-comet/issues/1246

Re: [I] Invalid argument error: Invalid arithmetic operation: Int32 - Int64 [datafusion-comet]

2025-06-07 Thread via GitHub
andygrove commented on issue #1246: URL: https://github.com/apache/datafusion-comet/issues/1246#issuecomment-2952705335 Fixed in https://github.com/apache/datafusion-comet/pull/1848

Re: [I] Reduce page metadata loading to only what is necessary for query execution in ParquetOpen [datafusion]

2025-06-07 Thread via GitHub
zhuqi-lucas commented on issue #16200: URL: https://github.com/apache/datafusion/issues/16200#issuecomment-2952713037 Do some experiment in: https://github.com/apache/arrow-rs/pull/7624 It looks like the result is not bad, run 2 times, need to check it again: ```rust c

Re: [I] [substrait] [sqllogictest] Unsupported cast type: Duration [datafusion]

2025-06-07 Thread via GitHub
jkosh44 commented on issue #16285: URL: https://github.com/apache/datafusion/issues/16285#issuecomment-2952715421 The arrow C++ substrait library also doesn't support Durations, but they have a comment about using UDTs to support them: https://github.com/apache/arrow/blob/1d169cc90f65be6ee0

Re: [PR] Feat: Support `map`, `map_keys` & `maps_values` [datafusion-comet]

2025-06-07 Thread via GitHub
comphead closed pull request #1236: Feat: Support `map`, `map_keys` & `maps_values` URL: https://github.com/apache/datafusion-comet/pull/1236

Re: [PR] Feat: Support `map`, `map_keys` & `maps_values` [datafusion-comet]

2025-06-07 Thread via GitHub
comphead commented on PR #1236: URL: https://github.com/apache/datafusion-comet/pull/1236#issuecomment-2952735576 Closing as those functions are already implemented

Re: [I] [Epic] A collection of Substrait conversion issues [datafusion]

2025-06-07 Thread via GitHub
jkosh44 commented on issue #16248: URL: https://github.com/apache/datafusion/issues/16248#issuecomment-2952747673 Most of the cast errors have the same cause, they are trying to cast a type from Arrow that doesn't exist in substrait. - https://github.com/apache/datafusion/issues/16275

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-07 Thread via GitHub
codecov-commenter commented on PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2952754584 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1862?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Reduce page metadata loading to only what is necessary for query execution in ParquetOpen [datafusion]

2025-06-07 Thread via GitHub
adriangb commented on issue #16200: URL: https://github.com/apache/datafusion/issues/16200#issuecomment-2952754881 Looks like not a big difference to me?

Re: [PR] upgraded spark 3.5.5 to 3.5.6 [datafusion-comet]

2025-06-07 Thread via GitHub
codecov-commenter commented on PR #1861: URL: https://github.com/apache/datafusion-comet/pull/1861#issuecomment-2952756479 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1861?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] Migrate core test to insta, part1 [datafusion]

2025-06-07 Thread via GitHub
Chen-Yuan-Lai opened a new pull request, #16324: URL: https://github.com/apache/datafusion/pull/16324 ## Which issue does this PR close? - Closes #15791 . ## Rationale for this change ## What changes are included in this PR? ## Are these cha

Re: [PR] Draft: Use upstream arrow `coalesce` kernel in DataFusion [datafusion]

2025-06-07 Thread via GitHub
Dandandan commented on PR #16249: URL: https://github.com/apache/datafusion/pull/16249#issuecomment-2952801442 @alamb benchmark runs ok now

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-07 Thread via GitHub
andygrove commented on PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2952814148 I ran TPC-H benchmarks and saw shuffles with range partitioning run natively. I did not see any difference in performance compared to the last set of benchmarks I ran some tim

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-06-07 Thread via GitHub
duongcongtoai commented on code in PR #16016: URL: https://github.com/apache/datafusion/pull/16016#discussion_r2134056572 ## datafusion/optimizer/src/rewrite_dependent_join.rs: ## @@ -0,0 +1,1901 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contr

Re: [I] Make it easier to run TPCH queries with datafusion-cli [datafusion]

2025-06-07 Thread via GitHub
kevinjqliu commented on issue #14608: URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2952854828 This is great, thanks @clflushopt I couldn't find a way to use datafusion to write multiple parquet files, but i think this is a limitation with datafusion's `COPY` co

[PR] Support data source sampling with TABLESAMPLE [datafusion]

2025-06-07 Thread via GitHub
theirix opened a new pull request, #16325: URL: https://github.com/apache/datafusion/pull/16325 ## Which issue does this PR close? - Closes #13563 ## Rationale for this change Explained in #13563 in detail with known syntax examples. Thanks to [changes to sqlparser](h

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-06-07 Thread via GitHub
duongcongtoai commented on code in PR #16016: URL: https://github.com/apache/datafusion/pull/16016#discussion_r2134056643 ## datafusion/optimizer/src/rewrite_dependent_join.rs: ## @@ -0,0 +1,1901 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contr

Re: [I] Support reading multiple parquet files via `datafusion-cli` [datafusion]

2025-06-07 Thread via GitHub
a-agmon commented on issue #16303: URL: https://github.com/apache/datafusion/issues/16303#issuecomment-2952859549 @alamb - I'm less familiar with this area in datafusion but might be able to give this a shot. The idea is to add this as a table function right? I can see that `ListingT

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-07 Thread via GitHub
mbutrovich commented on PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2952887778 > I ran TPC-H benchmarks and saw shuffles with range partitioning run natively. I did not see any difference in performance compared to the last set of benchmarks I ran some

Re: [PR] bug: remove busy-wait while sort is ongoing [datafusion]

2025-06-07 Thread via GitHub
pepijnve commented on PR #16322: URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2953125280 A sort preserving merge specific test case started failing. I’ll dig deeper to better understand what’s going on.

[I] DF 48 upgrade guide missing window function breaking change [datafusion]

2025-06-07 Thread via GitHub
Omega359 opened a new issue, #16326: URL: https://github.com/apache/datafusion/issues/16326 ### Is your feature request related to a problem or challenge? `Expr::WindowFunction` prior to DF 48 accepted a `WindowFunction`, now requires a `Box` ### Describe the solution you'd

[I] to_hex cannot take UInt64 [datafusion]

2025-06-07 Thread via GitHub
drtconway opened a new issue, #16327: URL: https://github.com/apache/datafusion/issues/16327 ### Describe the bug I have a UInt64 column containing a 64-bit hash which I want to convert to a hex string. `to_hex` should work, but gives the error: ``` Error: Custom { kind: Ot

Re: [I] Reduce page metadata loading to only what is necessary for query execution in ParquetOpen [datafusion]

2025-06-07 Thread via GitHub
zhuqi-lucas commented on issue #16200: URL: https://github.com/apache/datafusion/issues/16200#issuecomment-2953460233 > Looks like not a big difference to me? Some queries have a 30% performance improvement; will try to mock this in a real datafusion benchmark. ``` arrow_reader_clickb

Re: [PR] Fix inconsistent schema projection in ListingTable when file order varies by tracking schema source [datafusion]

2025-06-07 Thread via GitHub
kosiew commented on code in PR #16305: URL: https://github.com/apache/datafusion/pull/16305#discussion_r2134362206 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -2452,4 +2178,381 @@ mod tests { Ok(()) } + +#[tokio::test] +async fn infer_preser

Re: [PR] Fix inconsistent schema projection in ListingTable when file order varies by tracking schema source [datafusion]

2025-06-07 Thread via GitHub
kosiew commented on code in PR #16305: URL: https://github.com/apache/datafusion/pull/16305#discussion_r2134362818 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -2452,4 +2178,381 @@ mod tests { Ok(()) } + +#[tokio::test] +async fn infer_preser

Re: [PR] Fix inconsistent schema projection in ListingTable when file order varies by tracking schema source [datafusion]

2025-06-07 Thread via GitHub
kosiew commented on code in PR #16305: URL: https://github.com/apache/datafusion/pull/16305#discussion_r2134363121 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -2452,4 +2178,381 @@ mod tests { Ok(()) } + +#[tokio::test] +async fn infer_preser

Re: [PR] Fix inconsistent schema projection in ListingTable when file order varies by tracking schema source [datafusion]

2025-06-07 Thread via GitHub
kosiew commented on code in PR #16305: URL: https://github.com/apache/datafusion/pull/16305#discussion_r2134364184 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -2452,4 +2178,382 @@ mod tests { Ok(()) } + +#[tokio::test] +async fn infer_preser