Re: [PR] Add support for `ALTER TABLE DROP INDEX` [datafusion-sqlparser-rs]

2025-06-09 Thread via GitHub
iffyio commented on PR #1865: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1865#issuecomment-2957664809 @vimko could you take a look at the cargo failure when you get the chance? -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-09 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2957658585 > @zhuqi-lucas, talking about TODO items, in addition to the 4 things I noted in [my comment above](https://github.com/apache/datafusion/pull/16196#issuecomment-2955853539), I s
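The excerpt does not show the mechanism this PR uses. As a generic illustration of cooperative cancellation only (not DataFusion's approach), the sketch below shows why a CPU-bound grouping loop needs explicit check points: nothing can interrupt it between batches unless it looks for a stop signal itself. `grouped_sum`, `AggError`, and the shared `AtomicBool` flag are all hypothetical names for this sketch.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum AggError {
    Cancelled,
}

/// Sums values per key across batches (a toy GROUP BY), checking a shared
/// cancellation flag once per batch so a long CPU-bound run can be stopped.
pub fn grouped_sum(
    batches: &[Vec<(u32, i64)>],
    cancel: &Arc<AtomicBool>,
) -> Result<HashMap<u32, i64>, AggError> {
    let mut groups: HashMap<u32, i64> = HashMap::new();
    for batch in batches {
        // Cooperative check point: without it the loop runs to completion
        // no matter what the caller wants.
        if cancel.load(Ordering::Relaxed) {
            return Err(AggError::Cancelled);
        }
        for &(key, value) in batch {
            *groups.entry(key).or_insert(0) += value;
        }
    }
    Ok(groups)
}
```

Checking once per batch rather than once per row keeps the overhead of the atomic load negligible while still bounding how long cancellation can take.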

Re: [I] `Span` for `Expr::Case` does not include the heading and trailing keywords [datafusion-sqlparser-rs]

2025-06-09 Thread via GitHub
eliaperantoni commented on issue #1878: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1878#issuecomment-2957883103 @alamb should be added to tracker ticket #1548 so people can find this and the related PR for inspiration

Re: [PR] Fix array_concat with NULL arrays [datafusion]

2025-06-09 Thread via GitHub
gabotechs commented on code in PR #16348: URL: https://github.com/apache/datafusion/pull/16348#discussion_r2136979543 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -3070,6 +3070,30 @@ select array_concat([]); [] +# test with NULL array +query ? +select array_co

Re: [PR] Fix array_concat with NULL arrays [datafusion]

2025-06-09 Thread via GitHub
gabotechs commented on code in PR #16348: URL: https://github.com/apache/datafusion/pull/16348#discussion_r2136961999 ## datafusion/functions-nested/src/concat.rs: ## @@ -361,15 +362,44 @@ pub(crate) fn array_concat_inner(args: &[ArrayRef]) -> Result { for arg in args {

Re: [PR] Add support for `CREATE SCHEMA WITH ( )` [datafusion-sqlparser-rs]

2025-06-09 Thread via GitHub
iffyio merged PR #1877: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1877

Re: [PR] Fix inconsistent schema projection in ListingTable even when schema is specified [datafusion]

2025-06-09 Thread via GitHub
xudong963 merged PR #16305: URL: https://github.com/apache/datafusion/pull/16305

Re: [PR] Fix: mark "Spilling (to disk) Joins" as supported in features [datafusion]

2025-06-09 Thread via GitHub
kosiew commented on code in PR #16343: URL: https://github.com/apache/datafusion/pull/16343#discussion_r2136859567 ## docs/source/user-guide/features.md: ## @@ -93,7 +93,7 @@ - [x] Memory limits enforced - [x] Spilling (to disk) Sort - [x] Spilling (to disk) Grouping -- [ ] S

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-09 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2957652941 > So maybe we need a follow on PR to fix the cancel test from @pepijnve Thank you @alamb. @pepijnve can you add a testing case that this PR will not succeed? I remember i w

Re: [PR] fix: create file for empty stream [datafusion]

2025-06-09 Thread via GitHub
mmooyyii commented on PR #16342: URL: https://github.com/apache/datafusion/pull/16342#issuecomment-2957479062 Maybe add same test for write_parquet and write_json? I think they should have same behavior.

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-09 Thread via GitHub
adamreeve commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2136730391 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -1259,9 +1302,14 @@ impl FileSink for ParquetSink { object_store: Arc, ) -> Result {

Re: [I] Panic in `datafusion_expr::window_state::WindowAggState::update` [datafusion]

2025-06-09 Thread via GitHub
suibianwanwank commented on issue #16308: URL: https://github.com/apache/datafusion/issues/16308#issuecomment-2957575687 > FYI [@suibianwanwank](https://github.com/suibianwanwank) would you be willing to take this issue? Sure, I'd be happy to take a look. Things have been a bit busy o

Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-09 Thread via GitHub
Chen-Yuan-Lai commented on PR #16324: URL: https://github.com/apache/datafusion/pull/16324#issuecomment-2957481921 > Just FYI, if you expect more work on this issue, you may want to change "closes" to "part of" - because otherwise github will actually close the issue once this PR is merged

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-09 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2957688586 > @ozankabak indeed, that was what my original test was simulating. The coalesce batches and repartition end up erasing the scenario I was trying to demonstrate. I fully agree th

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-09 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2957679295 FYI Thank you @ozankabak @alamb @pepijnve , i created a EPIC ticket now, currently i added 5 sub-tasks, feel free to add more tasks, we can iterator all possible cases and mak

Re: [PR] Fix: mark "Spilling (to disk) Joins" as supported in features [datafusion]

2025-06-09 Thread via GitHub
xudong963 merged PR #16343: URL: https://github.com/apache/datafusion/pull/16343

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-09 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2136710028 ## benchmarks/src/bin/dfbench.rs: ## @@ -60,11 +60,11 @@ pub async fn main() -> Result<()> { Options::Cancellation(opt) => opt.run().await, Opt

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-09 Thread via GitHub
adriangb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2957654244 @berkaysynnada I think I made some progress. I was able to get the TopK pushdown tests passing by passing around the filter in the `EnforceSorting` rule as is already done for the `

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-09 Thread via GitHub
adriangb commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2136896235 ## datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs: ## @@ -114,6 +118,18 @@ fn pushdown_sorts_helper( sort_push_down.data.fetch =
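The idea behind TopK dynamic filter pushdown, as the PR title names it, is that once a TopK operator holds K rows, its current K-th value is a bound that the scan can use to discard rows early. The sketch below is a general illustration in plain Rust for `ORDER BY v ASC LIMIT k`, not DataFusion's implementation; `TopK` and `scan_with_dynamic_filter` are hypothetical names.

```rust
use std::collections::BinaryHeap;

/// Keeps the `k` smallest values seen so far (ORDER BY v ASC LIMIT k).
pub struct TopK {
    k: usize,
    heap: BinaryHeap<i64>, // max-heap: the root is the current k-th smallest
}

impl TopK {
    pub fn new(k: usize) -> Self {
        TopK { k, heap: BinaryHeap::with_capacity(k) }
    }

    /// Once the heap is full, any value >= this bound cannot enter the result,
    /// so this is the value a dynamic filter would push down to the scan.
    pub fn threshold(&self) -> Option<i64> {
        if self.heap.len() == self.k { self.heap.peek().copied() } else { None }
    }

    pub fn insert(&mut self, v: i64) {
        if self.k == 0 {
            return;
        }
        if self.heap.len() < self.k {
            self.heap.push(v);
        } else if let Some(&max) = self.heap.peek() {
            if v < max {
                self.heap.pop(); // evict the current k-th value
                self.heap.push(v);
            }
        }
    }

    pub fn into_sorted(self) -> Vec<i64> {
        self.heap.into_sorted_vec() // ascending order
    }
}

/// Scans `rows`, skipping those the current threshold already rules out.
/// Returns the top-k values and how many rows the filter skipped.
pub fn scan_with_dynamic_filter(rows: &[i64], k: usize) -> (Vec<i64>, usize) {
    let mut topk = TopK::new(k);
    let mut skipped = 0;
    for &v in rows {
        match topk.threshold() {
            Some(t) if v >= t => skipped += 1, // filtered before reaching TopK
            _ => topk.insert(v),
        }
    }
    (topk.into_sorted(), skipped)
}
```

For rows `[5, 1, 9, 3, 7, 2]` and `k = 2`, the rows `9` and `7` are rejected by the threshold without touching the heap. The bound tightens as better rows arrive, which is why it has to be "dynamic" rather than computed once at planning time.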

Re: [I] Update Feature Checklist: “Spilling (to disk) Joins” is Implemented [datafusion]

2025-06-09 Thread via GitHub
xudong963 closed issue #16341: Update Feature Checklist: “Spilling (to disk) Joins” is Implemented URL: https://github.com/apache/datafusion/issues/16341

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-09 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2136714217 ## datafusion/common/src/config.rs: ## @@ -188,6 +195,338 @@ macro_rules! config_namespace { } } +#[derive(Clone, Default, Debug, PartialEq)] +pub struct

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-09 Thread via GitHub
corwinjoy commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-2957368512 @adamreeve @rok

Re: [I] Inconsistent schema coercion in `ListingTableConfig` [datafusion]

2025-06-09 Thread via GitHub
xudong963 closed issue #16270: Inconsistent schema coercion in `ListingTableConfig` URL: https://github.com/apache/datafusion/issues/16270

Re: [PR] Blog: Optimizing SQL and DataFrames [datafusion-site]

2025-06-09 Thread via GitHub
akurmustafa commented on PR #74: URL: https://github.com/apache/datafusion-site/pull/74#issuecomment-2957563464 Thanks @timsaucer for the reviews and pointing out the broken links. @alamb I resolved the link issues by resorting html instead of relying on the markdown rendering. I sent the c

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-09 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2136732397 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -1259,9 +1302,14 @@ impl FileSink for ParquetSink { object_store: Arc, ) -> Result {

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-09 Thread via GitHub
adamreeve commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2136733767 ## datafusion/common/src/config.rs: ## @@ -188,6 +195,338 @@ macro_rules! config_namespace { } } +#[derive(Clone, Default, Debug, PartialEq)] +pub struct

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-09 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2136718671 ## datafusion/common/src/config.rs: ## @@ -591,6 +930,12 @@ config_namespace! { /// writing out already in-memory data, such as from a cached /

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-09 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2136721651 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -1259,9 +1302,14 @@ impl FileSink for ParquetSink { object_store: Arc, ) -> Result {

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-09 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2136715685 ## datafusion/common/src/config.rs: ## @@ -188,6 +195,338 @@ macro_rules! config_namespace { } } +#[derive(Clone, Default, Debug, PartialEq)] +pub struct

[PR] feat: Parquet modular encryption [datafusion]

2025-06-09 Thread via GitHub
corwinjoy opened a new pull request, #16351: URL: https://github.com/apache/datafusion/pull/16351 ## Which issue does this PR close? - Closes #15216. ## What changes are included in this PR? This PR adds support for encryption in DataFusion’s Parquet implementation.

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-09 Thread via GitHub
pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2957302218 > Reflecting all this onto the task at hand, this PR (1) solves many cases already and (2) introduces some machinery that will be useful as we iterate on the full solution. I don't

Re: [I] to_hex cannot take UInt64 [datafusion]

2025-06-09 Thread via GitHub
drtconway commented on issue #16327: URL: https://github.com/apache/datafusion/issues/16327#issuecomment-2957231379 Awesome! Thank you!

Re: [I] array_concat fails with when passed NULL list literals [datafusion]

2025-06-09 Thread via GitHub
alexanderbianchi commented on issue #16349: URL: https://github.com/apache/datafusion/issues/16349#issuecomment-2957199931 take

Re: [PR] fix: Fix SparkSha2 to be compliant with Spark response and add support for Int32 [datafusion]

2025-06-09 Thread via GitHub
rishvin commented on PR #16350: URL: https://github.com/apache/datafusion/pull/16350#issuecomment-2957119580 @andygrove here are the SHA2 changes to be compliant with Spark.

Re: [I] Incorrect results with JVM shuffle: Spark SQL `- SPARK-32038: NormalizeFloatingNumbers should work on distinct aggregate` [datafusion-comet]

2025-06-09 Thread via GitHub
andygrove closed issue #1824: Incorrect results with JVM shuffle: Spark SQL `- SPARK-32038: NormalizeFloatingNumbers should work on distinct aggregate` URL: https://github.com/apache/datafusion-comet/issues/1824

[PR] Fix array_concat with NULL arrays [datafusion]

2025-06-09 Thread via GitHub
alexanderbianchi opened a new pull request, #16348: URL: https://github.com/apache/datafusion/pull/16348 Fixes issue where 'select array_concat(NULL::integer[])' would throw 'Arrow error: Compute error: concat requires input of at least one array' ## Which issue does this PR close?
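The excerpt states the failing call but not the exact result the fix produces. Assuming NULL list arguments are skipped and a call whose arguments are all NULL yields NULL instead of the `concat requires input of at least one array` error, the intended semantics can be modelled in plain Rust. The `Option<Vec<i32>>` representation and the `array_concat` function here are hypothetical simplifications, not DataFusion's Arrow-based code.

```rust
/// Hypothetical model: a SQL list value is `Option<Vec<i32>>`, where `None` is SQL NULL.
/// Assumed semantics: NULL arguments are skipped; if every argument is NULL the
/// result is NULL rather than an error.
pub fn array_concat(args: &[Option<Vec<i32>>]) -> Option<Vec<i32>> {
    let mut out: Option<Vec<i32>> = None;
    // `flatten` over `&Option<Vec<i32>>` yields only the non-NULL lists.
    for arg in args.iter().flatten() {
        // The first non-NULL argument turns the result into an (initially empty) list.
        out.get_or_insert_with(Vec::new).extend_from_slice(arg);
    }
    out
}
```

Under this model an empty list argument still produces an empty list, which matches the existing `select array_concat([]);` → `[]` test quoted earlier in this digest.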

Re: [PR] fix: Remove COMET_SHUFFLE_FALLBACK_TO_COLUMNAR hack [datafusion-comet]

2025-06-09 Thread via GitHub
andygrove merged PR #1865: URL: https://github.com/apache/datafusion-comet/pull/1865

Re: [I] Spark SQL test "test with low buffer spill threshold" fails when shuffle mode is jvm [datafusion-comet]

2025-06-09 Thread via GitHub
andygrove closed issue #1252: Spark SQL test "test with low buffer spill threshold" fails when shuffle mode is jvm URL: https://github.com/apache/datafusion-comet/issues/1252

Re: [PR] Support datafusion-cli access to public S3 buckets that do not require authentication [datafusion]

2025-06-09 Thread via GitHub
blaginin commented on code in PR #16300: URL: https://github.com/apache/datafusion/pull/16300#discussion_r2136400344 ## datafusion-cli/src/object_storage.rs: ## @@ -105,9 +106,52 @@ pub async fn get_s3_object_store_builder( builder = builder.with_allow_http(*allow_http)

Re: [I] Incorrect count `null` in dict values [datafusion]

2025-06-09 Thread via GitHub
blaginin closed issue #16228: Incorrect count `null` in dict values URL: https://github.com/apache/datafusion/issues/16228

[PR] fix: Fix SparkSha2 to be compliant with Spark response and add support for Int32 [datafusion]

2025-06-09 Thread via GitHub
rishvin opened a new pull request, #16350: URL: https://github.com/apache/datafusion/pull/16350 ## Which issue does this PR close? - Closes #16336 ## Rationale for this change Please see [here](https://github.com/apache/datafusion-comet/issues/1820#issuecomm
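For context on "compliant with Spark": Spark's `sha2(expr, bitLength)` supports SHA-224, SHA-256, SHA-384 and SHA-512, treats a bit length of 0 as 256, and returns NULL for any other bit length instead of raising an error. The sketch below models only that argument-validation step in plain Rust; `Sha2Variant` and `spark_sha2_variant` are hypothetical names, not the PR's code, and the hashing itself is omitted.

```rust
/// SHA-2 variants accepted by Spark's `sha2(expr, bitLength)`.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Sha2Variant {
    Sha224,
    Sha256,
    Sha384,
    Sha512,
}

/// Maps an Int32 bit-length argument to a variant.
/// `None` means the SQL result is NULL (NULL argument or unsupported length).
pub fn spark_sha2_variant(bit_length: Option<i32>) -> Option<Sha2Variant> {
    match bit_length? {
        224 => Some(Sha2Variant::Sha224),
        0 | 256 => Some(Sha2Variant::Sha256), // Spark treats 0 as 256
        384 => Some(Sha2Variant::Sha384),
        512 => Some(Sha2Variant::Sha512),
        _ => None, // unsupported length: NULL result, not an error
    }
}
```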

Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-09 Thread via GitHub
blaginin commented on code in PR #16324: URL: https://github.com/apache/datafusion/pull/16324#discussion_r2136447543 ## datafusion/core/tests/optimizer/mod.rs: ## @@ -107,16 +126,15 @@ fn concat_ws_literals() -> Result<()> { let sql = "SELECT concat_ws('-', true, col_int32,

[I] array_concat fails with when passed NULL list literals [datafusion]

2025-06-09 Thread via GitHub
alexanderbianchi opened a new issue, #16349: URL: https://github.com/apache/datafusion/issues/16349 ### Describe the bug The array_concat function throws an Arrow compute error `concat requires input of at least one array` when called with NULL list literals (e.g., NULL::integer[]).

[PR] Draft: feat: Add FFI support for user defined functions [datafusion-python]

2025-06-09 Thread via GitHub
timsaucer opened a new pull request, #1145: URL: https://github.com/apache/datafusion-python/pull/1145 **This builds on top of https://github.com/apache/datafusion-python/pull/1143 and I will rebase once that merges in to main.** # Which issue does this PR close? Closes https:

Re: [I] Add CI check for documentation build [datafusion-python]

2025-06-09 Thread via GitHub
timsaucer closed issue #1138: Add CI check for documentation build URL: https://github.com/apache/datafusion-python/issues/1138

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-09 Thread via GitHub
andygrove commented on PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2957023096 I ran fresh benchmarks, but I do not see any change in performance. Perhaps the range partitioning shuffles are not a significant cost in these benchmarks.

Re: [PR] Add a documentation build step in CI [datafusion-python]

2025-06-09 Thread via GitHub
timsaucer merged PR #1139: URL: https://github.com/apache/datafusion-python/pull/1139

Re: [PR] Fix distinct count for DictionaryArray to correctly account for nulls in values array [datafusion]

2025-06-09 Thread via GitHub
blaginin merged PR #16258: URL: https://github.com/apache/datafusion/pull/16258
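The PR title points at a subtle case: in a dictionary-encoded column a row is NULL either when its key is NULL or when the key is valid but points at a NULL entry in the values array. A simplified model (plain slices instead of Arrow's `DictionaryArray`; `count_distinct_dict` is a hypothetical name, not the merged code) makes the second case explicit:

```rust
use std::collections::HashSet;

/// Counts distinct non-NULL logical values of a dictionary-encoded column.
/// `keys[i]` indexes into `values`; a row is NULL if its key is NULL *or*
/// the value it points to is NULL. Checking only key validity misses the latter.
pub fn count_distinct_dict(keys: &[Option<usize>], values: &[Option<&str>]) -> usize {
    let mut seen = HashSet::new();
    for key in keys.iter().flatten() {
        if let Some(v) = values[*key] {
            seen.insert(v);
        }
    }
    seen.len()
}
```

With `values = [Some("a"), None, Some("b")]`, a row whose key is `Some(1)` has a valid key but is logically NULL, so it must not contribute to `COUNT(DISTINCT ...)`.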

Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-09 Thread via GitHub
blaginin commented on PR #16324: URL: https://github.com/apache/datafusion/pull/16324#issuecomment-2956983950 Also could you please resolve the conflicts

Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-09 Thread via GitHub
blaginin commented on PR #16324: URL: https://github.com/apache/datafusion/pull/16324#issuecomment-2956908455 Hey @Chen-Yuan-Lai! In the PR title you have > part1 but in the PR body you write > Closes https://github.com/apache/datafusion/issues/15791 Just

Re: [I] COUNT and COUNT DISTINCT produce incorrect results for dictionary arrays with null values [datafusion]

2025-06-09 Thread via GitHub
blaginin closed issue #16339: COUNT and COUNT DISTINCT produce incorrect results for dictionary arrays with null values URL: https://github.com/apache/datafusion/issues/16339

[I] Panic in FFI UDWF when using wrapping lead function [datafusion-python]

2025-06-09 Thread via GitHub
timsaucer opened a new issue, #1144: URL: https://github.com/apache/datafusion-python/issues/1144 **Describe the bug** During testing of the initial implementation of FFI user defined window functions, I generated a panic in `partition_evaluator`. **To Reproduce** Open `

Re: [PR] feat: use spawned tasks to reduce call stack depth and avoid busy waiting [datafusion]

2025-06-09 Thread via GitHub
pepijnve commented on PR #16319: URL: https://github.com/apache/datafusion/pull/16319#issuecomment-2956628323 Looking at the `clickbench_partitioned` outliers. Wrt the code changes in this PR they seem pretty similar yet one has basically the opposite result of the other. What's interesting

Re: [PR] feat: use spawned tasks to reduce call stack depth and avoid busy waiting [datafusion]

2025-06-09 Thread via GitHub
pepijnve commented on PR #16319: URL: https://github.com/apache/datafusion/pull/16319#issuecomment-2956607445 I had a look at `clickbench_extended`. I cannot explain the slowdown. Those queries do not even use sorting or joins. The plan for the first one for instance is ``` Aggreg

Re: [PR] Example for using a separate threadpool for CPU bound work (try 3) [datafusion]

2025-06-09 Thread via GitHub
alamb commented on code in PR #16331: URL: https://github.com/apache/datafusion/pull/16331#discussion_r2136243498 ## datafusion-examples/examples/thread_pools.rs: ## @@ -0,0 +1,346 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license
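The example file itself is not visible in this excerpt. As a general illustration of the idea in the PR title (running CPU-bound work on dedicated threads so it does not occupy the threads that service I/O), here is a minimal `std`-only pool. It is not the PR's code, and `CpuPool` is a hypothetical name.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// A minimal fixed-size pool reserved for CPU-bound jobs.
pub struct CpuPool {
    sender: Option<mpsc::Sender<Job>>,
    workers: Vec<thread::JoinHandle<()>>,
}

impl CpuPool {
    pub fn new(size: usize) -> Self {
        assert!(size > 0, "pool needs at least one worker");
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..size)
            .map(|_| {
                let rx = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock guard is dropped at the end of this `let`,
                    // so it is held only while taking the next job.
                    let job = match rx.lock().unwrap().recv() {
                        Ok(job) => job,
                        Err(_) => break, // channel closed: pool is shutting down
                    };
                    job();
                })
            })
            .collect();
        CpuPool { sender: Some(sender), workers }
    }

    /// Submits `f` to the pool and returns a receiver for its result.
    pub fn spawn<T, F>(&self, f: F) -> mpsc::Receiver<T>
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            let _ = tx.send(f()); // ignore a caller that stopped listening
        });
        self.sender.as_ref().expect("pool is running").send(job).expect("workers alive");
        rx
    }
}

impl Drop for CpuPool {
    fn drop(&mut self) {
        drop(self.sender.take()); // close the channel so workers exit their loops
        for w in self.workers.drain(..) {
            let _ = w.join();
        }
    }
}
```

The caller blocks on (or polls) the returned receiver, so heavy computation never runs on the caller's own thread; an async runtime would typically bridge that hand-off with its own channel type instead.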

Re: [PR] feat: use spawned tasks to reduce call stack depth and avoid busy waiting [datafusion]

2025-06-09 Thread via GitHub
pepijnve commented on code in PR #16319: URL: https://github.com/apache/datafusion/pull/16319#discussion_r2136197682 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -1126,14 +1127,20 @@ impl ExecutionPlan for SortExec { Ok(Box::pin(RecordBatchStreamAdapter:

Re: [PR] fix: Remove COMET_SHUFFLE_FALLBACK_TO_COLUMNAR hack [datafusion-comet]

2025-06-09 Thread via GitHub
andygrove commented on PR #1865: URL: https://github.com/apache/datafusion-comet/pull/1865#issuecomment-2956594650 Thanks for the reviews @parthchandra and @Kontinuation. I will need to rebase this PR and update the 3.5.6 diff now that https://github.com/apache/datafusion-comet/pull/1861 i

Re: [I] Upgrade to spark-3.5.6 [datafusion-comet]

2025-06-09 Thread via GitHub
andygrove closed issue #1857: Upgrade to spark-3.5.6 URL: https://github.com/apache/datafusion-comet/issues/1857

Re: [PR] upgraded spark 3.5.5 to 3.5.6 [datafusion-comet]

2025-06-09 Thread via GitHub
andygrove merged PR #1861: URL: https://github.com/apache/datafusion-comet/pull/1861

Re: [PR] feat: use spawned tasks to reduce call stack depth and avoid busy waiting [datafusion]

2025-06-09 Thread via GitHub
pepijnve commented on code in PR #16319: URL: https://github.com/apache/datafusion/pull/16319#discussion_r2136240231 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -1126,14 +1127,20 @@ impl ExecutionPlan for SortExec { Ok(Box::pin(RecordBatchStreamAdapter:

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-09 Thread via GitHub
ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2956513724 Great, thanks for the patch. We should use it as one of the new test cases in the follow-on PRs. Look, I see that you are trying to help and we do want to take it. I suspect

Re: [PR] feat: use spawned tasks to reduce call stack depth and avoid busy waiting [datafusion]

2025-06-09 Thread via GitHub
pepijnve commented on PR #16319: URL: https://github.com/apache/datafusion/pull/16319#issuecomment-2956496642 @alamb I've been trying to make sense of what to do with the benchmark results. They always seem to give me very mixed results when I run them locally (that's part of why I did the

Re: [PR] Metadata handling announcement [datafusion-site]

2025-06-09 Thread via GitHub
timsaucer commented on PR #73: URL: https://github.com/apache/datafusion-site/pull/73#issuecomment-2956377850 > I have some optional example suggestions...I'm happy to help put some together! Is there a target date you're hoping to have this completed by? I don't think there's any rus

Re: [PR] feat: use spawned tasks to reduce call stack depth and avoid busy waiting [datafusion]

2025-06-09 Thread via GitHub
pepijnve commented on code in PR #16319: URL: https://github.com/apache/datafusion/pull/16319#discussion_r2136181096 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -1126,14 +1127,20 @@ impl ExecutionPlan for SortExec { Ok(Box::pin(RecordBatchStreamAdapter:

Re: [PR] feat: pass ignore_nulls flag to first and last [datafusion-comet]

2025-06-09 Thread via GitHub
codecov-commenter commented on PR #1866: URL: https://github.com/apache/datafusion-comet/pull/1866#issuecomment-2956336470 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1866?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [D] DISCUSSION: DataFusion Meetup in New York, NY, USA [datafusion]

2025-06-09 Thread via GitHub
GitHub user kylebarron added a comment to the discussion: DISCUSSION: DataFusion Meetup in New York, NY, USA As long as I'm in town I'll attend! I might be able to present on geospatial support for DataFusion, but it's in pretty early stages and who knows what state it'll be in by August/Sept

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-09 Thread via GitHub
pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2956355078 @ozankabak indeed, that was what my original test was simulating. The coalesce batches and repartition end up erasing the scenario I was trying to demonstrate. I fully agree that th

Re: [PR] doc: Add SQL examples for SEMI + ANTI Joins [datafusion]

2025-06-09 Thread via GitHub
jonathanc-n commented on code in PR #16316: URL: https://github.com/apache/datafusion/pull/16316#discussion_r2136098201 ## docs/source/user-guide/sql/select.md: ## @@ -170,18 +170,72 @@ select * from x natural join x y; ### CROSS JOIN -A cross join produces a cartesian prod
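To make the SEMI and ANTI join semantics this doc change describes concrete: a left semi join keeps each left row that has at least one match (once, with no right-side columns), and a left anti join keeps each left row with no match. The in-memory model below is a sketch with hypothetical function names; it uses non-nullable `i32` keys, so it deliberately ignores SQL NULL-key behaviour.

```rust
use std::collections::HashSet;

/// LEFT SEMI JOIN on key equality: left rows with at least one match in `right`,
/// each emitted at most once and without any right-side columns.
pub fn left_semi_join<'a>(left: &[(i32, &'a str)], right: &[i32]) -> Vec<(i32, &'a str)> {
    let keys: HashSet<i32> = right.iter().copied().collect();
    left.iter().filter(|(k, _)| keys.contains(k)).copied().collect()
}

/// LEFT ANTI JOIN: left rows with no match in `right`.
pub fn left_anti_join<'a>(left: &[(i32, &'a str)], right: &[i32]) -> Vec<(i32, &'a str)> {
    let keys: HashSet<i32> = right.iter().copied().collect();
    left.iter().filter(|(k, _)| !keys.contains(k)).copied().collect()
}
```

With `left = [(1, "a"), (2, "b"), (3, "c")]` and `right = [2, 2, 4]`, the semi join returns `(2, "b")` only once even though key `2` matches twice, which is the main difference from an inner join.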

Re: [PR] Example for using a separate threadpool for CPU bound work (try 3) [datafusion]

2025-06-09 Thread via GitHub
Omega359 commented on code in PR #16331: URL: https://github.com/apache/datafusion/pull/16331#discussion_r2136094718 ## datafusion-examples/examples/thread_pools.rs: ## @@ -0,0 +1,350 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [PR] feat: pass ignore_nulls flag to first and last [datafusion-comet]

2025-06-09 Thread via GitHub
parthchandra commented on PR #1866: URL: https://github.com/apache/datafusion-comet/pull/1866#issuecomment-2956276128 The linked PR did not add a test. Would you be able to?

Re: [PR] Metadata handling announcement [datafusion-site]

2025-06-09 Thread via GitHub
paleolimbot commented on code in PR #73: URL: https://github.com/apache/datafusion-site/pull/73#discussion_r2135925472 ## content/blog/2025-06-09-metadata-handling.md: ## @@ -0,0 +1,98 @@ +--- +layout: post +title: Metadata handling in user defined functions +date: 2025-06-09 +a

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-09 Thread via GitHub
ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2956234465 @zhuqi-lucas, talking about TODO items, in addition to the 4 things I noted in [my comment above](https://github.com/apache/datafusion/pull/16196#issuecomment-2955853539), I sugge

Re: [I] Iceberg integration - parquet-column version conflicts [datafusion-comet]

2025-06-09 Thread via GitHub
parthchandra commented on issue #1833: URL: https://github.com/apache/datafusion-comet/issues/1833#issuecomment-2956234351 The issue is not specific to a Spark or Parquet version I believe. The issue is that Comet has Parquet's `ColumnDescriptor` class (and some other related classes) i

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-09 Thread via GitHub
mbutrovich commented on code in PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2135992681 ## native/core/benches/shuffle_writer.rs: ## @@ -42,20 +45,18 @@ fn criterion_benchmark(c: &mut Criterion) { CompressionCodec::Zstd(1), Co

Re: [PR] fix: Remove COMET_SHUFFLE_FALLBACK_TO_COLUMNAR hack [datafusion-comet]

2025-06-09 Thread via GitHub
parthchandra commented on PR #1865: URL: https://github.com/apache/datafusion-comet/pull/1865#issuecomment-2956180483 For the cases where we were falling back to columnar, the tests now fail (and are ignored), or are we falling back to Spark?

Re: [PR] chore: refactor Substrait consumer's "rename_field" and implement the rest of types [datafusion]

2025-06-09 Thread via GitHub
Blizzara commented on PR #16345: URL: https://github.com/apache/datafusion/pull/16345#issuecomment-2956189590 Fyi @westonpace who originally split out the rename_field method, and @alamb :)

Re: [PR] fix: Remove COMET_SHUFFLE_FALLBACK_TO_COLUMNAR hack [datafusion-comet]

2025-06-09 Thread via GitHub
andygrove commented on PR #1865: URL: https://github.com/apache/datafusion-comet/pull/1865#issuecomment-2956189870 > For the cases where we were falling back to columnar, the tests now fail (and are ignored), or are we falling back to Spark? Previously, we were falling back to Spark

Re: [I] Update or ignore tests in Spark SQL WholeStageCodegenSuite [datafusion-comet]

2025-06-09 Thread via GitHub
parthchandra commented on issue #1852: URL: https://github.com/apache/datafusion-comet/issues/1852#issuecomment-2956166017 Yes, these can be ignored.

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-09 Thread via GitHub
pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2956129436 > However, I do like a bias of action, and if this PR fixes a real problem, I don't think we should bikeshed it indefinitely @alamb sorry if it came across as bike shedding; I

Re: [PR] Fix inconsistent schema projection in ListingTable even when schema is specified [datafusion]

2025-06-09 Thread via GitHub
kosiew commented on PR #16305: URL: https://github.com/apache/datafusion/pull/16305#issuecomment-299071 Thanks @xudong963 for the review.

[PR] Add support for `CREATE SCHEMA x WITH ( )` [datafusion-sqlparser-rs]

2025-06-09 Thread via GitHub
utay opened a new pull request, #1877: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1877 This PR adds support for `CREATE SCHEMA x WITH ( LOCATION = '/some/path' )` by some dialects such as Trino. See documentation at https://trino.io/docs/current/sql/create-schema.htm

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-09 Thread via GitHub
kylebarron commented on PR #16170: URL: https://github.com/apache/datafusion/pull/16170#issuecomment-2956073476 In https://github.com/geoarrow/geoarrow-rs/pull/1179 I validated that this PR fixes the failing test I encountered from https://github.com/geoarrow/geoarrow-rs/pull/1106

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-09 Thread via GitHub
ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2955938360 Is that the interleave test? Sorry the link was not clear

Re: [I] June 2025 ASF Board Report [datafusion]

2025-06-09 Thread via GitHub
kevinjqliu commented on issue #15182: URL: https://github.com/apache/datafusion/issues/15182#issuecomment-2956022595 I gave it a read, LGTM Couple of suggestions. For python, [udf and udaf was added](https://github.com/apache/datafusion-python/pull/1040) recently as well as [udw

Re: [PR] fix: Avoid delim get if no correlated columns [datafusion]

2025-06-09 Thread via GitHub
duongcongtoai closed pull request #16344: fix: Avoid delim get if no correlated columns URL: https://github.com/apache/datafusion/pull/16344

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-09 Thread via GitHub
ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2955983864 BTW if you wanted to check whether this PR covers interleave-related cases (which is the test case analyzed in that repo), this PR has two tests for it (`test_infinite_interleave_c

Re: [PR] Example for using a separate threadpool for CPU bound work (try 3) [datafusion]

2025-06-09 Thread via GitHub
alamb commented on code in PR #16331: URL: https://github.com/apache/datafusion/pull/16331#discussion_r2135821535 ## datafusion-examples/examples/thread_pools.rs: ## @@ -0,0 +1,346 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Add test for ordering of predicate pushdown into parquet [datafusion]

2025-06-09 Thread via GitHub
adriangb merged PR #16169: URL: https://github.com/apache/datafusion/pull/16169

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-09 Thread via GitHub
ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2955961098 His repo seems to be creating a plan manually and applying some old version of the rule (which is in that repo, not in DF proper). What are we trying to do here? Am I missing somet

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-09 Thread via GitHub
alamb commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2955945948 > Is that the interleave test? Sorry the link was not clear Sorry -- it was this reproducer https://github.com/pepijnve/datafusion_cancel_test I pointed it at a local

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-09 Thread via GitHub
alamb commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2955925839 So maybe we need a follow on PR to fix the cancel test from @pepijnve

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-09 Thread via GitHub
alamb commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2955922212 So I just re-ran the reproducer from @pepijnve in https://github.com/apache/datafusion/pull/16196#issuecomment-2921143644 against main and this PR doesn't cancel it: ```

Re: [PR] Example for using a separate threadpool for CPU bound work (try 3) [datafusion]

2025-06-09 Thread via GitHub
Omega359 commented on code in PR #16331: URL: https://github.com/apache/datafusion/pull/16331#discussion_r2135785917 ## datafusion-examples/examples/thread_pools.rs: ## @@ -0,0 +1,346 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [PR] feat: support FixedSizeList for array_has [datafusion]

2025-06-09 Thread via GitHub
alamb commented on code in PR #16333: URL: https://github.com/apache/datafusion/pull/16333#discussion_r2135784803 ## datafusion/functions-nested/src/array_has.rs: ## @@ -232,98 +236,244 @@ fn array_has_inner_for_array(haystack: &ArrayRef, needle: &ArrayRef) -> Result array_has_

Re: [PR] fix: Remove COMET_SHUFFLE_FALLBACK_TO_COLUMNAR hack [datafusion-comet]

2025-06-09 Thread via GitHub
andygrove commented on PR #1865: URL: https://github.com/apache/datafusion-comet/pull/1865#issuecomment-2955890140 @rluvaton @Kontinuation fyi

Re: [PR] fix: `PushDownFilter` for `GROUP BY` on uppercase col names [datafusion]

2025-06-09 Thread via GitHub
alamb commented on PR #16049: URL: https://github.com/apache/datafusion/pull/16049#issuecomment-2955874691 Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is read

Re: [PR] Fix array_agg memory over accounting [datafusion]

2025-06-09 Thread via GitHub
LiaCastaneda commented on code in PR #16346: URL: https://github.com/apache/datafusion/pull/16346#discussion_r2135765414 ## datafusion/common/src/scalar/mod.rs: ## @@ -3525,6 +3525,12 @@ impl ScalarValue { } } } + +/// Compacts ([ScalarValue::compa

Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-09 Thread via GitHub
alamb commented on PR #16324: URL: https://github.com/apache/datafusion/pull/16324#issuecomment-2955861986 Thank you @Chen-Yuan-Lai
