Re: [PR] Fix internal error in sort when hitting memory limit [datafusion]

2025-04-12 Thread via GitHub
DerGut commented on code in PR #15692: URL: https://github.com/apache/datafusion/pull/15692#discussion_r2040700866 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -765,6 +765,25 @@ impl ExternalSorter { Ok(()) } + +/// Reserves memory to be able to accom

Re: [I] Release DataFusion `47.0.0` (April 2025) [datafusion]

2025-04-12 Thread via GitHub
timsaucer commented on issue #15072: URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2798792530 I spoke too soon - I'm getting one error in our unit tests on `last_value`. I'm trying to investigate this morning. -- This is an automated message from the Apache Git Servi

Re: [PR] POC: Cascaded spill merge and re-spill [datafusion]

2025-04-12 Thread via GitHub
2010YOUY01 commented on PR #15610: URL: https://github.com/apache/datafusion/pull/15610#issuecomment-2798787828 Benchmark results: (I think there is no significant regression for an extra round of re-spill, if it's running on a machine with fast SSDs) ### Environment MacBook Pro

Re: [PR] POC: Cascaded spill merge and re-spill [datafusion]

2025-04-12 Thread via GitHub
2010YOUY01 commented on PR #15610: URL: https://github.com/apache/datafusion/pull/15610#issuecomment-2798789296 I have made the following updates:
- Addressed review comments
- Introduced a new configuration option for max merge degree
- Added tests

It's ready for another look.
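The update above mentions a configuration option for the maximum merge degree. The following is a minimal sketch of what a bounded fan-in (cascaded) merge does. It assumes that in-memory sorted `Vec<i64>` runs stand in for spill files. `merge_runs`, `cascaded_merge`, and the `max_merge_degree` parameter are hypothetical names, not DataFusion APIs, and the PR's real implementation and config key may differ.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// K-way merge of already-sorted runs using a min-heap keyed on
/// (next value, run index). Runs stand in for spill files in this sketch.
fn merge_runs(runs: Vec<Vec<i64>>) -> Vec<i64> {
    let mut iters: Vec<std::vec::IntoIter<i64>> = runs.into_iter().map(|r| r.into_iter()).collect();
    let mut heap = BinaryHeap::new();
    // Seed the heap with the head of every non-empty run.
    for (i, it) in iters.iter_mut().enumerate() {
        if let Some(v) = it.next() {
            heap.push(Reverse((v, i)));
        }
    }
    let mut out = Vec::new();
    // Repeatedly emit the smallest head and refill from the same run.
    while let Some(Reverse((v, i))) = heap.pop() {
        out.push(v);
        if let Some(next) = iters[i].next() {
            heap.push(Reverse((next, i)));
        }
    }
    out
}

/// Hypothetical cascaded merge: while more than `max_merge_degree` runs remain,
/// merge them in groups of at most `max_merge_degree` (a real sorter would
/// re-spill each intermediate result), then do one final merge.
/// Returns the merged rows and the number of merge passes performed.
fn cascaded_merge(mut runs: Vec<Vec<i64>>, max_merge_degree: usize) -> (Vec<i64>, usize) {
    assert!(max_merge_degree >= 2, "a merge degree below 2 cannot make progress");
    let mut passes = 0;
    while runs.len() > max_merge_degree {
        let mut remaining = runs.into_iter();
        let mut next_level = Vec::new();
        loop {
            let group: Vec<Vec<i64>> = remaining.by_ref().take(max_merge_degree).collect();
            if group.is_empty() {
                break;
            }
            next_level.push(merge_runs(group));
        }
        runs = next_level;
        passes += 1;
    }
    // Final pass: at most `max_merge_degree` runs are merged at once.
    (merge_runs(runs), passes + 1)
}
```

With 10 runs and a degree of 4, the sketch reduces 10 runs to 3 in one intermediate pass and then performs a final merge, for 2 passes in total. With a degree of 2, it needs 4 passes. A smaller degree keeps fewer runs open at once, at the cost of extra re-spill passes.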

Re: [PR] fix: normalize window ident [datafusion]

2025-04-12 Thread via GitHub
alamb merged PR #15639: URL: https://github.com/apache/datafusion/pull/15639

Re: [I] Support window definitions with aliases [datafusion]

2025-04-12 Thread via GitHub
alamb closed issue #15605: Support window definitions with aliases URL: https://github.com/apache/datafusion/issues/15605

Re: [PR] Enhance: simplify `x=x` --> `x IS NOT NULL OR NULL` [datafusion]

2025-04-12 Thread via GitHub
alamb commented on PR #15589: URL: https://github.com/apache/datafusion/pull/15589#issuecomment-2798807271 Thanks again @ding-young 🙏
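The rewrite named in this PR's title follows from SQL three-valued logic. The following is a minimal sketch that models a nullable integer column as Rust `Option<i64>`, with `None` standing for NULL. The helper names are hypothetical, and this is not DataFusion's simplifier code.

```rust
/// SQL `a = b` for nullable integers: NULL (None) if either side is NULL.
fn sql_eq(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        _ => None,
    }
}

/// SQL `a OR b` under three-valued (Kleene) logic.
fn sql_or(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        _ => None,
    }
}

/// SQL `x IS NOT NULL`, which is never itself NULL.
fn is_not_null(x: Option<i64>) -> Option<bool> {
    Some(x.is_some())
}

/// Original predicate: `x = x`.
fn original(x: Option<i64>) -> Option<bool> {
    sql_eq(x, x)
}

/// Rewritten predicate: `x IS NOT NULL OR NULL`.
fn rewritten(x: Option<i64>) -> Option<bool> {
    sql_or(is_not_null(x), None)
}
```

The trailing `OR NULL` matters. When `x` is NULL, `x = x` yields NULL rather than FALSE, whereas plain `x IS NOT NULL` would return FALSE. That difference changes results outside a bare `WHERE` clause, for example under `NOT`.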

Re: [I] Trivial WHERE filter not eliminated when combined with CTE [datafusion]

2025-04-12 Thread via GitHub
alamb closed issue #15387: Trivial WHERE filter not eliminated when combined with CTE URL: https://github.com/apache/datafusion/issues/15387

Re: [PR] Enhance: simplify `x=x` --> `x IS NOT NULL OR NULL` [datafusion]

2025-04-12 Thread via GitHub
alamb merged PR #15589: URL: https://github.com/apache/datafusion/pull/15589

Re: [I] Reduce number of tokio blocking threads in SortExec spill [datafusion]

2025-04-12 Thread via GitHub
alamb closed issue #15323: Reduce number of tokio blocking threads in SortExec spill URL: https://github.com/apache/datafusion/issues/15323

Re: [PR] fix: normalize window ident [datafusion]

2025-04-12 Thread via GitHub
alamb commented on PR #15639: URL: https://github.com/apache/datafusion/pull/15639#issuecomment-2798807532 Thanks @chenkovsky and @comphead

Re: [I] Allow parsing byte literals as FixedSizeBinary [datafusion]

2025-04-12 Thread via GitHub
alamb commented on issue #15686: URL: https://github.com/apache/datafusion/issues/15686#issuecomment-2798808874 > Coercing the LHS to Binary is less performant. Unconditionally parsing literals as FixedSizeBytes would be a breaking change. What about coercing the RHS to FixedSizeBinar

Re: [PR] add config parse_hex_as_fixed_size_binary [datafusion]

2025-04-12 Thread via GitHub
alamb commented on PR #15687: URL: https://github.com/apache/datafusion/pull/15687#issuecomment-2798809026 I have an alternate suggestion here: https://github.com/apache/datafusion/issues/15686#issuecomment-2798808874

Re: [I] Release DataFusion `47.0.0` (April 2025) [datafusion]

2025-04-12 Thread via GitHub
timsaucer commented on issue #15072: URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2798806510 I did a little investigating, but I don't have time for a couple of days to dive in deeper. This appears to be related to https://github.com/apache/datafusion/pull/15542 @UBar

Re: [PR] Add Table Functions to FFI Crate [datafusion]

2025-04-12 Thread via GitHub
timsaucer merged PR #15581: URL: https://github.com/apache/datafusion/pull/15581

Re: [PR] Remove waits from blocking threads reading spill files. [datafusion]

2025-04-12 Thread via GitHub
alamb merged PR #15654: URL: https://github.com/apache/datafusion/pull/15654

Re: [I] Release DataFusion `47.0.0` (April 2025) [datafusion]

2025-04-12 Thread via GitHub
alamb commented on issue #15072: URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2798810667 > I did a little investigating, but I don't have time for a couple of days to dive in deeper. This appears to be related to [#15542](https://github.com/apache/datafusion/pull/1554

Re: [PR] Add Table Functions to FFI Crate [datafusion]

2025-04-12 Thread via GitHub
alamb commented on PR #15581: URL: https://github.com/apache/datafusion/pull/15581#issuecomment-2798809598 Very exciting -- thanks @timsaucer

Re: [PR] Remove waits from blocking threads reading spill files. [datafusion]

2025-04-12 Thread via GitHub
alamb commented on PR #15654: URL: https://github.com/apache/datafusion/pull/15654#issuecomment-2798807598 🚀

Re: [PR] chore(deps): bump sysinfo from 0.33.1 to 0.34.2 [datafusion]

2025-04-12 Thread via GitHub
alamb merged PR #15682: URL: https://github.com/apache/datafusion/pull/15682

[I] first_value and last_value should have identical signatures [datafusion]

2025-04-12 Thread via GitHub
timsaucer opened a new issue, #12376: URL: https://github.com/apache/datafusion/issues/12376 ### Describe the bug Less of a bug per se, but it would be nice to have identical function signatures between first_value and last_value ### To Reproduce _No response_ ###

Re: [I] Improve the performance of early exit evaluation in binary_expr [datafusion]

2025-04-12 Thread via GitHub
acking-you commented on issue #15631: URL: https://github.com/apache/datafusion/issues/15631#issuecomment-2798742956 > @acking-you the code needs to be extended to support nulls (you can take a look at the true_count implementation in arrow-rs to do this efficiently). I have an idea f
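Counting true values efficiently when nulls are present can be done by ANDing the values bitmap with the validity bitmap and then counting set bits. This sketch assumes that is roughly the approach arrow-rs's `true_count` takes. The sketch is simplified and assumes bitmaps are plain little-endian `u64` words rather than arrow-rs buffer types. `true_count`, `null_count`, and the two `*_can_short_circuit` helpers are hypothetical names, and whether the PR uses exactly these short-circuit conditions is not shown in this excerpt.

```rust
/// Number of set bits among the first `len` bits of `values`, ignoring any
/// slot whose validity bit is 0 (a NULL slot never counts as true).
/// Slot `i` is bit `i % 64` of word `i / 64`.
fn true_count(values: &[u64], validity: Option<&[u64]>, len: usize) -> usize {
    let full_words = len / 64;
    let rem = len % 64;
    let word_count = full_words + usize::from(rem != 0);
    let mut count = 0;
    for i in 0..word_count {
        let mut word = values[i];
        if let Some(valid) = validity {
            // Mask out NULL slots before counting.
            word &= valid[i];
        }
        if i == full_words {
            // Trailing partial word: clear bits past `len`.
            word &= (1u64 << rem) - 1;
        }
        count += word.count_ones() as usize;
    }
    count
}

/// Number of NULL slots among the first `len` (no validity bitmap = no NULLs).
fn null_count(validity: Option<&[u64]>, len: usize) -> usize {
    match validity {
        Some(valid) => len - true_count(valid, None, len),
        None => 0,
    }
}

/// `lhs AND rhs` is all-false without evaluating `rhs` only if every lhs slot
/// is a non-NULL false (NULL AND true = NULL, so any NULL blocks the shortcut).
fn and_can_short_circuit(values: &[u64], validity: Option<&[u64]>, len: usize) -> bool {
    null_count(validity, len) == 0 && true_count(values, validity, len) == 0
}

/// `lhs OR rhs` is all-true only if every lhs slot is a non-NULL true.
/// NULL slots are masked out of `true_count`, so they make it fall short of `len`.
fn or_can_short_circuit(values: &[u64], validity: Option<&[u64]>, len: usize) -> bool {
    true_count(values, validity, len) == len
}
```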

Re: [I] Release DataFusion `47.0.0` (April 2025) [datafusion]

2025-04-12 Thread via GitHub
timsaucer commented on issue #15072: URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2798821873 > > I did a little investigating, but I don't have time for a couple of days to dive in deeper. This appears to be related to [#15542](https://github.com/apache/datafusion/pul

Re: [I] executor can't read s3 config in push-staged mode. [datafusion-ballista]

2025-04-12 Thread via GitHub
milenkovicm closed issue #1235: executor can't read s3 config in push-staged mode. URL: https://github.com/apache/datafusion-ballista/issues/1235

Re: [PR] fix: executor can't read s3 config in push-staged mode [datafusion-ballista]

2025-04-12 Thread via GitHub
milenkovicm merged PR #1236: URL: https://github.com/apache/datafusion-ballista/pull/1236

Re: [PR] Add support for `INHERITS` option in `CREATE TABLE` statement [datafusion-sqlparser-rs]

2025-04-12 Thread via GitHub
LucaCappelletti94 commented on code in PR #1806: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1806#discussion_r2040654851 ## src/parser/mod.rs: ## @@ -7070,13 +7071,22 @@ impl<'a> Parser<'a> { } } -/// Parse configuration like partitioning, cl

Re: [PR] Add support for `INHERITS` option in `CREATE TABLE` statement [datafusion-sqlparser-rs]

2025-04-12 Thread via GitHub
LucaCappelletti94 commented on code in PR #1806: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1806#discussion_r2040655023 ## tests/sqlparser_postgres.rs: ## @@ -2733,6 +2733,39 @@ fn parse_create_brin() { } } +#[test] +fn parse_create_table_with_inherits(

[PR] chore: Prepare for datafusion 47.0.0 + arrow-rs 55.0.0 [datafusion-comet]

2025-04-12 Thread via GitHub
andygrove opened a new pull request, #1642: URL: https://github.com/apache/datafusion-comet/pull/1642 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] Set DataFusion runtime configurations through SQL interface [datafusion]

2025-04-12 Thread via GitHub
kumarlokesh commented on code in PR #15594: URL: https://github.com/apache/datafusion/pull/15594#discussion_r2040707194 ## datafusion/core/tests/sql/runtime_config.rs: ## @@ -0,0 +1,73 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lice

Re: [I] Perf: Support automatically concat_batches for sort which will improve performance [datafusion]

2025-04-12 Thread via GitHub
Dandandan commented on issue #15375: URL: https://github.com/apache/datafusion/issues/15375#issuecomment-2798960792 @2010YOUY01 that sounds like a very promising future direction. I might try experimenting with this soon if no one beats me to it.

[PR] build(deps): bump mimalloc from 0.1.43 to 0.1.46 [datafusion-python]

2025-04-12 Thread via GitHub
dependabot[bot] opened a new pull request, #1105: URL: https://github.com/apache/datafusion-python/pull/1105 Bumps [mimalloc](https://github.com/purpleprotocol/mimalloc_rust) from 0.1.43 to 0.1.46. Release notes Sourced from https://github.com/purpleprotocol/mimalloc_rust/releases

Re: [PR] build(deps): bump mimalloc from 0.1.43 to 0.1.45 [datafusion-python]

2025-04-12 Thread via GitHub
dependabot[bot] closed pull request #1095: build(deps): bump mimalloc from 0.1.43 to 0.1.45 URL: https://github.com/apache/datafusion-python/pull/1095

Re: [PR] build(deps): bump mimalloc from 0.1.43 to 0.1.45 [datafusion-python]

2025-04-12 Thread via GitHub
dependabot[bot] commented on PR #1095: URL: https://github.com/apache/datafusion-python/pull/1095#issuecomment-2798993546 Superseded by #1105.

Re: [PR] Apply pre-selection and computation skipping to short-circuit optimization [datafusion]

2025-04-12 Thread via GitHub
acking-you commented on PR #15694: URL: https://github.com/apache/datafusion/pull/15694#issuecomment-2799010340 The error in `cargo test` is caused by an incorrect calculation of the pre-selection. The correct steps for calculating the pre-selection are as follows: 1. Compute the boo

Re: [I] Allow parsing byte literals as FixedSizeBinary [datafusion]

2025-04-12 Thread via GitHub
leoyvens commented on issue #15686: URL: https://github.com/apache/datafusion/issues/15686#issuecomment-2799013192 @alamb thank you for looking at this. Avoiding a config flag would be nice. But I'm skeptical of the proposed coercion. If we coerce `binary` to `fixed(N)` when encounter

Re: [PR] Add support for `INHERITS` option in `CREATE TABLE` statement [datafusion-sqlparser-rs]

2025-04-12 Thread via GitHub
iffyio merged PR #1806: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1806

Re: [PR] Fix internal error in sort when hitting memory limit [datafusion]

2025-04-12 Thread via GitHub
DerGut commented on code in PR #15692: URL: https://github.com/apache/datafusion/pull/15692#discussion_r2040698187 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -529,6 +523,12 @@ impl ExternalSorter { /// Sorts the in-memory batches and merges them into a single sort

Re: [I] Decorrelate scalar subqueries with more complex filter expressions [datafusion]

2025-04-12 Thread via GitHub
duongcongtoai commented on issue #14554: URL: https://github.com/apache/datafusion/issues/14554#issuecomment-2798943345 I think we can break down this story into multiple steps: 1. unify the optimizer for correlated queries, regardless of the query type (exists query, scalar query, etc.) 2. su

Re: [PR] Perf: Support automatically concat_batches for sort which will improve performance [datafusion]

2025-04-12 Thread via GitHub
Dandandan commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2798955035 Thanks for sharing the results @zhuqi-lucas, this is really interesting! I think it mainly shows that we probably should try and use more efficient in-memory sorting (e.g. an

Re: [PR] Add `CREATE FUNCTION` support for SQL Server [datafusion-sqlparser-rs]

2025-04-12 Thread via GitHub
iffyio commented on code in PR #1808: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1808#discussion_r2040586609 ## src/ast/mod.rs: ## @@ -8368,6 +8387,22 @@ pub enum CreateFunctionBody { /// /// [BigQuery]: https://cloud.google.com/bigquery/docs/referen

Re: [PR] POC: Cascaded spill merge and re-spill [datafusion]

2025-04-12 Thread via GitHub
2010YOUY01 commented on PR #15610: URL: https://github.com/apache/datafusion/pull/15610#issuecomment-2798782989 I didn't implement the parallel merge optimization for now; my major concern is that this optimization requires one extra configuration, and users have to learn and correctly set 2 co

Re: [PR] Cascaded spill merge and re-spill [datafusion]

2025-04-12 Thread via GitHub
qstommyshu commented on code in PR #15610: URL: https://github.com/apache/datafusion/pull/15610#discussion_r2040791232 ## datafusion/core/tests/memory_limit/mod.rs: ## @@ -615,6 +616,104 @@ async fn test_disk_spill_limit_not_reached() -> Result<()> { Ok(()) } +// Test c

Re: [PR] Cascaded spill merge and re-spill [datafusion]

2025-04-12 Thread via GitHub
qstommyshu commented on PR #15610: URL: https://github.com/apache/datafusion/pull/15610#issuecomment-2799447585 > ### Intended optimization > If the memory pool is enough to hold more batches at a time (while `spill_max_spill_merge_degree` is still limited to 4, in case the merge-degree

Re: [PR] Cascaded spill merge and re-spill [datafusion]

2025-04-12 Thread via GitHub
qstommyshu commented on code in PR #15610: URL: https://github.com/apache/datafusion/pull/15610#discussion_r2040846556 ## docs/source/user-guide/configs.md: ## @@ -84,6 +84,7 @@ Environment variables are read during `SessionConfig` initialisation so they mus | datafusion.execu

Re: [PR] Use pager and allow configuration via `\pset` [datafusion]

2025-04-12 Thread via GitHub
djellemah commented on PR #15597: URL: https://github.com/apache/datafusion/pull/15597#issuecomment-2799539091 Seems to me there's a confluence of several factors here: - testing this kind of functionality is not simple. - it's user facing, so if it breaks somebody will notice a

[PR] Minor: add order by arg for last value [datafusion]

2025-04-12 Thread via GitHub
jayzhan211 opened a new pull request, #15695: URL: https://github.com/apache/datafusion/pull/15695 ## Which issue does this PR close? - Closes #12376. ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [I] Release DataFusion `47.0.0` (April 2025) [datafusion]

2025-04-12 Thread via GitHub
jayzhan211 commented on issue #15072: URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2799545774 > My only remaining question is if we want to upgrade arrow in this release as well +1 for upgrading all the dependencies

Re: [PR] PoC (Perf: Support automatically concat_batches for sort which will improve performance) [datafusion]

2025-04-12 Thread via GitHub
Dandandan commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2798712702 I think this is already looking quite nice. What do you need to finalize this, @zhuqi-lucas?

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-12 Thread via GitHub
iffyio commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2040595052 ## src/parser/mod.rs: ## @@ -617,6 +623,9 @@ impl<'a> Parser<'a> { } // `COMMENT` is snowflake specific https://docs.

Re: [I] Perf: Support automatically concat_batches for sort which will improve performance [datafusion]

2025-04-12 Thread via GitHub
Dandandan commented on issue #15375: URL: https://github.com/apache/datafusion/issues/15375#issuecomment-2798726588 Really nice observation! I think we should drive this further. Some further observations I saw when looking at the current implementation on master for the in memory mer

[PR] Fix internal error in sort when hitting memory limit [datafusion]

2025-04-12 Thread via GitHub
DerGut opened a new pull request, #15692: URL: https://github.com/apache/datafusion/pull/15692 ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/15675 ## Rationale for this change I noticed an internal error in the sort implementation.

Re: [PR] PoC (Perf: Support automatically concat_batches for sort which will improve performance) [datafusion]

2025-04-12 Thread via GitHub
zhuqi-lucas commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2798857034 > I think this is already looking quite nice. What do you need to finalize this @zhuqi-lucas Thank you @Dandandan for the review; I think we just need to add the benchmark res

[PR] fix: unparse join without projection [datafusion]

2025-04-12 Thread via GitHub
chenkovsky opened a new pull request, #15693: URL: https://github.com/apache/datafusion/pull/15693 ## Which issue does this PR close? - Closes #15688. ## Rationale for this change There are two issues. 1. projection in scan table is dropped in try_transform_to_sim

Re: [PR] Perf: Support automatically concat_batches for sort which will improve performance [datafusion]

2025-04-12 Thread via GitHub
zhuqi-lucas commented on code in PR #15380: URL: https://github.com/apache/datafusion/pull/15380#discussion_r2040673419 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -645,7 +645,36 @@ impl ExternalSorter { return self.sort_batch_stream(batch, metrics, reserva

Re: [I] Perf: Support automatically concat_batches for sort which will improve performance [datafusion]

2025-04-12 Thread via GitHub
zhuqi-lucas commented on issue #15375: URL: https://github.com/apache/datafusion/issues/15375#issuecomment-2798866385 Thank you @Dandandan, I've addressed your comments. We can treat this as the first version, and in the future we may improve it as described by @2010YOUY01: https://github.c

Re: [PR] Perf: Support automatically concat_batches for sort which will improve performance [datafusion]

2025-04-12 Thread via GitHub
zhuqi-lucas commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2798868399 @alamb Do we have the CI benchmark running now? If not, I need your help to run it... Thanks a lot! Also, for the sort-tpch itself, I was running for the improvement result,

Re: [PR] fix: Modify Spark SQL core 2 tests for `native_datafusion` reader, change 3.5.5 diff hash length to 11 [datafusion-comet]

2025-04-12 Thread via GitHub
andygrove merged PR #1641: URL: https://github.com/apache/datafusion-comet/pull/1641

Re: [I] Standardize Spark diff hash length [datafusion-comet]

2025-04-12 Thread via GitHub
andygrove closed issue #1640: Standardize Spark diff hash length URL: https://github.com/apache/datafusion-comet/issues/1640

[PR] Apply pre-selection and computation skipping to short-circuit optimization [datafusion]

2025-04-12 Thread via GitHub
acking-you opened a new pull request, #15694: URL: https://github.com/apache/datafusion/pull/15694 ## Which issue does this PR close? - Closes #15636 ## Rationale for this change Many thanks to @kosiew for doing a tremendous amount of work. Based on his P

Re: [I] Improve the performance of early exit evaluation in binary_expr [datafusion]

2025-04-12 Thread via GitHub
acking-you commented on issue #15631: URL: https://github.com/apache/datafusion/issues/15631#issuecomment-2798871646 > > [@acking-you](https://github.com/acking-you) the code needs to be extended to support nulls (you can take a look at the true_count implementation in arrow-rs to do this e

Re: [PR] Flatten array in a single step instead of recursive [datafusion]

2025-04-12 Thread via GitHub
delamarch3 commented on PR #15160: URL: https://github.com/apache/datafusion/pull/15160#issuecomment-2798766441 Thanks for the reviews!

Re: [PR] Make Clickbench Q29 5x faster for datafusion [datafusion]

2025-04-12 Thread via GitHub
ctsk commented on PR #15532: URL: https://github.com/apache/datafusion/pull/15532#issuecomment-2798754505 Heads up: `SUM(DISTINCT (x + 5))` is **not** equivalent to `SUM(DISTINCT x) + 5 * COUNT(DISTINCT x)`
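The comment's reasoning is cut off in this excerpt. One concrete way the two expressions can diverge is with floating-point input: adding a constant can round distinct values to the same result, so `DISTINCT` collapses them on one side but not the other. The following is a minimal sketch of that single counterexample. It assumes an `f64` column with no NULLs or NaNs, and the function names are hypothetical. Other sources of divergence, such as integer overflow, may also exist but are not shown here.

```rust
/// Sorted distinct values of the input; NaN is not handled in this sketch.
fn distinct(values: &[f64]) -> Vec<f64> {
    let mut v = values.to_vec();
    v.sort_by(|a, b| a.partial_cmp(b).expect("NaN not supported in this sketch"));
    v.dedup();
    v
}

/// Models `SUM(DISTINCT (x + c))`: shift first, then deduplicate.
fn sum_distinct_shifted(values: &[f64], c: f64) -> f64 {
    let shifted: Vec<f64> = values.iter().map(|x| x + c).collect();
    distinct(&shifted).iter().sum()
}

/// Models `SUM(DISTINCT x) + c * COUNT(DISTINCT x)`: deduplicate first.
fn sum_distinct_rewritten(values: &[f64], c: f64) -> f64 {
    let d = distinct(values);
    d.iter().sum::<f64>() + c * d.len() as f64
}
```

For small integer-valued inputs such as `[1.0, 2.0, 2.0, 3.0]` with `c = 5.0`, both functions return 21.0. For `[1e-20, 2e-20]`, `x + 5.0` rounds both values to exactly 5.0. The shifted form therefore sums a single distinct value (5.0), while the rewritten form counts two distinct values (about 10.0).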

Re: [I] Release DataFusion `47.0.0` (April 2025) [datafusion]

2025-04-12 Thread via GitHub
alamb commented on issue #15072: URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2798878082 > > Do we want to hold DF 47 release for the arrow upgrade too? > > I think it is possible (arrow will hopefully be released at the end of this week -- and we could make the DF

Re: [PR] Perf: Support automatically concat_batches for sort which will improve performance [datafusion]

2025-04-12 Thread via GitHub
zhuqi-lucas commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2798879716 Latest results based on the current latest code: ```rust Benchmark sort_tpch1.json ┏━━┳━━┳━

Re: [I] Document `CREATE EXTERNAL TABLE ... OPTIONS` [datafusion]

2025-04-12 Thread via GitHub
alamb commented on issue #10451: URL: https://github.com/apache/datafusion/issues/10451#issuecomment-2798883874 Hi @marvelshan -- I also took a look at the docs. It actually looks like the format options are largely documented, but the documentation could be improved. I suggest:

Re: [PR] Fix internal error in sort when hitting memory limit [datafusion]

2025-04-12 Thread via GitHub
2010YOUY01 commented on code in PR #15692: URL: https://github.com/apache/datafusion/pull/15692#discussion_r2040878250 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -529,6 +523,12 @@ impl ExternalSorter { /// Sorts the in-memory batches and merges them into a single

[PR] feat: Emit warning with Diagnostic when doing = Null [datafusion]

2025-04-12 Thread via GitHub
changsun20 opened a new pull request, #15696: URL: https://github.com/apache/datafusion/pull/15696 ## Which issue does this PR close? Closes #14434 ## Rationale for this change This PR addresses a common SQL anti-pattern where users accidentally use `= NULL` instead of `
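Under SQL three-valued logic, `a = NULL` evaluates to NULL for every row, so a filter such as `WHERE a = NULL` keeps nothing. That is why such a comparison is worth a warning. The following is a minimal sketch of detecting the pattern over a toy expression tree. The `Expr` enum, `find_eq_null`, and the warning text are all hypothetical. This is not DataFusion's `Diagnostic` API or its actual message wording, neither of which appears in this excerpt.

```rust
/// Toy expression tree; `Literal(None)` models a SQL NULL literal.
#[allow(dead_code)]
enum Expr {
    Column(String),
    Literal(Option<i64>),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Walks the tree and records a warning for every `<expr> = NULL`
/// or `NULL = <expr>` comparison.
fn find_eq_null(expr: &Expr, warnings: &mut Vec<String>) {
    match expr {
        Expr::Eq(l, r) => {
            let l_null = matches!(**l, Expr::Literal(None));
            let r_null = matches!(**r, Expr::Literal(None));
            if l_null || r_null {
                // Name the non-NULL side if it is a plain column.
                let other = if l_null { r } else { l };
                let subject = match &**other {
                    Expr::Column(name) => name.clone(),
                    _ => "expr".to_string(),
                };
                warnings.push(format!(
                    "`{subject} = NULL` is always NULL; did you mean `{subject} IS NULL`?"
                ));
            }
            find_eq_null(l, warnings);
            find_eq_null(r, warnings);
        }
        Expr::And(l, r) => {
            find_eq_null(l, warnings);
            find_eq_null(r, warnings);
        }
        Expr::Column(_) | Expr::Literal(_) => {}
    }
}
```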

Re: [PR] feat: Emit warning with Diagnostic when doing = Null [datafusion]

2025-04-12 Thread via GitHub
changsun20 commented on PR #15696: URL: https://github.com/apache/datafusion/pull/15696#issuecomment-2799781696 Hi @eliaperantoni, Thank you for your patience and guidance throughout this issue. I've implemented the core functionality per our discussions, but would like to confirm a

Re: [I] Release DataFusion `47.0.0` (April 2025) [datafusion]

2025-04-12 Thread via GitHub
andygrove commented on issue #15072: URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2799561705 I am also +1 for upgrading the dependencies (for selfish reasons; we are waiting on an arrow feature to help with INT96 timestamps in Parquet)