Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-26 Thread via GitHub
zhuqi-lucas commented on PR #14902: URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2687169996 Thanks @2010YOUY01 and @SemyonSinchenko for the review. I tried again and it is no longer a problem for me; the earlier failure was probably because my disk was nearly full, and I have since freed up some space.

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-26 Thread via GitHub
2010YOUY01 commented on PR #14902: URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2687165121 Thank you for the benchmark, I've tested it locally and it's working well. I have several small suggestions: 1. Add document for this new join benchmark https://github.com/apac

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-26 Thread via GitHub
SemyonSinchenko commented on PR #14902: URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2687145015 @zhuqi-lucas Did you try to increase the `batch_size` argument? It is designed to avoid OOMs but the small batch size can also reduce the generation speed. If your computer h
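The `batch_size` trade-off mentioned above can be illustrated with a minimal sketch. This is not the API or implementation of `falsa`; `generate_rows` and `count_batches` are hypothetical names, and the random integers stand in for real benchmark rows. The point is only that a smaller batch bounds how many rows are held in memory at once, at the cost of more batches to produce and write.

```python
import random
from typing import Iterator, List


def generate_rows(total_rows: int, batch_size: int, seed: int = 0) -> Iterator[List[int]]:
    """Yield synthetic rows in batches (hypothetical helper, not falsa's API).

    At most `batch_size` rows are materialized at a time, which bounds peak memory.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    rng = random.Random(seed)
    produced = 0
    while produced < total_rows:
        n = min(batch_size, total_rows - produced)
        yield [rng.randrange(1_000_000) for _ in range(n)]
        produced += n


def count_batches(total_rows: int, batch_size: int) -> int:
    """Count how many batches are needed; fewer, larger batches mean less per-batch overhead."""
    return sum(1 for _ in generate_rows(total_rows, batch_size))
```

With the same `total_rows`, a larger `batch_size` gives fewer batches (less overhead, more memory per batch), and a smaller one gives more batches (more overhead, less memory per batch).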

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-26 Thread via GitHub
2010YOUY01 commented on PR #14902: URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2687143242 > > The data generation will take long time for big data. > > How bad is it? I can try to dig into the problem and try to improve it on the side of `falsa` (generation libra

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
tomershaniii commented on code in PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#discussion_r1973021692 ## tests/sqlparser_mssql.rs: ## @@ -1590,6 +1590,30 @@ fn parse_create_table_with_valid_options() { comment: None,

[PR] Fix the null handling for to_char function [datafusion]

2025-02-26 Thread via GitHub
kosiew opened a new pull request, #14908: URL: https://github.com/apache/datafusion/pull/14908 ## Which issue does this PR close? - Closes #14884. ## Rationale for this change Currently, when passing a NULL value to to_char, it returns an empty string instead
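The behavior this PR describes (NULL in, NULL out, rather than an empty string) can be sketched as follows. This is an illustration in Python, not the DataFusion implementation: the `to_char` function below is a hypothetical stand-in, `None` plays the role of SQL NULL, and Python's `strftime` replaces DataFusion's own format handling.

```python
from datetime import date
from typing import Optional


def to_char(value: Optional[date], fmt: str) -> Optional[str]:
    """Format a date, propagating a missing input instead of returning ''.

    Hypothetical stand-in for the SQL function; None models SQL NULL.
    """
    if value is None:
        # NULL input yields NULL output, not an empty string.
        return None
    return value.strftime(fmt)
```

Returning `None` keeps a missing value distinguishable from a value that happens to format to an empty string.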

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-26 Thread via GitHub
jayzhan211 commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2687033429 > If something built and rewrite by analyzer before, it will now fail If you have count wildcard and is rewritten by analyzer, it fails because we remove the count wild

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-26 Thread via GitHub
shehabgamin commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2687025000 > ``` > query I > SELECT count(*) FROM VALUES (NULL), (5), (5), (20) AS tab(col); > > 4 > ``` > > Is there other query reproducible in main branch? T

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
iffyio commented on code in PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#discussion_r1972915911 ## src/ast/dml.rs: ## @@ -138,6 +143,30 @@ pub struct CreateTable { pub engine: Option, pub comment: Option, pub auto_increment_offset: O

Re: [PR] Parse ALTER TABLE AUTO_INCREMENT operation for MySQL [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
iffyio merged PR #1748: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1748 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

Re: [PR] Random test cleanups use Expr::value [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
iffyio merged PR #1749: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1749

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-26 Thread via GitHub
jayzhan211 commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2686936016 ``` query I SELECT count(*) FROM VALUES (NULL), (5), (5), (20) AS tab(col); 4 ``` Is there other query reproducible in main branch? The query given d
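The expected result of `4` in the query above follows from standard SQL semantics: `count(*)` counts rows, including rows whose column value is NULL, whereas `count(col)` skips NULLs. The sketch below checks this with SQLite through Python's `sqlite3` module as a stand-in engine; it is not DataFusion, and the query is rewritten as a CTE because that is the form SQLite accepts for naming the column. `count_star_and_col` is a hypothetical helper name.

```python
import sqlite3
from typing import Tuple


def count_star_and_col() -> Tuple[int, int]:
    """Return (count(*), count(col)) over four rows, one of which is NULL.

    Uses an in-memory SQLite database as a stand-in engine (an assumption;
    the thread itself concerns DataFusion).
    """
    conn = sqlite3.connect(":memory:")
    try:
        row = conn.execute(
            "WITH tab(col) AS (VALUES (NULL), (5), (5), (20)) "
            "SELECT count(*), count(col) FROM tab"
        ).fetchone()
    finally:
        conn.close()
    return row
```

Here `count(*)` sees all four rows, while `count(col)` sees only the three non-NULL values.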

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-26 Thread via GitHub
zhuqi-lucas commented on PR #14902: URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2686901486 Thanks @SemyonSinchenko for review, not too bad for my case, and it takes more time and it's expected for huge file generation. But my computer is 48GB memory, i assume lower mem

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-26 Thread via GitHub
shehabgamin commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2686821819 @alamb @jayzhan211 @xudong963 Unfortunately https://github.com/apache/datafusion/pull/14824 did not fix the wildcard issue. I'm currently working on Sail's upcoming release,

Re: [PR] fix: EnforceSorting should not remove a needed coalesces [datafusion]

2025-02-26 Thread via GitHub
wiedld commented on PR #14637: URL: https://github.com/apache/datafusion/pull/14637#issuecomment-2686820953 Separate refactoring PR: https://github.com/apache/datafusion/pull/14907

Re: [PR] replace TypeSignature::String with TypeSignature::Coercible for starts_with [datafusion]

2025-02-26 Thread via GitHub
jayzhan211 commented on PR #14812: URL: https://github.com/apache/datafusion/pull/14812#issuecomment-2686817387 Thanks @zjregee

Re: [PR] replace TypeSignature::String with TypeSignature::Coercible for starts_with [datafusion]

2025-02-26 Thread via GitHub
jayzhan211 merged PR #14812: URL: https://github.com/apache/datafusion/pull/14812

Re: [PR] Simple Functions Preview [datafusion]

2025-02-26 Thread via GitHub
jayzhan211 commented on PR #14668: URL: https://github.com/apache/datafusion/pull/14668#issuecomment-2686798534 > Implementing a function that should be called also for null input requires declaring it, by wrapping relevant arguments in Option<...> If we can handle `Option` as `null`

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-02-26 Thread via GitHub
shehabgamin commented on PR #14392: URL: https://github.com/apache/datafusion/pull/14392#issuecomment-2686793279 @andygrove Just checking in. I've been adding tons of new functions to the Sail repo but will port them over after this PR is merged!

Re: [PR] replace TypeSignature::String with TypeSignature::Coercible for trim functions [datafusion]

2025-02-26 Thread via GitHub
zjregee commented on code in PR #14865: URL: https://github.com/apache/datafusion/pull/14865#discussion_r1972787989 ## datafusion/functions/src/string/btrim.rs: ## @@ -19,20 +19,28 @@ use crate::string::common::*; use crate::utils::{make_scalar_function, utf8_to_str_type}; use

Re: [PR] Allow setting the recursion limit for sql parsing [datafusion]

2025-02-26 Thread via GitHub
cetra3 commented on PR #14756: URL: https://github.com/apache/datafusion/pull/14756#issuecomment-2686743916 @alamb I've added a test and a builder struct, let me know if you want further changes

Re: [PR] chore(deps): bump flate2 from 1.0.35 to 1.1.0 [datafusion]

2025-02-26 Thread via GitHub
jonahgao merged PR #14848: URL: https://github.com/apache/datafusion/pull/14848

Re: [PR] Allow setting the recursion limit for sql parsing [datafusion]

2025-02-26 Thread via GitHub
cetra3 commented on code in PR #14756: URL: https://github.com/apache/datafusion/pull/14756#discussion_r1972732904 ## datafusion/core/src/execution/session_state.rs: ## @@ -483,12 +483,21 @@ impl SessionState { MsSQL, ClickHouse, BigQuery, Ansi."

Re: [PR] feat: scalar regex match physical expr [datafusion]

2025-02-26 Thread via GitHub
github-actions[bot] closed pull request #12270: feat: scalar regex match physical expr URL: https://github.com/apache/datafusion/pull/12270

Re: [I] 【TPCH】Comet do not show performance advantages over native Spark? [datafusion-comet]

2025-02-26 Thread via GitHub
xingnailu commented on issue #1450: URL: https://github.com/apache/datafusion-comet/issues/1450#issuecomment-2686602166 > Hi [@xingnailu](https://github.com/xingnailu) here are some initial questions: > > * Which version of Comet are you using? > * Do you see queries running nativ

Re: [PR] feat: Support IntegralDivide function [datafusion-comet]

2025-02-26 Thread via GitHub
kazuyukitanimura commented on code in PR #1428: URL: https://github.com/apache/datafusion-comet/pull/1428#discussion_r1972648463 ## native/core/src/execution/planner.rs: ## @@ -922,13 +956,18 @@ impl PhysicalPlanner { Ok(DataType::Decimal128(_p2, _s2)),

Re: [I] feat: Support read array type using native reader [datafusion-comet]

2025-02-26 Thread via GitHub
comphead commented on issue #1454: URL: https://github.com/apache/datafusion-comet/issues/1454#issuecomment-2686543353 ``` Cause: org.apache.comet.CometNativeException: Cannot cast file schema field simple_array of type List(Field { name: "element", data_type: Int32, nullable: true, di

Re: [PR] Prepare for 46.0.0 release: Version and Changelog [datafusion]

2025-02-26 Thread via GitHub
xudong963 commented on PR #14903: URL: https://github.com/apache/datafusion/pull/14903#issuecomment-2686532090 > Thank you @xudong963 > > I recommend the following for next steps (which is what I did for 45) > > 1. Wait to merge this PR until we have all the needed content (e.g

Re: [PR] feat: Support IntegralDivide function [datafusion-comet]

2025-02-26 Thread via GitHub
kazuyukitanimura commented on code in PR #1428: URL: https://github.com/apache/datafusion-comet/pull/1428#discussion_r1972637274 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2641,4 +2641,57 @@ class CometExpressionSuite extends CometTestBase with

Re: [I] Docker image for 0.6.0 failed to build [datafusion-comet]

2025-02-26 Thread via GitHub
comphead commented on issue #1417: URL: https://github.com/apache/datafusion-comet/issues/1417#issuecomment-2686500156 Fixed by #1421

[I] feat: Support read array type using native reader [datafusion-comet]

2025-02-26 Thread via GitHub
comphead opened a new issue, #1454: URL: https://github.com/apache/datafusion-comet/issues/1454 ### What is the problem the feature request solves? Currently array type not supported when reading complex types with native reader Reading a data with schema ``` root |--

Re: [I] 【TPCH】Comet do not show performance advantages over native Spark? [datafusion-comet]

2025-02-26 Thread via GitHub
xingnailu commented on issue #1450: URL: https://github.com/apache/datafusion-comet/issues/1450#issuecomment-2686513062 ![Image](https://github.com/user-attachments/assets/d8772e73-7b7f-45e6-ba46-294a051eb1bc)

Re: [I] 【TPCH】Comet do not show performance advantages over native Spark? [datafusion-comet]

2025-02-26 Thread via GitHub
xingnailu commented on issue #1450: URL: https://github.com/apache/datafusion-comet/issues/1450#issuecomment-2686513349 ![Image](https://github.com/user-attachments/assets/9c559176-a087-474e-b7ac-0dc45f1344a1)

Re: [I] Binary string (`BYTEA`, `Binary`) concatenation [datafusion]

2025-02-26 Thread via GitHub
jayzhan211 commented on issue #12709: URL: https://github.com/apache/datafusion/issues/12709#issuecomment-2686512348 ArrayConcat uses the signature `variadic_any`, which should be changed to something in ArraySignature

Re: [I] Change in 46: `count_all()` expr_fn function now displayed as `count(1)` rather than `count(*)` [datafusion]

2025-02-26 Thread via GitHub
jayzhan211 commented on issue #14894: URL: https://github.com/apache/datafusion/issues/14894#issuecomment-2686510226 I think this is quite different from #14895

[PR] refactor [datafusion]

2025-02-26 Thread via GitHub
wiedld opened a new pull request, #14907: URL: https://github.com/apache/datafusion/pull/14907 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [I] Docker image for 0.6.0 failed to build [datafusion-comet]

2025-02-26 Thread via GitHub
comphead closed issue #1417: Docker image for 0.6.0 failed to build URL: https://github.com/apache/datafusion-comet/issues/1417

Re: [PR] Support WITHIN GROUP syntax to standardize certain existing aggregate functions [datafusion]

2025-02-26 Thread via GitHub
vbarua commented on code in PR #13511: URL: https://github.com/apache/datafusion/pull/13511#discussion_r1972603943 ## datafusion/core/src/execution/session_state.rs: ## @@ -146,6 +146,10 @@ pub struct SessionState { scalar_functions: HashMap>, /// Aggregate functions r

Re: [PR] Support WITHIN GROUP syntax to standardize certain existing aggregate functions [datafusion]

2025-02-26 Thread via GitHub
vbarua commented on code in PR #13511: URL: https://github.com/apache/datafusion/pull/13511#discussion_r1972589737 ## datafusion/core/src/physical_planner.rs: ## @@ -1619,13 +1620,20 @@ pub fn create_aggregate_expr_with_name_and_maybe_filter( == NullTreatment::I

Re: [PR] Refactor SortPushdown using the standard top-down visitor. [datafusion]

2025-02-26 Thread via GitHub
wiedld commented on PR #14821: URL: https://github.com/apache/datafusion/pull/14821#issuecomment-2686475105 The equivalence properties' ordering methods, as well as the `add_sort_above` util method, were checking if the field was a constant -- and not considering if it's a heterogeneous con

Re: [I] Parse MySQL `ALGORITHM` and `LOCK` options to `ALTER TABLE` [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
mvzink commented on issue #1665: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1665#issuecomment-2686451364 Oops, sorry, that didn't fix `LOCK`!

[I] Parse MySQL `ALGORITHM` and `LOCK` options to `ALTER TABLE` [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
mvzink opened a new issue, #1665: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1665 Example: ```sql ALTER TABLE tbl_name MODIFY COLUMN col_name column_definition FIRST, ALGORITHM=INPLACE, LOCK=NONE; ``` See [`ALTER TABLE` docs](https://dev.mysql.com/doc/
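To make the shape of these options concrete, here is a deliberately naive sketch that pulls the `ALGORITHM` and `LOCK` values out of a statement like the example above. It is not how sqlparser-rs parses them: `extract_alter_options` is a hypothetical helper, it relies on a regular expression rather than a tokenizer, and it would be fooled by string literals or comments. It does reflect that MySQL treats the `=` after each keyword as optional.

```python
import re
from typing import Dict

# Keyword, then either an optional-whitespace '=' or plain whitespace, then the value.
_OPTION_RE = re.compile(r"\b(ALGORITHM|LOCK)(?:\s*=\s*|\s+)(\w+)", re.IGNORECASE)


def extract_alter_options(sql: str) -> Dict[str, str]:
    """Return the ALGORITHM / LOCK options found in an ALTER TABLE statement.

    Hypothetical, regex-based illustration only; a real parser works on tokens.
    """
    return {name.upper(): value.upper() for name, value in _OPTION_RE.findall(sql)}
```

For the example statement in the issue, this yields `ALGORITHM` mapped to `INPLACE` and `LOCK` mapped to `NONE`, regardless of the order in which the two options appear.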

Re: [I] Parse MySQL `ALGORITHM` and `LOCK` options to `ALTER TABLE` [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
mvzink commented on issue #1665: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1665#issuecomment-2686449170 Forgot to mark this closed by #1745

Re: [I] Parse MySQL `ALGORITHM` and `LOCK` options to `ALTER TABLE` [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
mvzink closed issue #1665: Parse MySQL `ALGORITHM` and `LOCK` options to `ALTER TABLE` URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1665

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
mvzink commented on PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#issuecomment-2686447231 Ah, I'm guessing this supersedes #1746. As mentioned in my [comment](https://github.com/apache/datafusion-sqlparser-rs/pull/1746#issuecomment-2686444775) there, I strongly

Re: [PR] Align SQL formatting and add all missing table options [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
mvzink commented on PR #1746: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1746#issuecomment-2686444775 If you are considering overhauling and fleshing out support for table options, I would be strongly in favor of making them work like column options: i.e. a `Vec` of option

Re: [PR] Update regenerate sql dep, revert runner changes. [datafusion]

2025-02-26 Thread via GitHub
alamb merged PR #14901: URL: https://github.com/apache/datafusion/pull/14901

Re: [PR] Update regenerate sql dep, revert runner changes. [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14901: URL: https://github.com/apache/datafusion/pull/14901#issuecomment-2686439154 Thanks again @Omega359

Re: [PR] Update regenerate sql dep, revert runner changes. [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14901: URL: https://github.com/apache/datafusion/pull/14901#issuecomment-2686438985 I just tested this locally and it compiled and ran (and consumed a bunch of CPU) just great 👍 I also filed a follow on ticket to make this process better: - https://github.co

[I] Improve regeneration of sqlite expected test suite [datafusion]

2025-02-26 Thread via GitHub
alamb opened a new issue, #14906: URL: https://github.com/apache/datafusion/issues/14906 ### Is your feature request related to a problem or challenge? Thanks to some great work from @Omega359 as part of each commit to main DataFusion runs many thousand queries from the sqlite test su

Re: [I] Binary string (`BYTEA`, `Binary`) concatenation [datafusion]

2025-02-26 Thread via GitHub
alamb commented on issue #12709: URL: https://github.com/apache/datafusion/issues/12709#issuecomment-2686431136 I agree this still seems not to be supported ```sql > select arrow_typeof(x'1234'); +---+ | arrow_typeof(Binary("18,52")) | +-

Re: [I] Improve efficiency of CI checks (so we can add MORE!) [datafusion]

2025-02-26 Thread via GitHub
alamb commented on issue #13845: URL: https://github.com/apache/datafusion/issues/13845#issuecomment-2686429288 I am going to claim that we have improved the CI efficiency so I am going to close this. I am sure we'll come up with some need in the future to optimize it again but for now it i

Re: [I] minor: `SessionStateBuilder::with_default_features` ergonomics [datafusion]

2025-02-26 Thread via GitHub
alamb commented on issue #14899: URL: https://github.com/apache/datafusion/issues/14899#issuecomment-2686427474 Seems like a good first issue to me

Re: [I] minor: `SessionStateBuilder::with_default_features` ergonomics [datafusion]

2025-02-26 Thread via GitHub
alamb commented on issue #14899: URL: https://github.com/apache/datafusion/issues/14899#issuecomment-2686427138 I agree it would be great not to override any existing table providers :thy

Re: [PR] fix: Reduce number of shuffle spill files, fix spilled_bytes metric, add some unit tests [datafusion-comet]

2025-02-26 Thread via GitHub
mbutrovich commented on code in PR #1440: URL: https://github.com/apache/datafusion-comet/pull/1440#discussion_r1972566115 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -572,15 +567,12 @@ impl ShuffleRepartitioner { output_data.write_all(&output_batc

Re: [PR] fix: Reduce number of shuffle spill files, fix spilled_bytes metric, add some unit tests [datafusion-comet]

2025-02-26 Thread via GitHub
mbutrovich commented on code in PR #1440: URL: https://github.com/apache/datafusion-comet/pull/1440#discussion_r1972471189 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -1076,6 +1053,146 @@ mod test { shuffle_write_test(1, 100, 200, Some(10 * 1024 *

Re: [PR] fix: metrics tests for native_datafusion experimental native scan [datafusion-comet]

2025-02-26 Thread via GitHub
andygrove merged PR #1445: URL: https://github.com/apache/datafusion-comet/pull/1445

Re: [PR] fix: Reduce number of shuffle spill files, fix spilled_bytes metric, add some unit tests [datafusion-comet]

2025-02-26 Thread via GitHub
andygrove commented on code in PR #1440: URL: https://github.com/apache/datafusion-comet/pull/1440#discussion_r1972545720 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -572,15 +567,12 @@ impl ShuffleRepartitioner { output_data.write_all(&output_batch

Re: [PR] fix: Reduce number of shuffle spill files, fix spilled_bytes metric, add some unit tests [datafusion-comet]

2025-02-26 Thread via GitHub
andygrove commented on code in PR #1440: URL: https://github.com/apache/datafusion-comet/pull/1440#discussion_r1972553012 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -1076,6 +1053,146 @@ mod test { shuffle_write_test(1, 100, 200, Some(10 * 1024 * 1

Re: [PR] fix: Reduce number of shuffle spill files, fix spilled_bytes metric, add some unit tests [datafusion-comet]

2025-02-26 Thread via GitHub
andygrove commented on code in PR #1440: URL: https://github.com/apache/datafusion-comet/pull/1440#discussion_r1972551155 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -572,15 +567,12 @@ impl ShuffleRepartitioner { output_data.write_all(&output_batch

Re: [PR] Refactor SortPushdown using the standard top-down visitor. [datafusion]

2025-02-26 Thread via GitHub
wiedld commented on PR #14821: URL: https://github.com/apache/datafusion/pull/14821#issuecomment-2686396698 > I think `count()` is constant in this case because there is only one partition perhaps 🤔 > > I think the SQL looks like > > ```sql > SELECT count() over () > ```

Re: [PR] Prepare for 55.0.0 release: Version and CHANGELOG [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
mvzink commented on code in PR #1750: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1750#discussion_r1972543249 ## changelog/0.55.0.md: ## @@ -0,0 +1,122 @@ + + +# sqlparser-rs 0.55.0 Changelog + +This release consists of 56 commits from 25 contributors. See credi

Re: [PR] Require `Debug` for `DataSource` [datafusion]

2025-02-26 Thread via GitHub
alamb merged PR #14882: URL: https://github.com/apache/datafusion/pull/14882

Re: [PR] Preserve the name of grouping sets in SimplifyExpressions [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14888: URL: https://github.com/apache/datafusion/pull/14888#issuecomment-2686367462 Thanks again @joroKr21

Re: [PR] Preserve the name of grouping sets in SimplifyExpressions [datafusion]

2025-02-26 Thread via GitHub
alamb merged PR #14888: URL: https://github.com/apache/datafusion/pull/14888

Re: [PR] Refactor SortPushdown using the standard top-down visitor. [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14821: URL: https://github.com/apache/datafusion/pull/14821#issuecomment-2686360403 I think `count()` is constant in this case because there is only one partition perhaps 🤔 I think the SQL looks like ```sql SELECT count() over () ``` But maybe n

Re: [PR] Refactor SortPushdown using the standard top-down visitor. [datafusion]

2025-02-26 Thread via GitHub
wiedld commented on PR #14821: URL: https://github.com/apache/datafusion/pull/14821#issuecomment-2686336677 > `WindowAggExec` would produce a sort output that is `nullable_col` and `constants={count}` > > So then I would expect that the sort properties should be satisfied and this pa

Re: [PR] Refactor SortPushdown using the standard top-down visitor. [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14821: URL: https://github.com/apache/datafusion/pull/14821#issuecomment-2686328283 ``` "SortExec: expr=[non_nullable_col@1 ASC NULLS LAST, count@2 ASC NULLS LAST], preserve_partitioning=[false]", " WindowAggExec: wdw=[count: Ok(Field { name: \"count\", data

Re: [I] Fix the null handling for `to_char` function [datafusion]

2025-02-26 Thread via GitHub
Omega359 commented on issue #14884: URL: https://github.com/apache/datafusion/issues/14884#issuecomment-2686313231 I went looking for a system where null input would return an empty string and came up blank. I am fairly certain spark would return null, and the equiv function in duckdb return

Re: [PR] Refactor SortPushdown using the standard top-down visitor. [datafusion]

2025-02-26 Thread via GitHub
wiedld commented on PR #14821: URL: https://github.com/apache/datafusion/pull/14821#issuecomment-2686306852 > There is a new test on main ([added 4 hours ago](https://github.com/apache/datafusion/blob/f5b7affecd90e9be26289d869c4a542359cb98e3/datafusion/core/tests/physical_optimizer/enforce_s

Re: [I] Release sqlparser-rs version `0.55.0` [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
alamb commented on issue #1671: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1671#issuecomment-2686287400 I have created a readme / version bump: - https://github.com/apache/datafusion-sqlparser-rs/pull/1750 I realistically don't have time to try and test this out

Re: [PR] Prepare for 55.0.0 release: Version and CHANGELOG [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
alamb commented on PR #1750: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1750#issuecomment-2686284394 FYI @iffyio

Re: [PR] Prepare for 55.0.0 release: Version and CHANGELOG [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
alamb commented on code in PR #1750: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1750#discussion_r1972470040 ## changelog/0.55.0.md: ## @@ -0,0 +1,122 @@ + + +# sqlparser-rs 0.55.0 Changelog + +This release consists of 56 commits from 25 contributors. See credit

Re: [PR] fix: Reduce number of shuffle spill files, fix spilled_bytes metric, add some unit tests [datafusion-comet]

2025-02-26 Thread via GitHub
mbutrovich commented on code in PR #1440: URL: https://github.com/apache/datafusion-comet/pull/1440#discussion_r1972468486 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -1076,6 +1053,146 @@ mod test { shuffle_write_test(1, 100, 200, Some(10 * 1024 *

Re: [PR] fix: Reduce number of shuffle spill files, fix spilled_bytes metric, add some unit tests [datafusion-comet]

2025-02-26 Thread via GitHub
mbutrovich commented on PR #1440: URL: https://github.com/apache/datafusion-comet/pull/1440#issuecomment-2686263360 Thanks for tackling this, @andygrove! Great to see shuffle write improving.

Re: [PR] fix: Reduce number of shuffle spill files, fix spilled_bytes metric, add some unit tests [datafusion-comet]

2025-02-26 Thread via GitHub
mbutrovich commented on code in PR #1440: URL: https://github.com/apache/datafusion-comet/pull/1440#discussion_r1972454276 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -1076,6 +1053,146 @@ mod test { shuffle_write_test(1, 100, 200, Some(10 * 1024 *

Re: [PR] fix: Reduce number of shuffle spill files, fix spilled_bytes metric, add some unit tests [datafusion-comet]

2025-02-26 Thread via GitHub
mbutrovich commented on code in PR #1440: URL: https://github.com/apache/datafusion-comet/pull/1440#discussion_r1972447242 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -572,15 +567,12 @@ impl ShuffleRepartitioner { output_data.write_all(&output_batc

Re: [PR] perf: Reduce native shuffle memory overhead by 50% [datafusion-comet]

2025-02-26 Thread via GitHub
codecov-commenter commented on PR #1452: URL: https://github.com/apache/datafusion-comet/pull/1452#issuecomment-2686253846 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1452?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] Random test cleanups use Expr::value [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
alamb opened a new pull request, #1749: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1749 Random cleanup after - https://github.com/apache/datafusion-sqlparser-rs/pull/1738 -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [I] Release sqlparser-rs version `0.55.0` [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
alamb commented on issue #1671: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1671#issuecomment-2686226996 I am starting to implement this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] Store spans for Value expressions [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
alamb commented on code in PR #1738: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1738#discussion_r1972433284 ## src/ast/mod.rs: ## @@ -8789,9 +8796,9 @@ mod tests { #[test] fn test_interval_display() { let interval = Expr::Interval(Interval {

Re: [PR] Prepare for 46.0.0 release: Version and Changelog [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14903: URL: https://github.com/apache/datafusion/pull/14903#issuecomment-2686209392 Thank you @xudong963 I recommend the following for next steps (which is what I did for 45) 1. Wait to merge this PR until we have all the needed content (e.g we have fixed wh

Re: [PR] Require `Debug` for `DataSource` [datafusion]

2025-02-26 Thread via GitHub
alamb commented on code in PR #14882: URL: https://github.com/apache/datafusion/pull/14882#discussion_r1972413284 ## datafusion/datasource/src/source.rs: ## @@ -35,7 +35,7 @@ use datafusion_physical_expr_common::sort_expr::LexOrdering; /// Common behaviors in Data Sources for

[I] Optimize native shuffle for single partition case [datafusion-comet]

2025-02-26 Thread via GitHub
andygrove opened a new issue, #1453: URL: https://github.com/apache/datafusion-comet/issues/1453 ### What is the problem the feature request solves? When native shuffle has a single output partition, we append the rows from the input batch into array builders and then create a new bat

Re: [PR] Preserve the name of grouping sets in SimplifyExpressions [datafusion]

2025-02-26 Thread via GitHub
alamb commented on code in PR #14888: URL: https://github.com/apache/datafusion/pull/14888#discussion_r1972400092 ## datafusion/optimizer/src/simplify_expressions/simplify_exprs.rs: ## @@ -122,14 +123,21 @@ impl SimplifyExpressions { // Preserve expression names to av

Re: [PR] Refactor SortPushdown using the standard top-down visitor. [datafusion]

2025-02-26 Thread via GitHub
alamb commented on code in PR #14821: URL: https://github.com/apache/datafusion/pull/14821#discussion_r1972395923 ## datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs: ## @@ -98,66 +98,91 @@ fn pushdown_sorts_helper( .ordering_satisfy_requirement(&paren

Re: [I] Support columns having the same alias [datafusion]

2025-02-26 Thread via GitHub
alamb commented on issue #6543: URL: https://github.com/apache/datafusion/issues/6543#issuecomment-2686160994 > what can we do about this ? > can i update the slt test for this test ? Whatever is rewriting the function call probably should add an alias 🤔 -- This is an automated

[I] Error projecting statistics in `DataSourceExec` [datafusion]

2025-02-26 Thread via GitHub
alamb opened a new issue, #14905: URL: https://github.com/apache/datafusion/issues/14905 ### Describe the bug While working on [upgrading Delta.rs](https://github.com/delta-io/delta-rs/pull/3261) to DataFusion 46 I am getting the following error > index out of bounds: the len

Re: [PR] Include struct name on FileScanConfig debug impl [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14883: URL: https://github.com/apache/datafusion/pull/14883#issuecomment-2686090308 Thank you @comphead -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

Re: [I] Run / test Datafusion with JSON Bench from ClickHouse [datafusion]

2025-02-26 Thread via GitHub
alamb commented on issue #14874: URL: https://github.com/apache/datafusion/issues/14874#issuecomment-2686089131 Maybe this is the usecase for variant 🤔 - https://github.com/apache/arrow-rs/issues/6736 -- This is an automated message from the Apache Git Service. To respond to the messag

Re: [I] 【TPCH】Comet do not show performance advantages over native Spark? [datafusion-comet]

2025-02-26 Thread via GitHub
andygrove commented on issue #1450: URL: https://github.com/apache/datafusion-comet/issues/1450#issuecomment-2686075824 Hi @xingnailu here are some initial questions: - Which version of Comet are you using? - Do you see queries running natively using Comet operators? - Do you se

Re: [PR] Refactor SortPushdown using the standard top-down visitor. [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14821: URL: https://github.com/apache/datafusion/pull/14821#issuecomment-2686065559 > There is a new test on main ([added 4 hours ago](https://github.com/apache/datafusion/blob/f5b7affecd90e9be26289d869c4a542359cb98e3/datafusion/core/tests/physical_optimizer/enforce_so

Re: [I] NoSuchMethodError with Spark 3.5.3 (EMR 7.6) [datafusion-comet]

2025-02-26 Thread via GitHub
andygrove commented on issue #1451: URL: https://github.com/apache/datafusion-comet/issues/1451#issuecomment-2686057451 > I am getting the following error when running a job on EMR 7.6 with spark code also compiled with version 3.5.3 and using comet: 3.5_2.12-0.6.0 Did you compile Co

Re: [PR] Change tpch validation to use `exec_sql_on_tables` [datafusion-ray]

2025-02-26 Thread via GitHub
andygrove commented on code in PR #66: URL: https://github.com/apache/datafusion-ray/pull/66#discussion_r1972273649 ## src/util.rs: ## @@ -397,6 +402,52 @@ fn print_node(plan: &Arc, indent: usize, output: &mut String) } } +async fn exec_sql(query: String, tables: Vec<(S

[PR] Move HashJoin from `RawTable` to `HashTable` [datafusion]

2025-02-26 Thread via GitHub
Dandandan opened a new pull request, #14904: URL: https://github.com/apache/datafusion/pull/14904 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes teste

Re: [I] Run / test Datafusion with JSON Bench from ClickHouse [datafusion]

2025-02-26 Thread via GitHub
ZENOTME commented on issue #14874: URL: https://github.com/apache/datafusion/issues/14874#issuecomment-2685969063 > Thanks for checking it out [@ZENOTME](https://github.com/ZENOTME) > > > JSON bench will have a different schema for row and looks like datafusion(arrow-json) can't suppo
