Re: [PR] correctly treat backslash in datafusion-cli [datafusion]

2025-02-25 Thread via GitHub
findepi commented on code in PR #14844: URL: https://github.com/apache/datafusion/pull/14844#discussion_r1971099360 ## datafusion-cli/tests/cli_integration.rs: ## @@ -39,6 +39,10 @@ fn init() { ["--command", "select 1; select 2;", "--format", "json", "-q"], "[{\"Int64(

Re: [PR] correctly treat backslash in datafusion-cli [datafusion]

2025-02-25 Thread via GitHub
findepi commented on PR #14844: URL: https://github.com/apache/datafusion/pull/14844#issuecomment-2684193983 > I think you are right that this is the core question: "should `datafusion-cli` be doing any escaping / unescaping itself?" no, the CLI should not do those > If we want

Re: [I] Support User-Defined Sorting [datafusion]

2025-02-25 Thread via GitHub
tobixdev commented on issue #14828: URL: https://github.com/apache/datafusion/issues/14828#issuecomment-2684185236 > In terms of arrow-rs I am not sure we should add anything there yet -- I think we should start the implementation in DataFusion and then port stuff upstream to arrow-rs when i

Re: [I] Support User-Defined Sorting [datafusion]

2025-02-25 Thread via GitHub
tobixdev commented on issue #14828: URL: https://github.com/apache/datafusion/issues/14828#issuecomment-2684185492 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] feat: Support IntegralDivide function [datafusion-comet]

2025-02-25 Thread via GitHub
wForget commented on code in PR #1428: URL: https://github.com/apache/datafusion-comet/pull/1428#discussion_r1971074231 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2641,4 +2641,20 @@ class CometExpressionSuite extends CometTestBase with AdaptiveS

Re: [PR] Refactor SortPushdown using the standard top-down visitor. [datafusion]

2025-02-25 Thread via GitHub
berkaysynnada commented on code in PR #14821: URL: https://github.com/apache/datafusion/pull/14821#discussion_r1971059284 ## datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs: ## @@ -98,66 +98,91 @@ fn pushdown_sorts_helper( .ordering_satisfy_requiremen

Re: [PR] Fix: External sort failing on `StringView` due to shared buffers [datafusion]

2025-02-25 Thread via GitHub
2010YOUY01 commented on PR #14823: URL: https://github.com/apache/datafusion/pull/14823#issuecomment-2684140744 Thank you all for the feedback. I have addressed the review comments (also added a small further simplification for the refactor)

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-02-25 Thread via GitHub
djanderson commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2684139203 Hey all, sorry for the noise on this thread lately, I've been working to try and understand how to apply this example to an actual DataFusion-based server, and not having much luc

Re: [PR] Fix: External sort failing on `StringView` due to shared buffers [datafusion]

2025-02-25 Thread via GitHub
2010YOUY01 commented on code in PR #14823: URL: https://github.com/apache/datafusion/pull/14823#discussion_r1971059228 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -1425,7 +1478,7 @@ mod tests { // Processing 840 KB of data using 400 KB of memory requires at lea

Re: [PR] replace TypeSignature::String with TypeSignature::Coercible for trim functions [datafusion]

2025-02-25 Thread via GitHub
jayzhan211 commented on code in PR #14865: URL: https://github.com/apache/datafusion/pull/14865#discussion_r1971048076 ## datafusion/functions/src/string/btrim.rs: ## @@ -19,20 +19,28 @@ use crate::string::common::*; use crate::utils::{make_scalar_function, utf8_to_str_type};

[PR] Preserve the name of grouping sets in SimplifyExpressions [datafusion]

2025-02-25 Thread via GitHub
joroKr21 opened a new pull request, #14888: URL: https://github.com/apache/datafusion/pull/14888 Whenever we use `recompute_schema` or `with_exprs_and_inputs`, this ensures that we obtain the same schema. ## Which issue does this PR close? Followup to #14734 ## R

Re: [PR] Set projection before configuring the source [datafusion]

2025-02-25 Thread via GitHub
berkaysynnada commented on code in PR #14685: URL: https://github.com/apache/datafusion/pull/14685#discussion_r1971022680 ## datafusion/core/src/datasource/physical_plan/file_scan_config.rs: ## @@ -345,6 +345,32 @@ impl FileScanConfig { /// Set the projection of the files

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-25 Thread via GitHub
shehabgamin commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2684037703 Is there a reason why some `arrow` crates are using version `54.2.0` while others are using `54.1.0`? ``` arrow = { version = "54.2.0", features = [ "prettyprint

Re: [PR] feat: Support IntegralDivide function [datafusion-comet]

2025-02-25 Thread via GitHub
wForget commented on PR #1428: URL: https://github.com/apache/datafusion-comet/pull/1428#issuecomment-2684022663 Integral divide of `decimal` type has inconsistent behavior. test: ``` test("test integral divide") { withTable("t1", "t2") { if (isSpark34Plus

Re: [PR] Parse MySQL ALTER TABLE ALGORITHM option [datafusion-sqlparser-rs]

2025-02-25 Thread via GitHub
iffyio merged PR #1745: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1745

Re: [PR] Parse SIGNED INTEGER type in MySQL CAST [datafusion-sqlparser-rs]

2025-02-25 Thread via GitHub
iffyio merged PR #1739: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1739

Re: [I] MySQL datatype discrepancy between DDL and `CAST` [datafusion-sqlparser-rs]

2025-02-25 Thread via GitHub
iffyio closed issue #1589: MySQL datatype discrepancy between DDL and `CAST` URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1589

Re: [PR] Extending support for INDEX parsing [datafusion-sqlparser-rs]

2025-02-25 Thread via GitHub
iffyio commented on code in PR #1707: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1707#discussion_r1970924952 ## src/parser/mod.rs: ## @@ -7629,24 +7643,34 @@ impl<'a> Parser<'a> { } pub fn parse_index_type(&mut self) -> Result { -if self.par

Re: [I] Remove the need for registering an ObjectStore for remote files [datafusion-python]

2025-02-25 Thread via GitHub
kylebarron commented on issue #899: URL: https://github.com/apache/datafusion-python/issues/899#issuecomment-2683964010 I'd suggest waiting until the `object_store` 0.12 release (and, then, for datafusion to use that) (because I'm pinned to latest main of `object_store` from pyo3_object-st

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-02-25 Thread via GitHub
djanderson commented on code in PR #14286: URL: https://github.com/apache/datafusion/pull/14286#discussion_r1970921927 ## datafusion-examples/examples/thread_pools_lib/dedicated_executor.rs: ## @@ -0,0 +1,1778 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-02-25 Thread via GitHub
djanderson commented on code in PR #14286: URL: https://github.com/apache/datafusion/pull/14286#discussion_r1943824052 ## datafusion-examples/examples/thread_pools.rs: ## @@ -0,0 +1,238 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] Implement builder style API for ParserOptions [datafusion]

2025-02-25 Thread via GitHub
kosiew commented on code in PR #14887: URL: https://github.com/apache/datafusion/pull/14887#discussion_r1970909759 ## datafusion/sql/src/planner.rs: ## @@ -43,15 +43,30 @@ pub use datafusion_expr::planner::ContextProvider; /// SQL parser options #[derive(Debug, Clone, Copy)]

[PR] Implement builder style API for ParserOptions [datafusion]

2025-02-25 Thread via GitHub
kosiew opened a new pull request, #14887: URL: https://github.com/apache/datafusion/pull/14887 ## Which issue does this PR close? - Closes #14879. ## Rationale for this change Currently, adding new fields to ParserOptions requires a downstream API change,
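Editor's sketch of the builder-style pattern this PR title refers to. This is a minimal illustration, not the code from the PR: the struct below is a hypothetical stand-in, and its field names, defaults, and `with_*` method names are assumptions chosen for the example. The idea is that callers build options through chained setters rather than a struct literal, so a new field with a default can be added later without breaking downstream code.

```rust
/// Hypothetical stand-in for a SQL parser options struct.
/// Field names and defaults are assumptions made for this sketch only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ParserOptions {
    parse_float_as_decimal: bool,
    enable_ident_normalization: bool,
}

impl Default for ParserOptions {
    fn default() -> Self {
        Self {
            parse_float_as_decimal: false,
            enable_ident_normalization: true,
        }
    }
}

impl ParserOptions {
    /// Start from the defaults; callers never write a struct literal.
    pub fn new() -> Self {
        Self::default()
    }

    /// Builder-style setter: takes `self` by value and returns it for chaining.
    pub fn with_parse_float_as_decimal(mut self, value: bool) -> Self {
        self.parse_float_as_decimal = value;
        self
    }

    /// Builder-style setter for identifier normalization.
    pub fn with_enable_ident_normalization(mut self, value: bool) -> Self {
        self.enable_ident_normalization = value;
        self
    }

    /// Read-only accessor, since the fields are private.
    pub fn parse_float_as_decimal(&self) -> bool {
        self.parse_float_as_decimal
    }

    /// Read-only accessor, since the fields are private.
    pub fn enable_ident_normalization(&self) -> bool {
        self.enable_ident_normalization
    }
}
```

With this shape, a caller would write something like `ParserOptions::new().with_parse_float_as_decimal(true)`; fields that are not mentioned keep their defaults.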

Re: [PR] Datafusion-cli: Redesign the datafusion-cli execution and print, make it totally streaming printing without memory overhead [datafusion]

2025-02-25 Thread via GitHub
zhuqi-lucas commented on code in PR #14877: URL: https://github.com/apache/datafusion/pull/14877#discussion_r1970883672 ## datafusion-cli/src/print_format.rs: ## @@ -209,6 +211,145 @@ impl PrintFormat { } Ok(()) } + +#[allow(clippy::too_many_arguments)

[I] Native shuffle inaccurate estimate of builder memory allocation [datafusion-comet]

2025-02-25 Thread via GitHub
andygrove opened a new issue, #1449: URL: https://github.com/apache/datafusion-comet/issues/1449 ### Describe the bug As demonstrated in unit tests added in https://github.com/apache/datafusion-comet/pull/1440, we are allocating ~100kb for a batch when the actual memory used is less

Re: [I] Decorrelate scalar subqueries with more complex filter expressions [datafusion]

2025-02-25 Thread via GitHub
xudong963 commented on issue #14554: URL: https://github.com/apache/datafusion/issues/14554#issuecomment-2683864272 > FWIW I think [@xudong963](https://github.com/xudong963) said he has experience implementing such code so perhaps he will be able to help / assist with the implementation and

Re: [PR] Datafusion-cli: Redesign the datafusion-cli execution and print, make it totally streaming printing without memory overhead [datafusion]

2025-02-25 Thread via GitHub
zhuqi-lucas commented on code in PR #14877: URL: https://github.com/apache/datafusion/pull/14877#discussion_r1970885830 ## datafusion-cli/src/print_format.rs: ## @@ -209,6 +211,145 @@ impl PrintFormat { } Ok(()) } + +#[allow(clippy::too_many_arguments)

[I] Native shuffle double allocates memory [datafusion-comet]

2025-02-25 Thread via GitHub
andygrove opened a new issue, #1448: URL: https://github.com/apache/datafusion-comet/issues/1448 ### Describe the bug As demonstrated in the new unit tests added in https://github.com/apache/datafusion-comet/pull/1440, native shuffle is double allocating memory. ```rust

Re: [PR] Datafusion-cli: Redesign the datafusion-cli execution and print, make it totally streaming printing without memory overhead [datafusion]

2025-02-25 Thread via GitHub
zhuqi-lucas commented on PR #14877: URL: https://github.com/apache/datafusion/pull/14877#issuecomment-2683862205 > Thank you @zhuqi-lucas -- this is really cool! > > I have a suggestion that I think would make the code better, but we could do it as a follow on PR or never in my opinio

[I] Code clean for new datafusion-cli streaming logic [datafusion]

2025-02-25 Thread via GitHub
zhuqi-lucas opened a new issue, #14886: URL: https://github.com/apache/datafusion/issues/14886 ### Is your feature request related to a problem or challenge? This is a follow-up for: https://github.com/apache/datafusion/pull/14877#discussion_r1970328077 And we can do some cod

[PR] Implement builder style API for ParserOptions [datafusion]

2025-02-25 Thread via GitHub
diegoreis42 opened a new pull request, #14885: URL: https://github.com/apache/datafusion/pull/14885 ## Which issue does this PR close? - Closes #14879 ## Rationale for this change ## What changes are included in this PR? ## Are these change

Re: [PR] chore: faster maven mirror [datafusion-comet]

2025-02-25 Thread via GitHub
codecov-commenter commented on PR #1447: URL: https://github.com/apache/datafusion-comet/pull/1447#issuecomment-2683791579 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1447?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] Update test for datafusion #14824 [datafusion-testing]

2025-02-25 Thread via GitHub
jayzhan211 opened a new pull request, #7: URL: https://github.com/apache/datafusion-testing/pull/7 (no comment)

[PR] chore: faster maven mirror [datafusion-comet]

2025-02-25 Thread via GitHub
comphead opened a new pull request, #1447: URL: https://github.com/apache/datafusion-comet/pull/1447 ## Which issue does this PR close? Adding faster maven mirror and enabling some caching Closes #. ## Rationale for this change ## What changes are i

Re: [PR] Fix `regenerate_sqlite_files.sh` due to changes in sqllogictests [datafusion]

2025-02-25 Thread via GitHub
jayzhan211 commented on code in PR #14881: URL: https://github.com/apache/datafusion/pull/14881#discussion_r1970785236 ## datafusion/sqllogictest/README.md: ## @@ -243,6 +243,14 @@ export RUST_MIN_STACK=30485760; PG_COMPAT=true INCLUDE_SQLITE=true cargo test --features=postgres

[I] Fix the null handling for `to_char` function [datafusion]

2025-02-25 Thread via GitHub
goldmedal opened a new issue, #14884: URL: https://github.com/apache/datafusion/issues/14884 ### Describe the bug Currently, if we input a null value to `to_char`, we will get an empty string instead of a null value. ``` > select to_char(NULL, '%Y-%m-%d %H:%M:%S') is null; +-

Re: [PR] Fix `regenerate_sqlite_files.sh` due to changes in sqllogictests [datafusion]

2025-02-25 Thread via GitHub
Omega359 commented on code in PR #14881: URL: https://github.com/apache/datafusion/pull/14881#discussion_r1970798347 ## datafusion/sqllogictest/README.md: ## @@ -243,6 +243,14 @@ export RUST_MIN_STACK=30485760; PG_COMPAT=true INCLUDE_SQLITE=true cargo test --features=postgres -

Re: [PR] feat: Support IntegralDivide function [datafusion-comet]

2025-02-25 Thread via GitHub
wForget commented on code in PR #1428: URL: https://github.com/apache/datafusion-comet/pull/1428#discussion_r1970788193 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2641,4 +2641,20 @@ class CometExpressionSuite extends CometTestBase with AdaptiveS

Re: [PR] Fix `regenerate_sqlite_files.sh` due to changes in sqllogictests [datafusion]

2025-02-25 Thread via GitHub
jayzhan211 merged PR #14881: URL: https://github.com/apache/datafusion/pull/14881

Re: [PR] Fix `regenerate_sqlite_files.sh` due to changes in sqllogictests [datafusion]

2025-02-25 Thread via GitHub
jayzhan211 commented on PR #14881: URL: https://github.com/apache/datafusion/pull/14881#issuecomment-2683674514 Thanks @alamb, let me try it

Re: [PR] Add DataFrame fill_null [datafusion]

2025-02-25 Thread via GitHub
Omega359 commented on PR #14769: URL: https://github.com/apache/datafusion/pull/14769#issuecomment-2683368595 I did a quick check in the spark functions PR, I didn't see anything related to fill.

Re: [I] Also migrate "invoke" to "invoke_with_args" [datafusion]

2025-02-25 Thread via GitHub
niebayes closed issue #14724: Also migrate "invoke" to "invoke_with_args" URL: https://github.com/apache/datafusion/issues/14724

Re: [PR] Fixed Migrate Datetime functions to invoke_with_args Issue 14705 [datafusion]

2025-02-25 Thread via GitHub
goldmedal closed pull request #14864: Fixed Migrate Datetime functions to invoke_with_args Issue 14705 URL: https://github.com/apache/datafusion/pull/14864

Re: [PR] chore: migrate to `invoke_with_args` for datetime functions [datafusion]

2025-02-25 Thread via GitHub
goldmedal merged PR #14876: URL: https://github.com/apache/datafusion/pull/14876

Re: [PR] chore: migrate to `invoke_with_args` for datetime functions [datafusion]

2025-02-25 Thread via GitHub
goldmedal commented on PR #14876: URL: https://github.com/apache/datafusion/pull/14876#issuecomment-2683639200 Thanks @onlyjackfrost and @alamb for reviewing.

Re: [I] Migrate Datetime functions to `invoke_with_args` [datafusion]

2025-02-25 Thread via GitHub
goldmedal closed issue #14705: Migrate Datetime functions to `invoke_with_args` URL: https://github.com/apache/datafusion/issues/14705

Re: [PR] refactor: make SqlToRel::new derive the parser options from the context provider [datafusion]

2025-02-25 Thread via GitHub
niebayes commented on PR #14822: URL: https://github.com/apache/datafusion/pull/14822#issuecomment-2683625364 @alamb Sure, I'll add some tests soon.

Re: [I] Run / test Datafusion with JSON Bench from ClickHouse [datafusion]

2025-02-25 Thread via GitHub
goldmedal commented on issue #14874: URL: https://github.com/apache/datafusion/issues/14874#issuecomment-2683612589 I think you mean @douenergy (Alex)

Re: [PR] feat: `tree` / pretty explain [datafusion]

2025-02-25 Thread via GitHub
irenjj commented on code in PR #14677: URL: https://github.com/apache/datafusion/pull/14677#discussion_r1970734646 ## datafusion/physical-plan/src/memory.rs: ## @@ -192,6 +192,7 @@ impl DisplayAs for LazyMemoryExec { .join(", ") )

Re: [PR] chore: Re-organize shuffle writer code [datafusion-comet]

2025-02-25 Thread via GitHub
andygrove commented on PR #1439: URL: https://github.com/apache/datafusion-comet/pull/1439#issuecomment-2683575116 Thanks for the reviews @mbutrovich @kazuyukitanimura

Re: [PR] chore: Re-organize shuffle writer code [datafusion-comet]

2025-02-25 Thread via GitHub
andygrove merged PR #1439: URL: https://github.com/apache/datafusion-comet/pull/1439

Re: [PR] Require `Debug` for `DataSource` [datafusion]

2025-02-25 Thread via GitHub
alamb commented on code in PR #14882: URL: https://github.com/apache/datafusion/pull/14882#discussion_r1970647757 ## datafusion/datasource/src/source.rs: ## @@ -35,7 +35,7 @@ use datafusion_physical_expr_common::sort_expr::LexOrdering; /// Common behaviors in Data Sources for

Re: [PR] fix duplicated schema name error from count wildcard [datafusion]

2025-02-25 Thread via GitHub
alamb commented on PR #14824: URL: https://github.com/apache/datafusion/pull/14824#issuecomment-2683457657 > > I think the issue is that the runner in https://github.com/Omega359/sqllogictest-rs is based on an older version of the sqllogictests than we use in datafusion. > > I have an id

[PR] Include struct name on FileScanConfig debug impl [datafusion]

2025-02-25 Thread via GitHub
alamb opened a new pull request, #14883: URL: https://github.com/apache/datafusion/pull/14883 ## Which issue does this PR close? Part of - Related to https://github.com/delta-io/delta-rs/pull/3261 - Related to https://github.com/apache/datafusion/issues/14123 ## Ration

Re: [PR] Add DataFrame fill_null [datafusion]

2025-02-25 Thread via GitHub
comphead commented on PR #14769: URL: https://github.com/apache/datafusion/pull/14769#issuecomment-2683472594 > > > I did a quick check in the spark functions PR, I didn't see anything related to fill. > > > > > > hi @Omega359 are you referring to https://spark.apache.org/docs/3.

Re: [PR] Fix `regenerate_sqlite_files.sh` due to changes in sqllogictests [datafusion]

2025-02-25 Thread via GitHub
alamb commented on PR #14881: URL: https://github.com/apache/datafusion/pull/14881#issuecomment-2683458549 @Omega359 has an alternate plan here potentially: https://github.com/apache/datafusion/pull/14824#issuecomment-2683289068

Re: [PR] Add DataFrame fill_null [datafusion]

2025-02-25 Thread via GitHub
Omega359 commented on PR #14769: URL: https://github.com/apache/datafusion/pull/14769#issuecomment-2683448966 > > I did a quick check in the spark functions PR, I didn't see anything related to fill. > > hi @Omega359 are you referring to https://spark.apache.org/docs/3.5.4/api/java/o

Re: [PR] Dataframe with_column and with_column_renamed performance improvements [datafusion]

2025-02-25 Thread via GitHub
Omega359 commented on code in PR #14653: URL: https://github.com/apache/datafusion/pull/14653#discussion_r1970633135 ## datafusion/core/src/dataframe/mod.rs: ## @@ -183,6 +183,8 @@ pub struct DataFrame { // Box the (large) SessionState to reduce the size of DataFrame on the

Re: [PR] Fix `regenerate_sqlite_files.sh` due to changes in sqllogictests [datafusion]

2025-02-25 Thread via GitHub
alamb commented on PR #14881: URL: https://github.com/apache/datafusion/pull/14881#issuecomment-2683409993 Ok, I think this one now works again.

Re: [PR] Use arrow IPC Stream format for spill files [datafusion]

2025-02-25 Thread via GitHub
comphead commented on PR #14868: URL: https://github.com/apache/datafusion/pull/14868#issuecomment-2683393024 Wondering if this PR also partially addresses https://github.com/apache/datafusion/issues/14078 Another thing to mention: we still lack some spilling test cases like https://g

Re: [I] [BUG]: SortMergeJoin filtered LeftAnti fails on TPCH Q21 [datafusion-comet]

2025-02-25 Thread via GitHub
comphead closed issue #861: [BUG]: SortMergeJoin filtered LeftAnti fails on TPCH Q21 URL: https://github.com/apache/datafusion-comet/issues/861

Re: [I] [BUG]: SortMergeJoin filtered LeftAnti fails on TPCH Q21 [datafusion-comet]

2025-02-25 Thread via GitHub
comphead commented on issue #861: URL: https://github.com/apache/datafusion-comet/issues/861#issuecomment-2683386368 Fixed with Datafusion 44 dependency update

Re: [PR] Add DataFrame fill_null [datafusion]

2025-02-25 Thread via GitHub
comphead commented on PR #14769: URL: https://github.com/apache/datafusion/pull/14769#issuecomment-2683379630 > I did a quick check in the spark functions PR, I didn't see anything related to fill. hi @Omega359 are you referring to https://spark.apache.org/docs/3.5.4/api/java/org/apa

Re: [PR] correctly treat backslash in datafusion-cli [datafusion]

2025-02-25 Thread via GitHub
Lordworms commented on code in PR #14844: URL: https://github.com/apache/datafusion/pull/14844#discussion_r1970592527 ## datafusion-cli/src/helper.rs: ## @@ -326,15 +326,6 @@ mod tests { )?; assert!(matches!(result, ValidationResult::Valid(None))); -

Re: [PR] Add xxhash algorithms in SQL and expression api [datafusion]

2025-02-25 Thread via GitHub
Omega359 commented on PR #14367: URL: https://github.com/apache/datafusion/pull/14367#issuecomment-2683366217 Note that this seems to also be in the spark function PR - https://github.com/apache/datafusion/pull/14392/files#diff-2bbff2d3a0ce9cec9ed9d6ec6e38ff910875af704b60855f43b47b46c96c5d44

Re: [PR] Add `statistics_truncate_length` parquet writer config [datafusion]

2025-02-25 Thread via GitHub
alamb merged PR #14782: URL: https://github.com/apache/datafusion/pull/14782

Re: [PR] Add DataFrame fill_null [datafusion]

2025-02-25 Thread via GitHub
alamb commented on PR #14769: URL: https://github.com/apache/datafusion/pull/14769#issuecomment-2683332509 FYI @Omega359 and @timsaucer

Re: [PR] Add `statistics_truncate_length` parquet writer config [datafusion]

2025-02-25 Thread via GitHub
alamb commented on PR #14782: URL: https://github.com/apache/datafusion/pull/14782#issuecomment-2683329263 Thanks again @niebayes and @akoshchiy

Re: [I] Configure `statistics_truncate_length` in Parquet writer [datafusion]

2025-02-25 Thread via GitHub
alamb closed issue #14601: Configure `statistics_truncate_length` in Parquet writer URL: https://github.com/apache/datafusion/issues/14601

Re: [I] Incorrect backslash treatment in string literals in DataFusion CLI [datafusion]

2025-02-25 Thread via GitHub
alamb commented on issue #13286: URL: https://github.com/apache/datafusion/issues/13286#issuecomment-2683323805 @Lordworms has a proposed PR here to fix this: - https://github.com/apache/datafusion/pull/14844 However it seems like the way to get consistent behavior with sqllogictes

Re: [PR] correctly treat backslash in datafusion-cli [datafusion]

2025-02-25 Thread via GitHub
alamb commented on PR #14844: URL: https://github.com/apache/datafusion/pull/14844#issuecomment-2683322075 > I think this is correct, because postgresql treats line breaks, such as \t \0, as plain text, but in the original design of DataFusion, \t, \0 should be escaped, so for \1, since it

Re: [PR] Fix `regenerate_sqlite_files.sh` due to changes in sqllogictests [datafusion]

2025-02-25 Thread via GitHub
alamb commented on code in PR #14881: URL: https://github.com/apache/datafusion/pull/14881#discussion_r1970539407 ## datafusion/sqllogictest/regenerate/src/engines/datafusion_engine/runner.rs: ## @@ -0,0 +1,131 @@ +// Licensed to the Apache Software Foundation (ASF) under one R

Re: [PR] correctly treat backslash in datafusion-cli [datafusion]

2025-02-25 Thread via GitHub
Lordworms commented on PR #14844: URL: https://github.com/apache/datafusion/pull/14844#issuecomment-2683310903 > Hi @Lordworms -- thank you for this PR > > I played around with it a bit locally and something still doesn't seem right > > Specifically, postgres will treat `\1` li

Re: [PR] Window Functions Order Conservation -- Follow-up On Set Monotonicity [datafusion]

2025-02-25 Thread via GitHub
ozankabak commented on code in PR #14813: URL: https://github.com/apache/datafusion/pull/14813#discussion_r1970548831 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -2043,9 +1848,10 @@ async fn test_multiple_sort_window_exec() -> Result<()> { let ex

Re: [PR] Fix `regenerate_sqlite_files.sh` due to changes in sqllogictests [datafusion]

2025-02-25 Thread via GitHub
alamb commented on PR #14881: URL: https://github.com/apache/datafusion/pull/14881#issuecomment-2683299551 I am still testing this locally

Re: [PR] Fix `regenerate_sqlite_files.sh` due to changes in sqllogictests [datafusion]

2025-02-25 Thread via GitHub
Omega359 commented on code in PR #14881: URL: https://github.com/apache/datafusion/pull/14881#discussion_r1970539669 ## datafusion/sqllogictest/README.md: ## @@ -243,6 +243,14 @@ export RUST_MIN_STACK=30485760; PG_COMPAT=true INCLUDE_SQLITE=true cargo test --features=postgres -

Re: [PR] fix duplicated schema name error from count wildcard [datafusion]

2025-02-25 Thread via GitHub
Omega359 commented on PR #14824: URL: https://github.com/apache/datafusion/pull/14824#issuecomment-2683289068 > I think the issue is that the runner in https://github.com/Omega359/sqllogictest-rs is based on an older version of the sqllogictests than we use in datafusion. > > I have

Re: [PR] Fix `regenerate_sqlite_files.sh` due to changes in sqllogictests [datafusion]

2025-02-25 Thread via GitHub
alamb commented on code in PR #14881: URL: https://github.com/apache/datafusion/pull/14881#discussion_r1970539614 ## datafusion/sqllogictest/regenerate_sqlite_files.sh: ## @@ -168,8 +168,10 @@ sd -f i '^sqllogictest.*' 'sqllogictest = { git = "https://github.com/Omega359/s e

[PR] Fix `regenerate_sqlite_files.sh` due to changes in sqllogictests [datafusion]

2025-02-25 Thread via GitHub
alamb opened a new pull request, #14881: URL: https://github.com/apache/datafusion/pull/14881 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/pull/14824 ## Rationale for this change The regeneration logic uses a fork of the sqllogicte

Re: [PR] fix duplicated schema name error from count wildcard [datafusion]

2025-02-25 Thread via GitHub
alamb commented on PR #14824: URL: https://github.com/apache/datafusion/pull/14824#issuecomment-2683234475 I think the issue is that the runner in https://github.com/Omega359/sqllogictest-rs is based on an older version of the sqllogictests than we use in datafusion. I have an idea

[PR] Parse MySQL ALTER TABLE ALGORITHM option [datafusion-sqlparser-rs]

2025-02-25 Thread via GitHub
mvzink opened a new pull request, #1745: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1745 This may semantically be an option for how to perform the other operations in the statement, but syntactically MySQL treats it like an operation (e.g. can be in any order relative to ot

Re: [PR] fix duplicated schema name error from count wildcard [datafusion]

2025-02-25 Thread via GitHub
alamb commented on PR #14824: URL: https://github.com/apache/datafusion/pull/14824#issuecomment-2683198971 I am working on helping here as I think getting the tests back green is quite important

Re: [I] Create an MSRV policy in this crate [datafusion-sqlparser-rs]

2025-02-25 Thread via GitHub
alamb commented on issue #1744: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1744#issuecomment-2683182151 > From looking into [#1612](https://github.com/apache/datafusion-sqlparser-rs/issues/1612), some test code uses associated type bounds ([stabilized in 1.79](https://g

Re: [PR] Store spans for Value expressions [datafusion-sqlparser-rs]

2025-02-25 Thread via GitHub
alamb commented on PR #1738: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1738#issuecomment-2683188897 🎉

Re: [I] Support User-Defined Sorting [datafusion]

2025-02-25 Thread via GitHub
alamb commented on issue #14828: URL: https://github.com/apache/datafusion/issues/14828#issuecomment-2683188455 > Here I think we should go with user defined types and attach the sorting information to that type. This should be possible as we can have a "central registry" (e.g., `SessionCon

Re: [PR] refactor: replace OnceLock with LazyLock [datafusion]

2025-02-25 Thread via GitHub
AmosAidoo commented on PR #14880: URL: https://github.com/apache/datafusion/pull/14880#issuecomment-2683178362 😮 So proud

Re: [I] Slowdown in ClickBench Q36-Q37 between DataFusion 43.0.0 and 44.0.0 [datafusion]

2025-02-25 Thread via GitHub
AdamGS commented on issue #14481: URL: https://github.com/apache/datafusion/issues/14481#issuecomment-2683173502 Would love to help on this issue. We built something similar for Vortex based on [moka](https://docs.rs/moka/latest/moka/) and it also saves on roundtrips during infer_schema/inf

Re: [PR] Parse signed/unsigned integer data type in MySQL CAST [datafusion-sqlparser-rs]

2025-02-25 Thread via GitHub
mvzink commented on code in PR #1739: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1739#discussion_r1970476787 ## src/ast/data_type.rs: ## @@ -238,6 +238,26 @@ pub enum DataType { UnsignedBigInt(Option), /// Unsigned Int8 with optional display width e.g

Re: [PR] refactor: replace OnceLock with LazyLock [datafusion]

2025-02-25 Thread via GitHub
alamb commented on PR #14880: URL: https://github.com/apache/datafusion/pull/14880#issuecomment-2683172038 This PR turns out to have gotten the 10,000th commit: ![Screenshot 2025-02-25 at 3 07 21  PM](https://github.com/user-attachments/assets/57792ae9-447d-49f9-a9f0-e672609530fa)

Re: [I] Replace `OnceLock` with `LazyLock` [datafusion]

2025-02-25 Thread via GitHub
alamb closed issue #11687: Replace `OnceLock` with `LazyLock` URL: https://github.com/apache/datafusion/issues/11687

Re: [PR] refactor: replace OnceLock with LazyLock [datafusion]

2025-02-25 Thread via GitHub
alamb merged PR #14880: URL: https://github.com/apache/datafusion/pull/14880

Re: [PR] fix: use `return_type_from_args` and mark nullable if any of the input is nullable [datafusion]

2025-02-25 Thread via GitHub
alamb merged PR #14841: URL: https://github.com/apache/datafusion/pull/14841

Re: [PR] Fixed Migrate Datetime functions to invoke_with_args Issue 14705 [datafusion]

2025-02-25 Thread via GitHub
alamb commented on PR #14864: URL: https://github.com/apache/datafusion/pull/14864#issuecomment-2683165379 I think this may be a dupe of - https://github.com/apache/datafusion/pull/14876

Re: [I] Run / test Datafusion with JSON Bench from ClickHouse [datafusion]

2025-02-25 Thread via GitHub
alamb commented on issue #14874: URL: https://github.com/apache/datafusion/issues/14874#issuecomment-2683161214 > I didn't do that :) Sorry -- I got my github <-> real life handles mixed up. Maybe @goldmedal knows the right person

Re: [PR] Dataframe with_column and with_column_renamed performance improvements [datafusion]

2025-02-25 Thread via GitHub
Omega359 commented on code in PR #14653: URL: https://github.com/apache/datafusion/pull/14653#discussion_r1970468759 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1612,13 +1621,33 @@ pub fn union_by_name( pub fn project( plan: LogicalPlan, expr: impl IntoIter
