Re: [PR] Minor: Remove memory reservation in `JoinLeftData` used in HashJoin [datafusion]

2024-12-14 Thread via GitHub
korowa commented on code in PR #13751: URL: https://github.com/apache/datafusion/pull/13751#discussion_r1885543455 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -90,9 +90,6 @@ struct JoinLeftData { /// Counter of running probe-threads, potentially /// able

Re: [PR] Slightly faster keyword lookups [datafusion-sqlparser-rs]

2024-12-14 Thread via GitHub
alamb commented on code in PR #1591: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1591#discussion_r1885037004 ## src/keywords.rs: ## @@ -973,3 +973,61 @@ pub const RESERVED_FOR_IDENTIFIER: &[Keyword] = &[ Keyword::STRUCT, Keyword::TRIM, ]; + +pub const

Re: [PR] Improve parsing performance by reducing token cloning [datafusion-sqlparser-rs]

2024-12-14 Thread via GitHub
alamb commented on PR #1587: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1587#issuecomment-2543067810 I merged this PR up to main to resolve a conflict -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

[PR] Avoid cloning during token creation [datafusion-sqlparser-rs]

2024-12-14 Thread via GitHub
alamb opened a new pull request, #1603: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1603 # Rationale Inspired by @davisp's https://github.com/apache/datafusion-sqlparser-rs/pull/1591 I was looking at `Token::make_word` and I noticed it made *2* clones (owned strings). Eac

Re: [PR] Avoid cloning during token creation [datafusion-sqlparser-rs]

2024-12-14 Thread via GitHub
alamb commented on PR #1603: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1603#issuecomment-2543080230 I was looking at our benchmarks, and they are very limited. If we want to pursue optimizing sqlparser more I think it would probably be a good idea to add many queries (li

Re: [PR] Improve header size on summary page [datafusion-site]

2024-12-14 Thread via GitHub
alamb commented on PR #50: URL: https://github.com/apache/datafusion-site/pull/50#issuecomment-2543111741 > Instead of changing the articles, what if we just adjust the css to auto apply a different size? I did do a smaller adjustment already, but we can certainly push it further. In https

Re: [I] Consider making the `recursive` dependency an optional feature [datafusion]

2024-12-14 Thread via GitHub
berkaysynnada commented on issue #13766: URL: https://github.com/apache/datafusion/issues/13766#issuecomment-2543059309 I have experienced a similar problem for wasm, https://github.com/apache/datafusion/issues/13513. After disabling it, got no problem.

Re: [PR] Slightly faster keyword lookups [datafusion-sqlparser-rs]

2024-12-14 Thread via GitHub
alamb commented on code in PR #1591: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1591#discussion_r1885040511 ## src/keywords.rs: ## @@ -973,3 +973,61 @@ pub const RESERVED_FOR_IDENTIFIER: &[Keyword] = &[ Keyword::STRUCT, Keyword::TRIM, ]; + +pub const

[PR] Work around ASF github action build failure [datafusion-site]

2024-12-14 Thread via GitHub
timsaucer opened a new pull request, #51: URL: https://github.com/apache/datafusion-site/pull/51 (no comment)

Re: [I] Natural Join Column Qualification Conflict [datafusion]

2024-12-14 Thread via GitHub
findepi commented on issue #13774: URL: https://github.com/apache/datafusion/issues/13774#issuecomment-2543131831 > ``` > > SELECT * FROM server NATURAL JOIN client AS client ORDER BY server.id; > Schema error: No field named server.id. Valid fields are client.id, client.name, client.s

[PR] Add Round trip tests for Array <--> ScalarValue [datafusion]

2024-12-14 Thread via GitHub
alamb opened a new pull request, #13777: URL: https://github.com/apache/datafusion/pull/13777 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/13762 ## Rationale for this change When trying to write a unit test for https://github.c

Re: [PR] Round floats but not decimals in SqlLogicTests [datafusion]

2024-12-14 Thread via GitHub
findepi commented on PR #13743: URL: https://github.com/apache/datafusion/pull/13743#issuecomment-2543134041 float rounding could help with some of these, e.g. https://github.com/user-attachments/assets/df0d41ff-0a70-4577-b72a-2bb8025b9f97 https://github.com/user-attachments/

Re: [I] Schema error when returning DenseUnion from ScalarUDF [datafusion]

2024-12-14 Thread via GitHub
alamb commented on issue #13762: URL: https://github.com/apache/datafusion/issues/13762#issuecomment-2543133819 I have debugged this a bit more - the issue appears to be in `ScalarValue::to_array_of_size` where it always creates a sparse UnionArray. This is very non subtle. I worked up

Re: [PR] Add Round trip tests for Array <--> ScalarValue [datafusion]

2024-12-14 Thread via GitHub
alamb commented on code in PR #13777: URL: https://github.com/apache/datafusion/pull/13777#discussion_r1885081284 ## datafusion/common/src/scalar/mod.rs: ## @@ -5554,6 +,196 @@ mod tests { assert_eq!(&array, &expected); } +#[test] +fn round_trip() { +

Re: [PR] Work around ASF github action build failure [datafusion-site]

2024-12-14 Thread via GitHub
timsaucer commented on PR #51: URL: https://github.com/apache/datafusion-site/pull/51#issuecomment-2543133804 @alamb I believe this will temporarily fix the CI problem. I will open a PR for asf-infra team to fix it upstream. I'm going to move this to draft not because it isn't ready but bec

Re: [PR] Support unicode character for `initcap` function [datafusion]

2024-12-14 Thread via GitHub
2010YOUY01 commented on code in PR #13752: URL: https://github.com/apache/datafusion/pull/13752#discussion_r1885081435 ## datafusion/functions/src/unicode/initcap.rs: ## @@ -74,7 +76,7 @@ impl ScalarUDFImpl for InitcapFunc { DataType::LargeUtf8 => make_scalar_functi

Re: [PR] Add Round trip tests for Array <--> ScalarValue [datafusion]

2024-12-14 Thread via GitHub
alamb commented on code in PR #13777: URL: https://github.com/apache/datafusion/pull/13777#discussion_r1885082987 ## datafusion/common/src/scalar/mod.rs: ## @@ -5554,6 +,194 @@ mod tests { assert_eq!(&array, &expected); } +#[test] +fn round_trip() {

Re: [PR] Add Round trip tests for Array <--> ScalarValue [datafusion]

2024-12-14 Thread via GitHub
findepi commented on code in PR #13777: URL: https://github.com/apache/datafusion/pull/13777#discussion_r1885081611 ## datafusion/common/src/scalar/mod.rs: ## @@ -5554,6 +,196 @@ mod tests { assert_eq!(&array, &expected); } +#[test] +fn round_trip() {

Re: [PR] Work around ASF github action build failure [datafusion-site]

2024-12-14 Thread via GitHub
alamb commented on PR #51: URL: https://github.com/apache/datafusion-site/pull/51#issuecomment-2543136764 Thank you 🙏

Re: [PR] Fix get_type for higher-order array functions [datafusion]

2024-12-14 Thread via GitHub
findepi commented on PR #13756: URL: https://github.com/apache/datafusion/pull/13756#issuecomment-2543137271 > Given that we coerce fixed size list to list, the return type of `array_element(fixed size list)` makes sense to be list. in the unit test, we ask for `array_element(list(fix

Re: [PR] Minor: Add some more blog posts to the readings page [datafusion]

2024-12-14 Thread via GitHub
alamb commented on PR #13761: URL: https://github.com/apache/datafusion/pull/13761#issuecomment-2543104917 Thanks again @comphead

Re: [PR] Minor: Add some more blog posts to the readings page [datafusion]

2024-12-14 Thread via GitHub
alamb merged PR #13761: URL: https://github.com/apache/datafusion/pull/13761

Re: [I] Making the `recursive` dependency an optional feature [datafusion]

2024-12-14 Thread via GitHub
alamb commented on issue #13766: URL: https://github.com/apache/datafusion/issues/13766#issuecomment-2543106793 I have added this ticket to the list of things we should fix before 44 release: https://github.com/apache/datafusion/issues/13334

Re: [I] Consider making the `recursive` dependency an optional feature [datafusion]

2024-12-14 Thread via GitHub
alamb commented on issue #13766: URL: https://github.com/apache/datafusion/issues/13766#issuecomment-2543106520 @blaginin has a PR to add recursive to sqlparser-rs as well: - https://github.com/apache/datafusion-sqlparser-rs/pull/1522 Perhaps we can follow his model for making this

Re: [PR] Deprecate ScalarUDFImpl::return_type [datafusion]

2024-12-14 Thread via GitHub
alamb commented on code in PR #13717: URL: https://github.com/apache/datafusion/pull/13717#discussion_r1885065220 ## datafusion/core/src/catalog_common/information_schema.rs: ## @@ -406,6 +406,7 @@ fn get_udf_args_and_return_types( .into_iter() .map(|ar

Re: [PR] Improve header size on summary page [datafusion-site]

2024-12-14 Thread via GitHub
timsaucer commented on PR #50: URL: https://github.com/apache/datafusion-site/pull/50#issuecomment-2543109867 Instead of changing the articles, what if we just adjust the css to auto apply a different size?

Re: [I] Stack Overflow with Deeply Nested Filter Expressions [datafusion]

2024-12-14 Thread via GitHub
alamb commented on issue #8900: URL: https://github.com/apache/datafusion/issues/8900#issuecomment-2543107473 This may have been fixed by - https://github.com/apache/datafusion/pull/13310

Re: [I] Making the `recursive` dependency an optional feature [datafusion]

2024-12-14 Thread via GitHub
alamb commented on issue #13766: URL: https://github.com/apache/datafusion/issues/13766#issuecomment-2543107565 FYI @peter-toth

Re: [PR] chore: dependency cleanup [datafusion-ballista]

2024-12-14 Thread via GitHub
andygrove merged PR #1150: URL: https://github.com/apache/datafusion-ballista/pull/1150

Re: [PR] feat!: change catalog provider and schema provider methods to be asynchronous [datafusion]

2024-12-14 Thread via GitHub
westonpace commented on PR #13582: URL: https://github.com/apache/datafusion/pull/13582#issuecomment-2543112725 @alamb so I think this PR might still be useful. 90% of the change here is not coming from the planning path but rather from the registration path. Your caching approach and P

Re: [PR] Work around ASF github action build failure [datafusion-site]

2024-12-14 Thread via GitHub
timsaucer merged PR #51: URL: https://github.com/apache/datafusion-site/pull/51

Re: [I] Proposal: introduced typed expressions, separate AST and IR [datafusion]

2024-12-14 Thread via GitHub
findepi commented on issue #12604: URL: https://github.com/apache/datafusion/issues/12604#issuecomment-2543146393 The introduction of a new IR and new Expressions has these challenges - merit: actual design of the IR and new expressions - work: actual step by step evolutionary intr

Re: [PR] DataFusion Python 43.1.0 announcement [datafusion-site]

2024-12-14 Thread via GitHub
timsaucer commented on PR #43: URL: https://github.com/apache/datafusion-site/pull/43#issuecomment-2543146788 @alamb I pulled the css font size changes for the index page into this PR but we can easily pull it out. It is live right now on https://datafusion.staged.apache.org/blog/ if you wa

Re: [I] Making the `recursive` dependency an optional feature [datafusion]

2024-12-14 Thread via GitHub
findepi commented on issue #13766: URL: https://github.com/apache/datafusion/issues/13766#issuecomment-2543138382 > * [Add `#[recursive]` datafusion-sqlparser-rs#1522](https://github.com/apache/datafusion-sqlparser-rs/pull/1522) > > Perhaps we can follow his model for making this feat

Re: [PR] Fix get_type for higher-order array functions [datafusion]

2024-12-14 Thread via GitHub
findepi commented on code in PR #13756: URL: https://github.com/apache/datafusion/pull/13756#discussion_r1885083552 ## datafusion/functions-nested/src/extract.rs: ## @@ -993,3 +993,84 @@ where let data = mutable.freeze(); Ok(arrow::array::make_array(data)) } + +#[cfg(

Re: [I] Making the `recursive` dependency an optional feature [datafusion]

2024-12-14 Thread via GitHub
berkaysynnada commented on issue #13766: URL: https://github.com/apache/datafusion/issues/13766#issuecomment-2543143643 cc @buraksenn

Re: [I] Making the `recursive` dependency an optional feature [datafusion]

2024-12-14 Thread via GitHub
peter-toth commented on issue #13766: URL: https://github.com/apache/datafusion/issues/13766#issuecomment-2543150144 Seems like `rust_psm_stack_pointer` is already on this wishlist for `miri`: https://github.com/rust-lang/miri/issues/2057

Re: [I] Limit together with pushdown_filters [datafusion]

2024-12-14 Thread via GitHub
zhuqi-lucas commented on issue #13745: URL: https://github.com/apache/datafusion/issues/13745#issuecomment-2543161340 take

[PR] minor make recursive optional [datafusion]

2024-12-14 Thread via GitHub
buraksenn opened a new pull request, #13778: URL: https://github.com/apache/datafusion/pull/13778 ## Which issue does this PR close? Closes #13766 ## Rationale for this change Adding recursive package causes issues for downstream projects ## What changes are includ

[PR] wip: bug fix & tests [datafusion-ray]

2024-12-14 Thread via GitHub
andygrove opened a new pull request, #53: URL: https://github.com/apache/datafusion-ray/pull/53 (no comment)

Re: [I] "recursive" Dependency Causes "section too large" Error When Compiling for wasm [datafusion]

2024-12-14 Thread via GitHub
peter-toth commented on issue #13513: URL: https://github.com/apache/datafusion/issues/13513#issuecomment-2543156645 @berkaysynnada, is this still an issue? https://github.com/rustwasm/wasm-pack/issues/1381#issuecomment-2153142927 seems to work for me too. ``` % export PATH=/opt/ho

Re: [I] Making the `recursive` dependency an optional feature [datafusion]

2024-12-14 Thread via GitHub
buraksenn commented on issue #13766: URL: https://github.com/apache/datafusion/issues/13766#issuecomment-2543167905 take

Re: [I] Limit together with pushdown_filters [datafusion]

2024-12-14 Thread via GitHub
zhuqi-lucas commented on issue #13745: URL: https://github.com/apache/datafusion/issues/13745#issuecomment-2543159151 Can I take this issue? Thanks!

[I] sql result discrepency with sqlite and postgres [datafusion]

2024-12-14 Thread via GitHub
Omega359 opened a new issue, #13779: URL: https://github.com/apache/datafusion/issues/13779 ### Describe the bug sqlite random/expr/slt_good_10.slt: ```sql SELECT ALL + + MIN ( - CAST ( + 6 AS INTEGER ) ) * CASE WHEN NULL BETWEEN - 88 AND ( - - 91 ) * - AVG ( 95 ) THEN COUNT ( -

[I] sql result discrepency with sqlite, postgres and duckdb [datafusion]

2024-12-14 Thread via GitHub
Omega359 opened a new issue, #13780: URL: https://github.com/apache/datafusion/issues/13780 ### Describe the bug ```sql SELECT - NULLIF ( + 15, - 27 + + CAST ( NULL AS REAL ) + + + MIN ( 35 ) * - COUNT ( * ) * - 14 + 64 ) * 87 * + 75 * 34 * + 76 + - - 31 AS col0, - 10 * - + 33 * (

Re: [I] sql result discrepency with sqlite and postgres [datafusion]

2024-12-14 Thread via GitHub
Omega359 commented on issue #13779: URL: https://github.com/apache/datafusion/issues/13779#issuecomment-2543170448 I believe another sql that has the same cause is: ``` External error: query result mismatch: [SQL] SELECT ALL + CASE WHEN AVG ( ALL + 88 ) >= 74 + + - 95 - CASE 23 W

Re: [I] Making the `recursive` dependency an optional feature [datafusion]

2024-12-14 Thread via GitHub
buraksenn commented on issue #13766: URL: https://github.com/apache/datafusion/issues/13766#issuecomment-2543167895 Thanks for the heads up @berkaysynnada. Opened a PR to make it optional

[I] sql odd case of rounding compared to duckdb and postgresql [datafusion]

2024-12-14 Thread via GitHub
Omega359 opened a new issue, #13781: URL: https://github.com/apache/datafusion/issues/13781 ### Describe the bug ``` External error: query result mismatch: [SQL] SELECT + 99 * NULLIF ( 86, - CASE WHEN NOT COUNT ( * ) BETWEEN 84 AND 54 THEN NULLIF ( - 59, COUNT ( * ) ) + CAST ( N

Re: [I] sql odd case of rounding compared to duckdb and postgresql [datafusion]

2024-12-14 Thread via GitHub
Omega359 commented on issue #13781: URL: https://github.com/apache/datafusion/issues/13781#issuecomment-2543173280 Another version of what I think may be the same thing: ``` External error: query result mismatch: [SQL] SELECT DISTINCT - ( 43 ) * 42 + - + NULLIF ( + 90, - 49 - + C

Re: [I] Move CPU Bound Tasks off Tokio Threadpool [datafusion]

2024-12-14 Thread via GitHub
tustvold commented on issue #13692: URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2543174655 I've pushed a simple example to [io_stall](https://github.com/tustvold/io_stall/blob/main/src/rayon.rs) that glues together [rayon](https://docs.rs/rayon/latest/rayon/) and [a

[I] sql result discrepency with sqlite, postgres and duckdb bug #2 [datafusion]

2024-12-14 Thread via GitHub
Omega359 opened a new issue, #13782: URL: https://github.com/apache/datafusion/issues/13782 ### Describe the bug ``` External error: query result mismatch: [SQL] SELECT - ( 74 ) * 62 + - 75 * + - 12 * + 68 * - + COUNT ( DISTINCT + 7 ) * 52 * + + COALESCE ( - + SUM ( ALL - 67 ),

Re: [I] sql result discrepency with sqlite, postgres and duckdb bug #2 [datafusion]

2024-12-14 Thread via GitHub
Omega359 commented on issue #13782: URL: https://github.com/apache/datafusion/issues/13782#issuecomment-2543175789 Another example of what is probably the same cause: Datafusion ```sql > SELECT ALL + + 84 * + - COUNT ( * ) * - - COALESCE ( + 37, + 14 ) * - NULLIF ( CASE + + COU

[PR] fix: Make `Display` implementation for `InList` deterministic [datafusion]

2024-12-14 Thread via GitHub
andygrove opened a new pull request, #13783: URL: https://github.com/apache/datafusion/pull/13783 ## Which issue does this PR close? N/A ## Rationale for this change In DataFusion Ray we have tests that check for an expected logical plan but the plans are

Re: [I] "recursive" Dependency Causes "section too large" Error When Compiling for wasm [datafusion]

2024-12-14 Thread via GitHub
berkaysynnada commented on issue #13513: URL: https://github.com/apache/datafusion/issues/13513#issuecomment-2543177609 That worked for me now (and passed the tests). Perhaps I should have tried with cargo clean 😞

Re: [PR] DataFusion Python 43.1.0 announcement [datafusion-site]

2024-12-14 Thread via GitHub
andygrove commented on code in PR #43: URL: https://github.com/apache/datafusion-site/pull/43#discussion_r1885116351 ## content/blog/2024-12-14-datafusion-python-43.1.0.md: ## @@ -0,0 +1,199 @@ +--- +layout: post +title: Apache DataFusion Python 43.1.0 Released +date: 2024-12-14

Re: [PR] fix: Make `Display` implementation for `InList` deterministic [datafusion]

2024-12-14 Thread via GitHub
andygrove closed pull request #13783: fix: Make `Display` implementation for `InList` deterministic URL: https://github.com/apache/datafusion/pull/13783

Re: [I] Making the `recursive` dependency an optional feature [datafusion]

2024-12-14 Thread via GitHub
blaginin commented on issue #13766: URL: https://github.com/apache/datafusion/issues/13766#issuecomment-2543186845 I’m curious if there are any cases when downstream can’t use recursive yet needs stack overflow protection 🤔 If so, we may need to switch to iterative after all

Re: [PR] Support binary temporal arithmetic with integers [datafusion]

2024-12-14 Thread via GitHub
jonahgao commented on code in PR #13741: URL: https://github.com/apache/datafusion/pull/13741#discussion_r1885119711 ## datafusion/expr/src/expr_schema.rs: ## @@ -453,6 +455,26 @@ impl ExprSchemable for Expr { } _ => Ok(Expr::Cast(Cast::new(Box:

[I] sql result discrepency with sqlite, postgres and duckdb bug #3 [datafusion]

2024-12-14 Thread via GitHub
Omega359 opened a new issue, #13784: URL: https://github.com/apache/datafusion/issues/13784 ### Describe the bug This is an odd one that I'm unsure what to make of it ```sql CREATE TABLE tab0(col0 INTEGER, col1 INTEGER, col2 INTEGER); INSERT INTO tab0 VALUES(97,1,99); I

Re: [PR] Upgrade to DataFusion 43, fix a bug, add more tests [datafusion-ray]

2024-12-14 Thread via GitHub
edmondop commented on code in PR #53: URL: https://github.com/apache/datafusion-ray/pull/53#discussion_r1885182636 ## src/query_stage.rs: ## @@ -99,10 +99,14 @@ impl QueryStage { /// Get the input partition count. This is the same as the number of concurrent tasks ///

[I] 8b file takes 100s to process, even on second attempt (on main) [datafusion]

2024-12-14 Thread via GitHub
TheBuilderJR opened a new issue, #13785: URL: https://github.com/apache/datafusion/issues/13785 ### Describe the bug I expected based on the published benchmarks to have improvements, but I haven't seen any. I do see statistics are turned on and in theory optimizations in the last fe

Re: [PR] Upgrade to DataFusion 43, fix a bug, add more tests [datafusion-ray]

2024-12-14 Thread via GitHub
andygrove commented on code in PR #53: URL: https://github.com/apache/datafusion-ray/pull/53#discussion_r1885291564 ## src/query_stage.rs: ## @@ -99,10 +99,14 @@ impl QueryStage { /// Get the input partition count. This is the same as the number of concurrent tasks //

Re: [I] 2gb parquet file takes 100s to process, even on second attempt (on main) [datafusion]

2024-12-14 Thread via GitHub
Dandandan commented on issue #13785: URL: https://github.com/apache/datafusion/issues/13785#issuecomment-2543275736 Hi @TheBuilderJR thanks for opening the issue. Is there a way we could reproduce your results? Did you compare performance to other engines (e.g. Spark, DuckDB)?

Re: [PR] Upgrade to DataFusion 43, fix a bug, add more tests [datafusion-ray]

2024-12-14 Thread via GitHub
andygrove merged PR #53: URL: https://github.com/apache/datafusion-ray/pull/53

Re: [PR] feat(function): add greatest function [datafusion]

2024-12-14 Thread via GitHub
rluvaton commented on code in PR #12474: URL: https://github.com/apache/datafusion/pull/12474#discussion_r1885347287 ## datafusion/functions/src/core/greatest.rs: ## @@ -0,0 +1,272 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

[PR] feat(function): add least function [datafusion]

2024-12-14 Thread via GitHub
rluvaton opened a new pull request, #13786: URL: https://github.com/apache/datafusion/pull/13786 ## Which issue does this PR close? Closes #6531 ## Rationale for this change adding more expressions support, and I already added `greatest` ## What changes are include

Re: [PR] feat(function): add least function [datafusion]

2024-12-14 Thread via GitHub
rluvaton commented on code in PR #13786: URL: https://github.com/apache/datafusion/pull/13786#discussion_r1885350394 ## datafusion/functions/src/core/least.rs: ## @@ -0,0 +1,283 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [I] Release DataFusion `44.0.0` [datafusion]

2024-12-14 Thread via GitHub
rluvaton commented on issue #13334: URL: https://github.com/apache/datafusion/issues/13334#issuecomment-2543309483 I'll give my 2 cents from my experience regarding how other popular projects handle this issue In Node.js (I'm a core collaborator) before each release we run the tests o

[PR] wip: Code cleanup [datafusion-ray]

2024-12-14 Thread via GitHub
andygrove opened a new pull request, #54: URL: https://github.com/apache/datafusion-ray/pull/54 (no comment)

Re: [I] Remove record_batch! macro once upstream updates [datafusion]

2024-12-14 Thread via GitHub
buraksenn commented on issue #13037: URL: https://github.com/apache/datafusion/issues/13037#issuecomment-2543312794 I've taken another look into this and tried to change it but the issue is that macro in arrow only accepts array literals such as: [1,2,3]. However, in the datafusion macro i

Re: [PR] Minor: Remove memory reservation in `JoinLeftData` used in HashJoin [datafusion]

2024-12-14 Thread via GitHub
korowa commented on code in PR #13751: URL: https://github.com/apache/datafusion/pull/13751#discussion_r1884970634 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -90,9 +90,6 @@ struct JoinLeftData { /// Counter of running probe-threads, potentially /// able

Re: [PR] Minor: Remove memory reservation in `JoinLeftData` used in HashJoin [datafusion]

2024-12-14 Thread via GitHub
korowa commented on code in PR #13751: URL: https://github.com/apache/datafusion/pull/13751#discussion_r1884994212 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -90,9 +90,6 @@ struct JoinLeftData { /// Counter of running probe-threads, potentially /// able

Re: [PR] Minor: Remove memory reservation in `JoinLeftData` used in HashJoin [datafusion]

2024-12-14 Thread via GitHub
korowa commented on code in PR #13751: URL: https://github.com/apache/datafusion/pull/13751#discussion_r1884977816 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -90,9 +90,6 @@ struct JoinLeftData { /// Counter of running probe-threads, potentially /// able

Re: [PR] Minor: Remove memory reservation in `JoinLeftData` used in HashJoin [datafusion]

2024-12-14 Thread via GitHub
jayzhan211 commented on code in PR #13751: URL: https://github.com/apache/datafusion/pull/13751#discussion_r1884985362 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -90,9 +90,6 @@ struct JoinLeftData { /// Counter of running probe-threads, potentially /// a

Re: [PR] feat: add support for array_contains expression [datafusion-comet]

2024-12-14 Thread via GitHub
dharanad commented on code in PR #1163: URL: https://github.com/apache/datafusion-comet/pull/1163#discussion_r188497 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2390,4 +2390,14 @@ class CometExpressionSuite extends CometTestBase with Adaptive

Re: [I] Cannot run benchmarks in k8s due to excessive spilling & OOM [datafusion-ray]

2024-12-14 Thread via GitHub
andygrove closed issue #44: Cannot run benchmarks in k8s due to excessive spilling & OOM URL: https://github.com/apache/datafusion-ray/issues/44

Re: [I] Single-node Python unit tests fail [datafusion-ray]

2024-12-14 Thread via GitHub
andygrove commented on issue #52: URL: https://github.com/apache/datafusion-ray/issues/52#issuecomment-2543320317 Fixed in https://github.com/apache/datafusion-ray/pull/53

Re: [I] Cannot run benchmarks in k8s due to excessive spilling & OOM [datafusion-ray]

2024-12-14 Thread via GitHub
andygrove commented on issue #44: URL: https://github.com/apache/datafusion-ray/issues/44#issuecomment-2543320623 this is resolved now that we reverted to disk-based shuffle

Re: [I] Single-node Python unit tests fail [datafusion-ray]

2024-12-14 Thread via GitHub
andygrove closed issue #52: Single-node Python unit tests fail URL: https://github.com/apache/datafusion-ray/issues/52

Re: [I] support make_interval function [datafusion]

2024-12-14 Thread via GitHub
milevin commented on issue #6951: URL: https://github.com/apache/datafusion/issues/6951#issuecomment-2543336098 Is anybody working on this? cc: @alamb I could really use it in the work that I am doing. (Supporting interval multiplication/division by integers; supporting Date32 + integ

Re: [PR] Support binary temporal arithmetic with integers [datafusion]

2024-12-14 Thread via GitHub
milevin commented on code in PR #13741: URL: https://github.com/apache/datafusion/pull/13741#discussion_r1885383953 ## datafusion/expr/src/expr_schema.rs: ## @@ -453,6 +455,26 @@ impl ExprSchemable for Expr { } _ => Ok(Expr::Cast(Cast::new(Box::

Re: [PR] Update documentation guidelines for contribution content [datafusion]

2024-12-14 Thread via GitHub
Omega359 commented on PR #13703: URL: https://github.com/apache/datafusion/pull/13703#issuecomment-2543358335 LGTM

Re: [PR] chore: Create devcontainer.json [datafusion]

2024-12-14 Thread via GitHub
Omega359 commented on PR #13520: URL: https://github.com/apache/datafusion/pull/13520#issuecomment-2543359104 > @Omega359 do you wanna take this as you have better configuration? I asked some months back in the discord channel if anyone was interested and got no interest at that time.

Re: [PR] Support unicode character for `initcap` function [datafusion]

2024-12-14 Thread via GitHub
tlm365 commented on code in PR #13752: URL: https://github.com/apache/datafusion/pull/13752#discussion_r1885439459 ## datafusion/functions/src/unicode/initcap.rs: ## @@ -74,7 +76,7 @@ impl ScalarUDFImpl for InitcapFunc { DataType::LargeUtf8 => make_scalar_function(i

Re: [PR] chore: Make query stage / shuffle code easier to understand [datafusion-ray]

2024-12-14 Thread via GitHub
andygrove commented on code in PR #54: URL: https://github.com/apache/datafusion-ray/pull/54#discussion_r1885382237 ## testdata/expected-plans/q1.txt: ## @@ -42,7 +42,7 @@ ShuffleWriterExec(stage_id=1, output_partitioning=Hash([Column { name: "l_return CoalesceBatchesE

[PR] Validating assumption on children partition count [datafusion-ray]

2024-12-14 Thread via GitHub
edmondop opened a new pull request, #56: URL: https://github.com/apache/datafusion-ray/pull/56 In the query stage, we take the output partition count of the first child if the plan has children, assuming all the plans have the same partition count. While this is effectively true, if a

Re: [PR] Validating assumption on children partition count [datafusion-ray]

2024-12-14 Thread via GitHub
edmondop commented on PR #56: URL: https://github.com/apache/datafusion-ray/pull/56#issuecomment-2543363471 I think this should be merged before #54 @andygrove

Re: [PR] Test for string / numeric coercion [datafusion]

2024-12-14 Thread via GitHub
Omega359 commented on PR #13606: URL: https://github.com/apache/datafusion/pull/13606#issuecomment-2543364492 fyi https://github.com/apache/arrow-rs/issues/6714 was resolved recently.

Re: [PR] Refactor signatures for lpad, rpad, left, and right [datafusion]

2024-12-14 Thread via GitHub
Omega359 commented on code in PR #13420: URL: https://github.com/apache/datafusion/pull/13420#discussion_r1885408928 ## datafusion/sqllogictest/test_files/scalar.slt: ## @@ -1864,10 +1864,10 @@ query TT EXPLAIN SELECT letter, letter = LEFT(letter2, 1) FROM simple_string;

Re: [PR] Implementing Unit testing for Python [datafusion-ray]

2024-12-14 Thread via GitHub
edmondop closed pull request #50: Implementing Unit testing for Python URL: https://github.com/apache/datafusion-ray/pull/50

Re: [PR] Implementing Unit testing for Python [datafusion-ray]

2024-12-14 Thread via GitHub
edmondop commented on PR #50: URL: https://github.com/apache/datafusion-ray/pull/50#issuecomment-2543363371 Superseded by @andygrove https://github.com/apache/datafusion-ray/pull/53

Re: [PR] docs: update GroupsAccumulator instead of GroupAccumulator [datafusion]

2024-12-14 Thread via GitHub
Omega359 commented on PR #13787: URL: https://github.com/apache/datafusion/pull/13787#issuecomment-2543365042 LGTM, thanks!

Re: [PR] feat(function): add least function [datafusion]

2024-12-14 Thread via GitHub
Omega359 commented on code in PR #13786: URL: https://github.com/apache/datafusion/pull/13786#discussion_r1885413349 ## datafusion/functions/src/core/least.rs: ## @@ -0,0 +1,283 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] feat(function): add least function [datafusion]

2024-12-14 Thread via GitHub
Omega359 commented on PR #13786: URL: https://github.com/apache/datafusion/pull/13786#issuecomment-2543367795 LGTM. thanks!

Re: [PR] Minor: Remove memory reservation in `JoinLeftData` used in HashJoin [datafusion]

2024-12-14 Thread via GitHub
jayzhan211 commented on code in PR #13751: URL: https://github.com/apache/datafusion/pull/13751#discussion_r1885451028 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -90,9 +90,6 @@ struct JoinLeftData { /// Counter of running probe-threads, potentially /// a

Re: [PR] Minor: Remove memory reservation in `JoinLeftData` used in HashJoin [datafusion]

2024-12-14 Thread via GitHub
jayzhan211 commented on code in PR #13751: URL: https://github.com/apache/datafusion/pull/13751#discussion_r1885451635 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -90,9 +90,6 @@ struct JoinLeftData { /// Counter of running probe-threads, potentially /// a
