Re: [PR] bench: fix `vectorized_equal_to` bench mutated between iterations [datafusion]

2025-10-08 Thread via GitHub
rluvaton merged PR #17968: URL: https://github.com/apache/datafusion/pull/17968 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [PR] bench: fix `vectorized_equal_to` bench mutated between iterations [datafusion]

2025-10-08 Thread via GitHub
rluvaton commented on PR #17968: URL: https://github.com/apache/datafusion/pull/17968#issuecomment-3382104551 Updated. After this is merged, I will create a PR that actually vectorizes `vectorized_equal_to` in primitive

Re: [PR] bench: fix `vectorized_equal_to` bench mutated between iterations [datafusion]

2025-10-08 Thread via GitHub
rluvaton commented on PR #17968: URL: https://github.com/apache/datafusion/pull/17968#issuecomment-3382111298 What? It was merged before the CI passed? I thought "merge when ready" waited for the CI to pass

Re: [PR] minor: refactor Spark ascii function to reuse DataFusion ascii function code [datafusion]

2025-10-08 Thread via GitHub
Jefffrey commented on code in PR #17965: URL: https://github.com/apache/datafusion/pull/17965#discussion_r2413707276 ## datafusion/functions/src/string/ascii.rs: ## @@ -186,6 +184,8 @@ mod tests { test_ascii!(Some(String::from("a")), Ok(Some(97))); test_ascii!(

[PR] minor: refactor Spark ascii function to reuse DataFusion ascii function code [datafusion]

2025-10-08 Thread via GitHub
Jefffrey opened a new pull request, #17965: URL: https://github.com/apache/datafusion/pull/17965 ## Which issue does this PR close? Part of #17964 ## Rationale for this change Some of the internals are the same; reuse this code. ## What changes are

Re: [PR] chore(deps): bump taiki-e/install-action from 2.61.8 to 2.62.22 [datafusion-sandbox]

2025-10-08 Thread via GitHub
dependabot[bot] commented on PR #27: URL: https://github.com/apache/datafusion-sandbox/pull/27#issuecomment-3381093965 Superseded by #28.

Re: [PR] feat: Add new config to limit Comet memory pool usage per task + bug fixes [WIP] [datafusion-comet]

2025-10-08 Thread via GitHub
codecov-commenter commented on PR #2538: URL: https://github.com/apache/datafusion-comet/pull/2538#issuecomment-3382453771 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2538?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] FileScanConfig: Preserve schema metadata across ser/de boundary [datafusion]

2025-10-08 Thread via GitHub
milenkovicm commented on code in PR #17966: URL: https://github.com/apache/datafusion/pull/17966#discussion_r2413849310 ## datafusion/proto/tests/cases/roundtrip_physical_plan.rs: ## @@ -2221,3 +2224,41 @@ async fn roundtrip_memory_source() -> Result<()> { .await?;

[I] Nondeterministic, incorrect result for symmetric aggregates [datafusion]

2025-10-08 Thread via GitHub
colinmarc opened a new issue, #17970: URL: https://github.com/apache/datafusion/issues/17970 ### Describe the bug Hi! We're trying to support an unnamed BI tool on top of datafusion, which is fond of generating symmetric aggregates that look like this: ```sql SELECT (

[PR] add Distribution based statistics to PartitionedFile [datafusion]

2025-10-08 Thread via GitHub
adriangb opened a new pull request, #17978: URL: https://github.com/apache/datafusion/pull/17978 Maybe closes https://github.com/apache/datafusion/issues/8078? This is a stab at integrating the new `Distribution` based statistics into file/partition level statistics.

Re: [PR] clean up duplicate information in FileOpener trait [datafusion]

2025-10-08 Thread via GitHub
adriangb merged PR #17956: URL: https://github.com/apache/datafusion/pull/17956

[PR] fix docs and broken example from #17956 [datafusion]

2025-10-08 Thread via GitHub
adriangb opened a new pull request, #17980: URL: https://github.com/apache/datafusion/pull/17980 Small followups to https://github.com/apache/datafusion/pull/17956

Re: [I] Introduce a way to represent constrained statistics / bounds on values in Statistics [datafusion]

2025-10-08 Thread via GitHub
adriangb commented on issue #8078: URL: https://github.com/apache/datafusion/issues/8078#issuecomment-3382885667 I gave it a shot in https://github.com/apache/datafusion/pull/17980. I decided to create a new struct to replace `ColumnStatistics` and a new struct to replace `Statistics` and

Re: [PR] [branch-50] Prepare for 50.2.0 release [datafusion]

2025-10-08 Thread via GitHub
alamb commented on code in PR #17963: URL: https://github.com/apache/datafusion/pull/17963#discussion_r2414814664 ## dev/changelog/50.2.0.md: ## @@ -0,0 +1,318 @@ + + +# Apache DataFusion 50.2.0 Changelog + +This release consists of 3 commits from 1 contributors. See credits at

Re: [I] SparkDateAdd does not check for overflow [datafusion]

2025-10-08 Thread via GitHub
andygrove commented on issue #17987: URL: https://github.com/apache/datafusion/issues/17987#issuecomment-3383442750 @chenkovsky @shehabgamin fyi

Re: [PR] chore: update to datafusion v50 [datafusion-ballista]

2025-10-08 Thread via GitHub
metegenez commented on code in PR #1320: URL: https://github.com/apache/datafusion-ballista/pull/1320#discussion_r2413131863 ## ballista/client/tests/context_checks.rs: ## @@ -935,4 +935,45 @@ mod supported { Ok(()) } + +// at the moment sort merge join is n

Re: [I] Release DataFusion `50.2.0` (minor) [datafusion]

2025-10-08 Thread via GitHub
xudong963 commented on issue #17849: URL: https://github.com/apache/datafusion/issues/17849#issuecomment-3380664430 It seems all PRs are patched. I plan to release it at night.

Re: [PR] fix: fix failing test compilation on main [datafusion]

2025-10-08 Thread via GitHub
Jefffrey merged PR #17955: URL: https://github.com/apache/datafusion/pull/17955

Re: [PR] feat: convert_array_to_scalar_vec respects null elements [datafusion]

2025-10-08 Thread via GitHub
Jefffrey commented on code in PR #17891: URL: https://github.com/apache/datafusion/pull/17891#discussion_r2412895340 ## datafusion/functions-nested/src/array_has.rs: ## @@ -145,6 +145,7 @@ impl ScalarUDFImpl for ArrayHas { let list = scalar_values

Re: [PR] feat: convert_array_to_scalar_vec respects null elements [datafusion]

2025-10-08 Thread via GitHub
vegarsti commented on code in PR #17891: URL: https://github.com/apache/datafusion/pull/17891#discussion_r2412920601 ## datafusion/functions-nested/src/array_has.rs: ## @@ -145,6 +145,7 @@ impl ScalarUDFImpl for ArrayHas { let list = scalar_values

[PR] chore(deps): bump taiki-e/install-action from 2.62.22 to 2.62.23 [datafusion]

2025-10-08 Thread via GitHub
dependabot[bot] opened a new pull request, #17959: URL: https://github.com/apache/datafusion/pull/17959 Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.22 to 2.62.23. Release notes sourced from https://github.com/taiki-e/install-action/releases

Re: [I] Release DataFusion `50.2.0` (minor) [datafusion]

2025-10-08 Thread via GitHub
alamb commented on issue #17849: URL: https://github.com/apache/datafusion/issues/17849#issuecomment-3380848777 Thank you @xudong963 !

Re: [PR] feat: Support reading CSV files with inconsistent column counts [datafusion]

2025-10-08 Thread via GitHub
alamb commented on PR #17553: URL: https://github.com/apache/datafusion/pull/17553#issuecomment-3380850830 Thank you @Jefffrey and @EeshanBembi

Re: [PR] feat: Adds Instrumented Object Store Registry to datafusion-cli [datafusion]

2025-10-08 Thread via GitHub
alamb commented on PR #17953: URL: https://github.com/apache/datafusion/pull/17953#issuecomment-3380879282 > (note: If we need an individual issue to close for process reasons I am happy to make one and associate it with https://github.com/apache/datafusion/issues/17207 as part of that larg

Re: [PR] Chore: Use DataFusion impl of date_trunc function [datafusion-comet]

2025-10-08 Thread via GitHub
kazantsev-maksim commented on PR #2523: URL: https://github.com/apache/datafusion-comet/pull/2523#issuecomment-3380048220 The DataFusion implementation of [date_trunc](https://github.com/apache/datafusion/blob/main/datafusion/functions/src/datetime/date_trunc.rs#L279) has a narrower scope

Re: [PR] Added support for SQLite triggers [datafusion-sqlparser-rs]

2025-10-08 Thread via GitHub
LucaCappelletti94 commented on PR #2037: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2037#issuecomment-3381061328 > @LucaCappelletti94 [the link](https://github.com/apache/datafusion-sqlparser-rs/pull/2037#discussion_r2379300179) takes me to the top of the page not sure why

Re: [PR] feat: optimize grouping and introduced unparsing and substrait support [datafusion]

2025-10-08 Thread via GitHub
chenkovsky commented on PR #16161: URL: https://github.com/apache/datafusion/pull/16161#issuecomment-3380889170 @Slimsammylim I created another PR, #17961. Please take a snapshot. It will not touch the grouping UDAF at the logical plan level, so it will support Substrait by design.

[PR] feat: Add percentile_cont aggregate function [datafusion]

2025-10-08 Thread via GitHub
adriangb opened a new pull request, #17988: URL: https://github.com/apache/datafusion/pull/17988 ## Summary Adds exact `percentile_cont` aggregate function as the counterpart to the existing `approx_percentile_cont` function. ## What changes were made? ### New Implementa

[PR] feat: Add new config to limit Comet memory pool usage per task + bug fixes [WIP] [datafusion-comet]

2025-10-08 Thread via GitHub
andygrove opened a new pull request, #2538: URL: https://github.com/apache/datafusion-comet/pull/2538 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [I] Consider adding a JoinGraph structure for representing joins [datafusion]

2025-10-08 Thread via GitHub
JanKaul commented on issue #17719: URL: https://github.com/apache/datafusion/issues/17719#issuecomment-3381972242 > Why would we provide an extra trait for this? Couldn't we provide a function on LogicalPlan and add it to the UserDefinedLogicalNode trait? Or am I missing something here?

[PR] bench: fix `vectorized_equal_to` bench mutated between iterations [datafusion]

2025-10-08 Thread via GitHub
rluvaton opened a new pull request, #17968: URL: https://github.com/apache/datafusion/pull/17968 ## Which issue does this PR close? N/A ## Rationale for this change In the benchmark `aggregate_vectorized` when benchmarking `vectorized_equal_to` passing the same `equal_to

Re: [PR] #17801 Improve nullability reporting of case expressions [datafusion]

2025-10-08 Thread via GitHub
alamb commented on PR #17813: URL: https://github.com/apache/datafusion/pull/17813#issuecomment-3383056548 Filed follow on tickets - https://github.com/apache/datafusion/issues/17982 - https://github.com/apache/datafusion/issues/17983

Re: [PR] feat: convert_array_to_scalar_vec respects null elements [datafusion]

2025-10-08 Thread via GitHub
vegarsti commented on code in PR #17891: URL: https://github.com/apache/datafusion/pull/17891#discussion_r2414957987 ## datafusion/functions-nested/src/array_has.rs: ## @@ -145,6 +145,7 @@ impl ScalarUDFImpl for ArrayHas { let list = scalar_values

Re: [PR] feat: Adds Instrumented Object Store Registry to datafusion-cli [datafusion]

2025-10-08 Thread via GitHub
alamb commented on PR #17953: URL: https://github.com/apache/datafusion/pull/17953#issuecomment-3383130819 Let's keep development going

Re: [PR] feat: Adds Instrumented Object Store Registry to datafusion-cli [datafusion]

2025-10-08 Thread via GitHub
alamb merged PR #17953: URL: https://github.com/apache/datafusion/pull/17953

Re: [I] Consider adding a JoinGraph structure for representing joins [datafusion]

2025-10-08 Thread via GitHub
alamb commented on issue #17719: URL: https://github.com/apache/datafusion/issues/17719#issuecomment-3383147353 > > I thought that maybe different databases would like to have their own logic for these estimations. Having a trait would allow to adjust this. > > Oh, right haven't thoug

Re: [PR] feat: convert_array_to_scalar_vec respects null elements [datafusion]

2025-10-08 Thread via GitHub
vegarsti commented on code in PR #17891: URL: https://github.com/apache/datafusion/pull/17891#discussion_r2415027061 ## datafusion/functions-aggregate/src/array_agg.rs: ## @@ -687,13 +687,16 @@ impl Accumulator for OrderSensitiveArrayAggAccumulator { // Convert array

Re: [PR] feat: convert_array_to_scalar_vec respects null elements [datafusion]

2025-10-08 Thread via GitHub
vegarsti commented on code in PR #17891: URL: https://github.com/apache/datafusion/pull/17891#discussion_r2415592338 ## datafusion/functions-nested/src/array_has.rs: ## @@ -145,6 +145,7 @@ impl ScalarUDFImpl for ArrayHas { let list = scalar_values

Re: [PR] feat: convert_array_to_scalar_vec respects null elements [datafusion]

2025-10-08 Thread via GitHub
Jefffrey commented on code in PR #17891: URL: https://github.com/apache/datafusion/pull/17891#discussion_r2415687883 ## datafusion/functions-nested/src/array_has.rs: ## @@ -145,6 +145,7 @@ impl ScalarUDFImpl for ArrayHas { let list = scalar_values

Re: [PR] FileScanConfig: Preserve schema metadata across ser/de boundary [datafusion]

2025-10-08 Thread via GitHub
milenkovicm commented on PR #17966: URL: https://github.com/apache/datafusion/pull/17966#issuecomment-3381544245 One other question for you, @mach-kernel: as you posted this in the Ballista group, do you need this to get into Ballista 50? If so, maybe we could back-port it to DataFusion 50.2 if @xudong

Re: [PR] bench: fix `vectorized_equal_to` bench mutated between iterations [datafusion]

2025-10-08 Thread via GitHub
comphead commented on PR #17968: URL: https://github.com/apache/datafusion/pull/17968#issuecomment-3382091971 > I thought about all false, but an optimization for all false should not be here and instead avoid calling this function entirely so I did not add it Let's probably add this t

Re: [PR] bench: fix `vectorized_equal_to` bench mutated between iterations [datafusion]

2025-10-08 Thread via GitHub
rluvaton commented on PR #17968: URL: https://github.com/apache/datafusion/pull/17968#issuecomment-3382082899 I thought about all false, but an optimization for all false should not be here and instead avoid calling this function entirely so I did not add it

Re: [PR] feat: Push down hashes to probe side in HashJoinExec [datafusion]

2025-10-08 Thread via GitHub
LiaCastaneda commented on code in PR #17529: URL: https://github.com/apache/datafusion/pull/17529#discussion_r2413960610 ## datafusion/physical-plan/src/joins/hash_join/information_passing.rs: ## @@ -0,0 +1,612 @@ +// Licensed to the Apache Software Foundation (ASF) under one +/

Re: [PR] perf: Optimize `multi_group_by` when there are a lot of unique groups [datafusion]

2025-10-08 Thread via GitHub
rluvaton commented on PR #17592: URL: https://github.com/apache/datafusion/pull/17592#issuecomment-3381105663 > Any update on the benchmark / showing some query that this improve performances? Sorry we are in a holiday season, creating one now

[I] Restore expr/expr optimisation for case expressions [datafusion]

2025-10-08 Thread via GitHub
pepijnve opened a new issue, #17972: URL: https://github.com/apache/datafusion/issues/17972 ### Is your feature request related to a problem or challenge? #15384 reverted an optimisation that was made in #11638. I believe there's a simple fix for the original problem that doesn't thro

Re: [I] Nondeterministic, incorrect result for symmetric aggregates [datafusion]

2025-10-08 Thread via GitHub
alamb commented on issue #17970: URL: https://github.com/apache/datafusion/issues/17970#issuecomment-3382485006 > Unfortunately, the results from DF seem to be nondeterministic, and often negative. The issue seems to be with floating point math, related to the literal 18446744073709551616 (2^64
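
As a rough illustration of why floating-point math near 2^64 can misbehave, the sketch below shows how coarse `f64` becomes at that magnitude. It is a minimal, hypothetical example: it is not taken from the issue or from DataFusion, the function names are made up for illustration, and it does not claim to reproduce the reported bug.

```rust
/// 2^64 is a power of two, so it is exactly representable as an f64.
const TWO_POW_64: f64 = 18446744073709551616.0_f64;

/// Returns true when adding `delta` to 2^64 leaves the f64 value unchanged,
/// i.e. the addition is lost entirely to rounding.
fn is_absorbed_at_two_pow_64(delta: f64) -> bool {
    TWO_POW_64 + delta == TWO_POW_64
}

/// Distance between 2^64 and the next larger representable f64.
/// Stepping the bit pattern by one yields the adjacent value for
/// positive, finite floats.
fn spacing_above_two_pow_64() -> f64 {
    let next = f64::from_bits(TWO_POW_64.to_bits() + 1);
    next - TWO_POW_64
}

/// u64::MAX (2^64 - 1) has no exact f64 representation; the cast rounds
/// to the nearest representable value.
fn u64_max_as_f64() -> f64 {
    u64::MAX as f64
}
```

With a 52-bit mantissa, adjacent `f64` values just above 2^64 are 4096 apart, so small addends vanish and `u64::MAX` rounds up to 2^64 itself. Any arithmetic that mixes integers of this size with `f64` can therefore lose low-order digits.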

Re: [PR] FileScanConfig: Preserve schema metadata across ser/de boundary [datafusion]

2025-10-08 Thread via GitHub
milenkovicm commented on code in PR #17966: URL: https://github.com/apache/datafusion/pull/17966#discussion_r2414294432 ## datafusion/proto/tests/cases/roundtrip_physical_plan.rs: ## @@ -2221,3 +2224,41 @@ async fn roundtrip_memory_source() -> Result<()> { .await?;

Re: [PR] perf: add to `aggregate_vectorized` bench benchmark for `PrimitiveGroupValueBuilder` as well [datafusion]

2025-10-08 Thread via GitHub
alamb merged PR #17930: URL: https://github.com/apache/datafusion/pull/17930

Re: [PR] clean up duplicate information in FileOpener trait [datafusion]

2025-10-08 Thread via GitHub
adriangb commented on PR #17956: URL: https://github.com/apache/datafusion/pull/17956#issuecomment-3382638587 @comphead I've renamed all uses to `partitioned_file: PartitionedFile`

Re: [PR] feat: implement GroupArrayAggAccumulator attempt 3 [datafusion]

2025-10-08 Thread via GitHub
duongcongtoai commented on code in PR #17915: URL: https://github.com/apache/datafusion/pull/17915#discussion_r2414640860 ## datafusion/functions-aggregate-common/src/aggregate/array_agg.rs: ## @@ -0,0 +1,491 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] feat: convert_array_to_scalar_vec respects null elements [datafusion]

2025-10-08 Thread via GitHub
Jefffrey commented on code in PR #17891: URL: https://github.com/apache/datafusion/pull/17891#discussion_r2415393230 ## datafusion/functions-aggregate/src/array_agg.rs: ## @@ -687,13 +687,16 @@ impl Accumulator for OrderSensitiveArrayAggAccumulator { // Convert array

Re: [PR] #17801 Improve nullability reporting of case expressions [datafusion]

2025-10-08 Thread via GitHub
pepijnve commented on PR #17813: URL: https://github.com/apache/datafusion/pull/17813#issuecomment-3384370612 > Thank you @pepijnve -- I re-reviewed this PR carefully and I think it is well thought out, commented and tested. Thanks. I was starting to question my sanity a bit 😄 I'll pr

Re: [I] Converting `TableConstraint` struct enum variants into separated structs [datafusion-sqlparser-rs]

2025-10-08 Thread via GitHub
LucaCappelletti94 commented on issue #2053: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/2053#issuecomment-3384307619 Closed as completed by merging PR #2054