Re: [PR] POC: Parse to Merge Logical Plan [datafusion]

2025-04-25 Thread via GitHub
jonathanc-n commented on PR #15862: URL: https://github.com/apache/datafusion/pull/15862#issuecomment-2831915545 PTAL @jayzhan211 @universalmind303 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [PR] Add `CREATE TRIGGER` support for SQL Server [datafusion-sqlparser-rs]

2025-04-25 Thread via GitHub
iffyio commented on code in PR #1810: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1810#discussion_r2061190816 ## src/parser/mod.rs: ## @@ -5256,14 +5256,19 @@ impl<'a> Parser<'a> { pub fn parse_create_trigger( &mut self, +or_alter: bool,

[I] Placeholders in IN lists are not inferred [datafusion]

2025-04-25 Thread via GitHub
kczimm opened a new issue, #15863: URL: https://github.com/apache/datafusion/issues/15863 ### Describe the bug Placeholders datatypes in `IN` lists are not inferred: Example: ```sql SELECT * FROM employees WHERE department_id IN ($1, $2, $3); ``` ### To Reproduce

[PR] infer placeholder datatype for IN lists [datafusion]

2025-04-25 Thread via GitHub
kczimm opened a new pull request, #15864: URL: https://github.com/apache/datafusion/pull/15864 ## Which issue does this PR close? - Closes #15863 ## Rationale for this change Placeholders are currently limited to certain expressions. ## What changes

Re: [PR] feat: use edition 2024 [datafusion-sqlparser-rs]

2025-04-25 Thread via GitHub
github-actions[bot] commented on PR #1736: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1736#issuecomment-2831740340 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or

Re: [PR] feat: Emit warning with Diagnostic when doing = Null [datafusion]

2025-04-25 Thread via GitHub
changsun20 commented on PR #15696: URL: https://github.com/apache/datafusion/pull/15696#issuecomment-2831631680 > Perhaps accessing fields of structs? eg: > > ```sql > SELECT get_field({'x': null}, 'x') = null; > ``` Thanks for pointing that out, I'll take that into consid

Re: [PR] chore: Make Aggregate transformation more compact [datafusion-comet]

2025-04-25 Thread via GitHub
kazuyukitanimura commented on code in PR #1670: URL: https://github.com/apache/datafusion-comet/pull/1670#discussion_r2061094604 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -430,55 +430,43 @@ class CometSparkSessionExtensions op

Re: [PR] Minor: Interval singleton [datafusion]

2025-04-25 Thread via GitHub
jayzhan211 commented on code in PR #15859: URL: https://github.com/apache/datafusion/pull/15859#discussion_r2061085496 ## datafusion/expr-common/src/interval_arithmetic.rs: ## @@ -286,6 +286,11 @@ impl Interval { } } +/// Create a new `Interval` with the same

Re: [PR] Make aggr fuzzer query builder more configurable [datafusion]

2025-04-25 Thread via GitHub
jayzhan211 commented on PR #15851: URL: https://github.com/apache/datafusion/pull/15851#issuecomment-2831689650 > But I want to improve it in an new pr, for not blocking #15591 , is it ok? Sure. > We should ensure all ordering situations can be covered. At least 1 full orderi

[PR] POC: AST -> Merge Logical Plan [datafusion]

2025-04-25 Thread via GitHub
jonathanc-n opened a new pull request, #15862: URL: https://github.com/apache/datafusion/pull/15862 ## Which issue does this PR close? part of #13385 ## Rationale for this change Adds AST -> MERGE logical plan. ## What changes are included in this PR?

Re: [PR] POC: Parse to Merge Logical Plan [datafusion]

2025-04-25 Thread via GitHub
jonathanc-n commented on code in PR #15862: URL: https://github.com/apache/datafusion/pull/15862#discussion_r2061054598 ## datafusion/sql/src/statement.rs: ## @@ -2074,6 +2073,178 @@ impl SqlToRel<'_, S> { Ok(plan) } +fn merge_to_plan( +&self, +

Re: [PR] Add `FormatOptions` to Config [datafusion]

2025-04-25 Thread via GitHub
blaginin commented on PR #15793: URL: https://github.com/apache/datafusion/pull/15793#issuecomment-2831540386 > Once we are able to access the config from udf's it would be interesting to be able to use these options as defaults for things like to_char 💯

Re: [PR] Add `FormatOptions` to Config [datafusion]

2025-04-25 Thread via GitHub
blaginin commented on PR #15793: URL: https://github.com/apache/datafusion/pull/15793#issuecomment-2831540228 > For example, why are there no defaults for the date/time formats? Or if there are defaults, the comments should point to where there are set and what they are. That is a gr

Re: [PR] Move the udf module to user_defined [datafusion-python]

2025-04-25 Thread via GitHub
timsaucer commented on PR #1112: URL: https://github.com/apache/datafusion-python/pull/1112#issuecomment-2831187602 @crystalxyz FYI this does touch on the documentation you wrote. Please let me know if you think I should change back the formatting. I did need to switch from markdown to rst

Re: [PR] pipe column orderings into pruning predicate creation [datafusion]

2025-04-25 Thread via GitHub
etseidl commented on PR #15821: URL: https://github.com/apache/datafusion/pull/15821#issuecomment-2831506574 @adriangb please check out https://github.com/pydantic/datafusion/pull/28

Re: [PR] Make aggr fuzzer query builder more configurable [datafusion]

2025-04-25 Thread via GitHub
Rachelint commented on code in PR #15851: URL: https://github.com/apache/datafusion/pull/15851#discussion_r2059944813 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/query_builder.rs: ## @@ -0,0 +1,371 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] chore: match Maven plugin versions with Spark 3.5 [datafusion-comet]

2025-04-25 Thread via GitHub
hsiang-c commented on code in PR #1668: URL: https://github.com/apache/datafusion-comet/pull/1668#discussion_r2053090499 ## pom.xml: ## @@ -47,9 +47,27 @@ under the License. 11 ${java.version} ${java.version} +3.11.0 +3.6.0 +3.5.0 Review Comment: r

Re: [PR] Make aggr fuzzer query builder more configurable [datafusion]

2025-04-25 Thread via GitHub
Rachelint commented on code in PR #15851: URL: https://github.com/apache/datafusion/pull/15851#discussion_r2060734020 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/query_builder.rs: ## @@ -0,0 +1,371 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] Make aggr fuzzer query builder more configurable [datafusion]

2025-04-25 Thread via GitHub
Rachelint commented on code in PR #15851: URL: https://github.com/apache/datafusion/pull/15851#discussion_r2060731249 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/query_builder.rs: ## @@ -0,0 +1,371 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] Make aggr fuzzer query builder more configurable [datafusion]

2025-04-25 Thread via GitHub
Rachelint commented on PR #15851: URL: https://github.com/apache/datafusion/pull/15851#issuecomment-2831226388 @jayzhan211 Maybe we can improve the testing of sorted cases like that? In `data generation` part, we randomly generate the sort keys both number and type, rather than only `

Re: [PR] Make aggr fuzzer query builder more configurable [datafusion]

2025-04-25 Thread via GitHub
Rachelint commented on code in PR #15851: URL: https://github.com/apache/datafusion/pull/15851#discussion_r2060734020 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/query_builder.rs: ## @@ -0,0 +1,371 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

[PR] Move the udf module to user_defined [datafusion-python]

2025-04-25 Thread via GitHub
timsaucer opened a new pull request, #1112: URL: https://github.com/apache/datafusion-python/pull/1112 # Which issue does this PR close? None # Rationale for this change Right now we have some minor confusion when importing from the `udf` module because at the root of `

Re: [PR] Make aggr fuzzer query builder more configurable [datafusion]

2025-04-25 Thread via GitHub
Rachelint commented on code in PR #15851: URL: https://github.com/apache/datafusion/pull/15851#discussion_r2060734020 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/query_builder.rs: ## @@ -0,0 +1,371 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] Make aggr fuzzer query builder more configurable [datafusion]

2025-04-25 Thread via GitHub
Rachelint commented on code in PR #15851: URL: https://github.com/apache/datafusion/pull/15851#discussion_r2060731249 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/query_builder.rs: ## @@ -0,0 +1,371 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-25 Thread via GitHub
hsiang-c commented on PR #1667: URL: https://github.com/apache/datafusion-comet/pull/1667#issuecomment-2831140241 @parthchandra Yes, I think I changed the behaviors of both `ParquetDatetimeRebaseV2Suite` and `ParquetReadV2Suite`. Though in the code they are configured with `native_comet` i

Re: [PR] docs: Add instructions on running TPC-H on macOS [datafusion-comet]

2025-04-25 Thread via GitHub
mbutrovich commented on code in PR #1647: URL: https://github.com/apache/datafusion-comet/pull/1647#discussion_r2060627081 ## docs/source/contributor-guide/benchmarking_macos.md: ## @@ -0,0 +1,145 @@ + + +# Comet Benchmarking on macOS + +This guide is for setting up TPC-H benchm

Re: [PR] docs: Add instructions on running TPC-H on macOS [datafusion-comet]

2025-04-25 Thread via GitHub
mbutrovich commented on code in PR #1647: URL: https://github.com/apache/datafusion-comet/pull/1647#discussion_r2060627081 ## docs/source/contributor-guide/benchmarking_macos.md: ## @@ -0,0 +1,145 @@ + + +# Comet Benchmarking on macOS + +This guide is for setting up TPC-H benchm

Re: [PR] docs: Add instructions on running TPC-H on macOS [datafusion-comet]

2025-04-25 Thread via GitHub
mbutrovich commented on code in PR #1647: URL: https://github.com/apache/datafusion-comet/pull/1647#discussion_r2060627081 ## docs/source/contributor-guide/benchmarking_macos.md: ## @@ -0,0 +1,145 @@ + + +# Comet Benchmarking on macOS + +This guide is for setting up TPC-H benchm

Re: [I] Investigate unstable benchmark results on macOS [datafusion-comet]

2025-04-25 Thread via GitHub
andygrove closed issue #1648: Investigate unstable benchmark results on macOS URL: https://github.com/apache/datafusion-comet/issues/1648

Re: [PR] feat: add jemalloc as optional custom allocator [datafusion-comet]

2025-04-25 Thread via GitHub
andygrove merged PR #1679: URL: https://github.com/apache/datafusion-comet/pull/1679

Re: [I] Adjust sizeInBytes estimation for Comet exchanges to avoid join strategy regressions [datafusion-comet]

2025-04-25 Thread via GitHub
parthchandra commented on issue #1671: URL: https://github.com/apache/datafusion-comet/issues/1671#issuecomment-2830981387 We can get row count from the scan at planning time and with cbo enabled, the row counts are available at join planning time.

Re: [PR] Make aggr fuzzer query builder more configurable [datafusion]

2025-04-25 Thread via GitHub
Rachelint commented on code in PR #15851: URL: https://github.com/apache/datafusion/pull/15851#discussion_r2059930280 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/query_builder.rs: ## @@ -0,0 +1,371 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-25 Thread via GitHub
parthchandra commented on PR #1667: URL: https://github.com/apache/datafusion-comet/pull/1667#issuecomment-2830836202 Re-running the failed ci (the failures do not seem to be related)

Re: [I] [Experimental scans] schema adapter does not apply required schema for structs within lists [datafusion-comet]

2025-04-25 Thread via GitHub
comphead commented on issue #1681: URL: https://github.com/apache/datafusion-comet/issues/1681#issuecomment-2830857856 > > > interesting that referring to first column which is `a` returns result correctly, but `b` does not > > > > > > This is because we read all fields in the st

Re: [I] Push Dynamic Join Predicates into Scan ("Sideways Information Passing", etc) [datafusion]

2025-04-25 Thread via GitHub
adriangb commented on issue #7955: URL: https://github.com/apache/datafusion/issues/7955#issuecomment-2830822721 > Modify HashJoinExec to build a bloom filter on the build side, and when complete call DynamicFilterPhysicalExpr::update Pretty much: once `HashjoinExec` has completed the

Re: [I] Push Dynamic Join Predicates into Scan ("Sideways Information Passing", etc) [datafusion]

2025-04-25 Thread via GitHub
mbutrovich commented on issue #7955: URL: https://github.com/apache/datafusion/issues/7955#issuecomment-2830801221 So now that #15568 is in, what is a reasonable approach to do SIP with bloom filters? - Modify `HashJoinExec` to build a bloom filter on the build side, and when complete ca
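The idea raised in this comment (build a Bloom filter from the join's build side, then use it to discard probe-side rows early) can be illustrated with a toy, standard-library-only sketch. Everything below is hypothetical: `BloomFilter`, `build_filter`, and `prefilter_probe` are illustrative names, and nothing here uses `HashJoinExec`, `DynamicFilterPhysicalExpr`, or any other DataFusion API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy Bloom filter (hypothetical; not DataFusion's implementation).
struct BloomFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BloomFilter {
    fn new(num_bits: usize, num_hashes: u64) -> Self {
        Self { bits: vec![false; num_bits], num_hashes }
    }

    /// Derive the bit position for `item` under the given seed.
    fn index<T: Hash>(&self, item: &T, seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        item.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    fn insert<T: Hash>(&mut self, item: &T) {
        for seed in 0..self.num_hashes {
            let i = self.index(item, seed);
            self.bits[i] = true;
        }
    }

    /// May return false positives, never false negatives.
    fn might_contain<T: Hash>(&self, item: &T) -> bool {
        (0..self.num_hashes).all(|seed| self.bits[self.index(item, seed)])
    }
}

/// "Build side": insert every join key into the filter.
fn build_filter(build_keys: &[i64]) -> BloomFilter {
    let mut filter = BloomFilter::new(1024, 3);
    for key in build_keys {
        filter.insert(key);
    }
    filter
}

/// "Probe side": keep only the keys that might have a match.
fn prefilter_probe(filter: &BloomFilter, probe_keys: &[i64]) -> Vec<i64> {
    probe_keys
        .iter()
        .copied()
        .filter(|key| filter.might_contain(key))
        .collect()
}

fn main() {
    let filter = build_filter(&[1, 2, 3]);
    let survivors = prefilter_probe(&filter, &[1, 2, 3, 100, 200]);
    println!("probe keys that may match: {survivors:?}");
}
```

Because a Bloom filter can report false positives, the surviving probe rows would still have to go through the real join; the filter only reduces how many rows reach it.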

Re: [PR] chore: fix clippy::large_enum_variant for DataFusionError [datafusion]

2025-04-25 Thread via GitHub
rroelke commented on code in PR #15861: URL: https://github.com/apache/datafusion/pull/15861#discussion_r2060445809 ## datafusion/common/src/error.rs: ## @@ -59,7 +59,7 @@ pub enum DataFusionError { ParquetError(ParquetError), /// Error when reading Avro data. #[c

Re: [I] support FFI query result streams that do not pre-collect [datafusion-python]

2025-04-25 Thread via GitHub
kylebarron commented on issue #1011: URL: https://github.com/apache/datafusion-python/issues/1011#issuecomment-2830779528 FWIW I implemented this in https://github.com/kylebarron/arro3/pull/313, where the `RecordBatchStream` can be synchronously exported to an `ArrowArrayStream`

Re: [PR] chore: fix clippy::large_enum_variant for DataFusionError [datafusion]

2025-04-25 Thread via GitHub
rroelke commented on code in PR #15861: URL: https://github.com/apache/datafusion/pull/15861#discussion_r2060445809 ## datafusion/common/src/error.rs: ## @@ -59,7 +59,7 @@ pub enum DataFusionError { ParquetError(ParquetError), /// Error when reading Avro data. #[c

Re: [PR] chore(deps): bump clap from 4.5.36 to 4.5.37 [datafusion]

2025-04-25 Thread via GitHub
comphead merged PR #15853: URL: https://github.com/apache/datafusion/pull/15853

Re: [PR] Remove usage of `dbg!` [datafusion]

2025-04-25 Thread via GitHub
comphead commented on code in PR #15858: URL: https://github.com/apache/datafusion/pull/15858#discussion_r2060421347 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -1557,7 +1557,7 @@ fn build_predicate_expression( // allow partial failure in predicate expression

Re: [PR] Fix `from_unixtime` function documentation [datafusion]

2025-04-25 Thread via GitHub
comphead merged PR #15844: URL: https://github.com/apache/datafusion/pull/15844

Re: [PR] Remove usage of `dbg!` [datafusion]

2025-04-25 Thread via GitHub
comphead merged PR #15858: URL: https://github.com/apache/datafusion/pull/15858

Re: [PR] chore: More details to `No UDF registered` error [datafusion]

2025-04-25 Thread via GitHub
comphead merged PR #15843: URL: https://github.com/apache/datafusion/pull/15843

Re: [PR] chore: fix clippy::large_enum_variant for DataFusionError [datafusion]

2025-04-25 Thread via GitHub
comphead commented on code in PR #15861: URL: https://github.com/apache/datafusion/pull/15861#discussion_r2060413924 ## datafusion/common/src/error.rs: ## @@ -59,7 +59,7 @@ pub enum DataFusionError { ParquetError(ParquetError), /// Error when reading Avro data. #[

Re: [PR] chore: fix clippy::large_enum_variant for DataFusionError [datafusion]

2025-04-25 Thread via GitHub
rroelke commented on PR #15861: URL: https://github.com/apache/datafusion/pull/15861#issuecomment-2830652897 From the guidelines this should also have the "api-change" label but I don't think I have permission to add it.

Re: [I] [Experimental scans] schema adapter does not apply required schema for structs within lists [datafusion-comet]

2025-04-25 Thread via GitHub
andygrove commented on issue #1681: URL: https://github.com/apache/datafusion-comet/issues/1681#issuecomment-2830646895 > > interesting that referring to first column which is `a` returns result correctly, but `b` does not > > This is because we read all fields in the struct, so we a

[PR] chore: fix clippy::large_enum_variant for DataFusionError [datafusion]

2025-04-25 Thread via GitHub
rroelke opened a new pull request, #15861: URL: https://github.com/apache/datafusion/pull/15861 ## Which issue does this PR close? - Closes #15860. ## Rationale for this change Fixes `clippy::large_enum_variant` which has been enabled by default on nightl
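For context on the lint itself: `clippy::large_enum_variant` fires when one variant of an enum is much larger than the others, because every value of the enum is then as large as that variant. A common remedy, shown here only as a generic sketch and not necessarily the change this PR makes, is to box the large variant. `BigPayload`, `Unboxed`, and `Boxed` are hypothetical stand-ins, not DataFusion types.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a large error payload; not a DataFusion type.
#[allow(dead_code)]
struct BigPayload {
    bytes: [u8; 256],
}

// Every value of this enum is at least as large as `BigPayload`.
#[allow(dead_code)]
enum Unboxed {
    Small(u8),
    Big(BigPayload),
}

// Boxing moves the payload to the heap, so the enum only stores a pointer.
#[allow(dead_code)]
enum Boxed {
    Small(u8),
    Big(Box<BigPayload>),
}

/// Returns the in-memory sizes of the unboxed and boxed enums, in bytes.
fn sizes() -> (usize, usize) {
    (size_of::<Unboxed>(), size_of::<Boxed>())
}

fn main() {
    let (unboxed, boxed) = sizes();
    println!("unboxed: {unboxed} bytes, boxed: {boxed} bytes");
}
```

The same size difference carries over to any `Result<T, E>` that uses the enum as its error type, which is why a single large error variant can affect every function returning that `Result`.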

Re: [PR] Minor: Interval singleton [datafusion]

2025-04-25 Thread via GitHub
m09526 commented on code in PR #15859: URL: https://github.com/apache/datafusion/pull/15859#discussion_r2060372212 ## datafusion/expr-common/src/interval_arithmetic.rs: ## @@ -286,6 +286,11 @@ impl Interval { } } +/// Create a new `Interval` with the same low

Re: [I] [Experimental scans] schema adapter does not apply required schema for structs within lists [datafusion-comet]

2025-04-25 Thread via GitHub
andygrove commented on issue #1681: URL: https://github.com/apache/datafusion-comet/issues/1681#issuecomment-2830638274 > interesting that referring to first column which is `a` returns result correctly, but `b` does not This is because we read all fields in the struct, so we are gua

Re: [PR] Minor: Interval singleton [datafusion]

2025-04-25 Thread via GitHub
m09526 commented on code in PR #15859: URL: https://github.com/apache/datafusion/pull/15859#discussion_r2060371205 ## datafusion/expr-common/src/interval_arithmetic.rs: ## @@ -286,6 +286,11 @@ impl Interval { } } +/// Create a new `Interval` with the same low

[I] chore: Rust lint `clippy::large_enum_variant` flags all uses of `Result` with `features = ["avro"]` [datafusion]

2025-04-25 Thread via GitHub
rroelke opened a new issue, #15860: URL: https://github.com/apache/datafusion/issues/15860 Minimal reproducer: Cargo.toml ``` [package] name = "datafusion-bug-repro" edition = "2024" [dependencies] datafusion-common = { version = "47", features = ["avro"] } ```

Re: [PR] Add `FormatOptions` to Config [datafusion]

2025-04-25 Thread via GitHub
Omega359 commented on PR #15793: URL: https://github.com/apache/datafusion/pull/15793#issuecomment-2830453193 Interesting. I'm still processing what the options imply. For example, why are there no defaults for the date/time formats? Or if there are defaults, the comments should point to wh

Re: [PR] Fix ScalarValue::List comparison when the compared lists have different lengths [datafusion]

2025-04-25 Thread via GitHub
jayzhan211 commented on PR #15856: URL: https://github.com/apache/datafusion/pull/15856#issuecomment-2830313295 Thanks @gabotechs

Re: [PR] feat: alias with metadata [datafusion-python]

2025-04-25 Thread via GitHub
timsaucer merged PR #: URL: https://github.com/apache/datafusion-python/pull/

[PR] Minor: Interval singleton [datafusion]

2025-04-25 Thread via GitHub
jayzhan211 opened a new pull request, #15859: URL: https://github.com/apache/datafusion/pull/15859 ## Which issue does this PR close? - Closes #. ## Rationale for this change We don't need additional check for the same value. ## What changes are inc

Re: [PR] Fix ScalarValue::List comparison when the compared lists have different lengths [datafusion]

2025-04-25 Thread via GitHub
jayzhan211 merged PR #15856: URL: https://github.com/apache/datafusion/pull/15856

[PR] Remove usage of `dbg!` [datafusion]

2025-04-25 Thread via GitHub
phillipleblanc opened a new pull request, #15858: URL: https://github.com/apache/datafusion/pull/15858 ## Which issue does this PR close? N/A ## Rationale for this change Follow-up for https://github.com/apache/datafusion/pull/15764/files#r2051508562 - removes unexpecte

Re: [I] ListingTable statistics improperly merges statistics when files have different schemas [datafusion]

2025-04-25 Thread via GitHub
xudong963 commented on issue #15689: URL: https://github.com/apache/datafusion/issues/15689#issuecomment-2830174889 > oh [@xudong963](https://github.com/xudong963) if there's any follow up issues, feel free to ping me. I'm off next week and want to dive deeper into statistics Sure, I

Re: [I] ListingTable statistics improperly merges statistics when files have different schemas [datafusion]

2025-04-25 Thread via GitHub
friendlymatthew commented on issue #15689: URL: https://github.com/apache/datafusion/issues/15689#issuecomment-2830170142 oh @xudong963 if there's any follow up issues, feel free to ping me. I'm off next week and want to dive deeper into statistics

Re: [I] ListingTable statistics improperly merges statistics when files have different schemas [datafusion]

2025-04-25 Thread via GitHub
friendlymatthew commented on issue #15689: URL: https://github.com/apache/datafusion/issues/15689#issuecomment-2830135218 Hey @xudong963 -- I'm currently switching jobs so time's been a bit constrained at the moment. I've only had time to look over this last weekend, but didn't make much pr

[PR] feat: alias with metadata [datafusion-python]

2025-04-25 Thread via GitHub
chenkovsky opened a new pull request, #: URL: https://github.com/apache/datafusion-python/pull/ # Which issue does this PR close? No # Rationale for this change datafusion has supported alias_with_metadata, but python binding doesn't support it yet. # Wha

Re: [I] ListingTable statistics improperly merges statistics when files have different schemas [datafusion]

2025-04-25 Thread via GitHub
xudong963 commented on issue #15689: URL: https://github.com/apache/datafusion/issues/15689#issuecomment-2830098107 Fyi, I'm cooking a PR to fix the issue

Re: [I] `select count(distinct ..)` query doesn't go to the specialized distinct accumulator [datafusion]

2025-04-25 Thread via GitHub
chenkovsky commented on issue #15850: URL: https://github.com/apache/datafusion/issues/15850#issuecomment-2830064806 take

Re: [I] Add CatalogProvider API [datafusion-python]

2025-04-25 Thread via GitHub
tespent commented on issue #1103: URL: https://github.com/apache/datafusion-python/issues/1103#issuecomment-2830018437 > if you can share, I'd like to learn more about the interplay of the 2 systems. @aditanase Sure. I think my basic idea is quite similar to yours. But instead of wr

Re: [PR] feat(datafusion-functions-aggregate): add support for lists and other nested types in `min` and `max` [datafusion]

2025-04-25 Thread via GitHub
gabotechs commented on PR #13991: URL: https://github.com/apache/datafusion/pull/13991#issuecomment-2829991915 Work resumed in https://github.com/apache/datafusion/pull/15857

Re: [PR] Make aggr fuzzer query builder more configurable [datafusion]

2025-04-25 Thread via GitHub
Rachelint commented on code in PR #15851: URL: https://github.com/apache/datafusion/pull/15851#discussion_r2059930280 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/query_builder.rs: ## @@ -0,0 +1,371 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] Fix ScalarValue::List comparison when the compared lists have different lengths [datafusion]

2025-04-25 Thread via GitHub
gabotechs commented on PR #15856: URL: https://github.com/apache/datafusion/pull/15856#issuecomment-2829978688 cc @jayzhan211 as I think you are familiar with this code

Re: [PR] Impl intermeidate result blocked approach sketch [datafusion]

2025-04-25 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2829974829 This one is still blocked on following prs: - [ ] #15830 - [ ] #15851

[PR] Support lists in min max [datafusion]

2025-04-25 Thread via GitHub
gabotechs opened a new pull request, #15857: URL: https://github.com/apache/datafusion/pull/15857 ## Which issue does this PR close? - Closes #13987. ## Rationale for this change Resume the work started by @rluvaton in https://github.com/apache/datafusion

Re: [PR] Make aggr fuzzer query builder more configurable [datafusion]

2025-04-25 Thread via GitHub
Rachelint commented on code in PR #15851: URL: https://github.com/apache/datafusion/pull/15851#discussion_r2059944813 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/query_builder.rs: ## @@ -0,0 +1,371 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] Make aggr fuzzer query builder more configurable [datafusion]

2025-04-25 Thread via GitHub
Rachelint commented on code in PR #15851: URL: https://github.com/apache/datafusion/pull/15851#discussion_r2059934294 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/query_builder.rs: ## @@ -0,0 +1,371 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

[PR] Fix scalar list comparison when the compared lists have different lengths [datafusion]

2025-04-25 Thread via GitHub
gabotechs opened a new pull request, #15856: URL: https://github.com/apache/datafusion/pull/15856 ## Which issue does this PR close? It does not close any specific issue but it's related to #10856 ## Rationale for this change Previously, the function that

[I] `NULL::` can't be encode to substrait [datafusion]

2025-04-25 Thread via GitHub
discord9 opened a new issue, #15855: URL: https://github.com/apache/datafusion/issues/15855 ### Describe the bug as title, `NULL::` can't be encode to substrait, WIP fix in #15854 ### To Reproduce try run this unit test: ```rs #[tokio::test] async fn fold_c

Re: [PR] Added SQL Example for `Aggregate Functions` [datafusion]

2025-04-25 Thread via GitHub
Adez017 commented on PR #15778: URL: https://github.com/apache/datafusion/pull/15778#issuecomment-2829883831 i think the PR is ready to be reviewed and merge @alamb @xudong963

[PR] fix: fold cast null to typed null [datafusion]

2025-04-25 Thread via GitHub
discord9 opened a new pull request, #15854: URL: https://github.com/apache/datafusion/pull/15854 ## Which issue does this PR close? TODO(file a issue later) - Closes #. ## Rationale for this change fix a error where `SELECT NULL::DOUBLE` can't be encode to substrait

Re: [PR] Update extending-operators.md [datafusion]

2025-04-25 Thread via GitHub
Adez017 commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2829859562 hey @xudong963 , check it out now . also could you help in the failing check in workflow?

Re: [PR] Add `FormatOptions` to Config [datafusion]

2025-04-25 Thread via GitHub
blaginin commented on PR #15793: URL: https://github.com/apache/datafusion/pull/15793#issuecomment-2829850084 FYI @Omega359 @jayzhan211

[PR] chore(deps): bump clap from 4.5.36 to 4.5.37 [datafusion]

2025-04-25 Thread via GitHub
dependabot[bot] opened a new pull request, #15853: URL: https://github.com/apache/datafusion/pull/15853 Bumps [clap](https://github.com/clap-rs/clap) from 4.5.36 to 4.5.37. Release notes Sourced from [clap's releases](https://github.com/clap-rs/clap/releases). v4.5.37 [4.5.

Re: [I] How to install ballista python package? [datafusion-ballista]

2025-04-25 Thread via GitHub
milenkovicm commented on issue #1257: URL: https://github.com/apache/datafusion-ballista/issues/1257#issuecomment-2829718909 hey @Wuerike, unfortunately ballista package has not been published yet #1120 due to #1142 . Ideal place we want to be with ballista python is to be an extens

Re: [I] Add support for conversion of in memory tables to protobuf [datafusion-python]

2025-04-25 Thread via GitHub
aditanase commented on issue #898: URL: https://github.com/apache/datafusion-python/issues/898#issuecomment-2829701511 What would be the expectation in that case? A MemTable resides on your heap, while CSV is potentially on disk. Would you expect that the data would be serialized as part o

Re: [I] support FFI query result streams that do not pre-collect [datafusion-python]

2025-04-25 Thread via GitHub
aditanase commented on issue #1011: URL: https://github.com/apache/datafusion-python/issues/1011#issuecomment-2829680622 For what it's worth, you can do this to some degree with duckdb: - https://duckdb.org/docs/stable/clients/python/data_ingestion#directly-accessing-dataframes-and-arrow

Re: [PR] Make aggr fuzzer query builder more configurable [datafusion]

2025-04-25 Thread via GitHub
jayzhan211 commented on code in PR #15851: URL: https://github.com/apache/datafusion/pull/15851#discussion_r2059746892 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/query_builder.rs: ## @@ -0,0 +1,371 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [I] Add CatalogProvider API [datafusion-python]

2025-04-25 Thread via GitHub
aditanase commented on issue #1103: URL: https://github.com/apache/datafusion-python/issues/1103#issuecomment-2829650344 @tespent I am very intrigued by how you're using Datafusion and ray.data together - if you can share, I'd like to learn more about the interplay of the 2 systems.

Re: [PR] Make aggr fuzzer query builder more configurable [datafusion]

2025-04-25 Thread via GitHub
jayzhan211 commented on code in PR #15851: URL: https://github.com/apache/datafusion/pull/15851#discussion_r2059742959 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/query_builder.rs: ## @@ -0,0 +1,371 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] Feat: introduce partition statistics API [datafusion]

2025-04-25 Thread via GitHub
xudong963 commented on PR #15852: URL: https://github.com/apache/datafusion/pull/15852#issuecomment-2829645057 cc @berkaysynnada PTAL. I didn't run into any challenges during refactoring; the process was smooth. The tests are failing due to https://github.com/apache/datafusion/issues/15689

Re: [PR] Make aggr fuzzer query builder more configurable [datafusion]

2025-04-25 Thread via GitHub
jayzhan211 commented on code in PR #15851: URL: https://github.com/apache/datafusion/pull/15851#discussion_r2059739701 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/query_builder.rs: ## @@ -0,0 +1,371 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] Feat: introduce partition statistics API [datafusion]

2025-04-25 Thread via GitHub
xudong963 commented on code in PR #15852: URL: https://github.com/apache/datafusion/pull/15852#discussion_r2059740385 ## datafusion/physical-plan/src/execution_plan.rs: ## @@ -430,6 +430,32 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync { Ok(Statistics::new

Re: [PR] Make aggr fuzzer query builder more configurable [datafusion]

2025-04-25 Thread via GitHub
jayzhan211 commented on code in PR #15851: URL: https://github.com/apache/datafusion/pull/15851#discussion_r2059737955 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/query_builder.rs: ## @@ -0,0 +1,371 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

[PR] Feat: introduce partition statistics API [datafusion]

2025-04-25 Thread via GitHub
xudong963 opened a new pull request, #15852: URL: https://github.com/apache/datafusion/pull/15852 ## Which issue does this PR close? - Closes #. ## Rationale for this change Follow up: https://github.com/apache/datafusion/pull/15503/ ## What changes

Re: [PR] Make aggr fuzzer query builder more configurable [datafusion]

2025-04-25 Thread via GitHub
jayzhan211 commented on code in PR #15851: URL: https://github.com/apache/datafusion/pull/15851#discussion_r2059726987 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/query_builder.rs: ## @@ -0,0 +1,371 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] Make aggr fuzzer query builder more configurable [datafusion]

2025-04-25 Thread via GitHub
Rachelint commented on code in PR #15851: URL: https://github.com/apache/datafusion/pull/15851#discussion_r2059731492 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/query_builder.rs: ## @@ -0,0 +1,371 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [I] ListingTable statistics improperly merges statistics when files have different schemas [datafusion]

2025-04-25 Thread via GitHub
xudong963 commented on issue #15689: URL: https://github.com/apache/datafusion/issues/15689#issuecomment-2829609951 Hi @friendlymatthew, friendly ping: how's it going?

Re: [PR] Make aggr fuzzer query builder more configurable [datafusion]

2025-04-25 Thread via GitHub
Rachelint commented on PR #15851: URL: https://github.com/apache/datafusion/pull/15851#issuecomment-2829560268 @waynexia

[PR] Make aggr fuzzer query builder more configurable [datafusion]

2025-04-25 Thread via GitHub
Rachelint opened a new pull request, #15851: URL: https://github.com/apache/datafusion/pull/15851 ## Which issue does this PR close? - Closes #. ## Rationale for this change When adding fuzz tests for #15591, I found a requirement to customize the gener

Re: [PR] Factor out Substrait consumers into separate files [datafusion]

2025-04-25 Thread via GitHub
gabotechs commented on PR #15794: URL: https://github.com/apache/datafusion/pull/15794#issuecomment-2829553268 @Blizzara applied your suggestions. I think scoping expressions and relations into their own modules made a lot of sense.