Re: [PR] Consolidate csv_opener.rs and json_opener.rs into a single example (#… [datafusion]

2025-01-04 Thread via GitHub
jonahgao merged PR #13981: URL: https://github.com/apache/datafusion/pull/13981 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [I] Consolidate csv_opener.rs and json_opener.rs into a single example [datafusion]

2025-01-04 Thread via GitHub
jonahgao closed issue #13955: Consolidate csv_opener.rs and json_opener.rs into a single example URL: https://github.com/apache/datafusion/issues/13955

Re: [I] Simplify error handling in case.rs [datafusion]

2025-01-04 Thread via GitHub
cj-zhukov commented on issue #13990: URL: https://github.com/apache/datafusion/issues/13990#issuecomment-2571510879 take

Re: [I] Support fast group accumulator for `first` and `last` [datafusion]

2025-01-04 Thread via GitHub
zhuqi-lucas commented on issue #13998: URL: https://github.com/apache/datafusion/issues/13998#issuecomment-2571489356 take
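For context, a "fast" group accumulator in DataFusion updates the state of many groups from one batch at a time, driven by a group index for each input row, instead of keeping a separate accumulator object per group. A minimal sketch of that idea for a FIRST_VALUE-style aggregate, assuming i64 inputs, no NULLs, no ORDER BY and no filter; `FirstValueGroups` and its methods are hypothetical and much simpler than DataFusion's actual `GroupsAccumulator` trait.

```rust
/// Hypothetical per-group state for a FIRST_VALUE-style aggregate over i64:
/// one flat slot per group, filled by the first row seen for that group.
struct FirstValueGroups {
    values: Vec<Option<i64>>,
}

impl FirstValueGroups {
    fn new() -> Self {
        FirstValueGroups { values: Vec::new() }
    }

    /// Processes one batch: `group_indices[i]` is the group of `batch[i]`,
    /// and `total_num_groups` is the number of groups known so far.
    fn update_batch(&mut self, batch: &[i64], group_indices: &[usize], total_num_groups: usize) {
        assert_eq!(batch.len(), group_indices.len());
        // Grow the flat state vector for newly seen groups (never shrink it).
        if self.values.len() < total_num_groups {
            self.values.resize(total_num_groups, None);
        }
        for (&value, &group) in batch.iter().zip(group_indices) {
            let slot = &mut self.values[group];
            // Keep only the first value observed for each group.
            if slot.is_none() {
                *slot = Some(value);
            }
        }
    }

    /// Emits the final value of every group, in group-index order.
    fn evaluate(&self) -> Vec<Option<i64>> {
        self.values.clone()
    }
}
```

A LAST_VALUE variant would overwrite the slot on every row instead of only filling empty slots.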

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (groupby support) [datafusion]

2025-01-04 Thread via GitHub
zhuqi-lucas commented on PR #13996: URL: https://github.com/apache/datafusion/pull/13996#issuecomment-2571488230 The PR testing result: **Data generate example, we can use small, medium, or big dataset:** ```shell ./benchmarks/bench.sh data h2o_small ***

Re: [I] Add H2O.ai Database-like Ops benchmark to `dfbench` [datafusion]

2025-01-04 Thread via GitHub
zhuqi-lucas commented on issue #7209: URL: https://github.com/apache/datafusion/issues/7209#issuecomment-2571486272 The PR testing result: **Data generate example, we can use small, medium, or big dataset:** ```shell ./benchmarks/bench.sh data h2o_small

Re: [I] Add H2O.ai Database-like Ops benchmark to `dfbench` [datafusion]

2025-01-04 Thread via GitHub
zhuqi-lucas commented on issue #7209: URL: https://github.com/apache/datafusion/issues/7209#issuecomment-2571472217 Sure @alamb @2010YOUY01, thanks, let me change the PR to only support groupby and add a TODO for join.

Re: [PR] Fail on optimization cycles [datafusion]

2025-01-04 Thread via GitHub
github-actions[bot] commented on PR #11288: URL: https://github.com/apache/datafusion/pull/11288#issuecomment-2571469733 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

[PR] Use workspace rust-version for all workspace crates [datafusion]

2025-01-04 Thread via GitHub
Jefffrey opened a new pull request, #14009: URL: https://github.com/apache/datafusion/pull/14009 ## Which issue does this PR close? Closes #9214 ## Rationale for this change `cargo-msrv` version `0.16.0` includes the upstream fix we needed (though current

Re: [PR] Use workspace rust-version for all workspace crates [datafusion]

2025-01-04 Thread via GitHub
Jefffrey commented on code in PR #14009: URL: https://github.com/apache/datafusion/pull/14009#discussion_r1903177589 ## datafusion-cli/Cargo.toml: ## @@ -25,7 +25,6 @@ keywords = ["arrow", "datafusion", "query", "sql"] license = "Apache-2.0" homepage = "https://datafusion.apac

Re: [I] Add H2O.ai Database-like Ops benchmark to `dfbench` [datafusion]

2025-01-04 Thread via GitHub
alamb commented on issue #7209: URL: https://github.com/apache/datafusion/issues/7209#issuecomment-2571431701 > Looks like it's a todo tracked by https://github.com/mrpowers-io/falsa/issues/21, perhaps we can skip join queries for now I think it is a good idea to skip the join querie

Re: [PR] Add swap_inputs to SMJ [datafusion]

2025-01-04 Thread via GitHub
alamb commented on PR #13984: URL: https://github.com/apache/datafusion/pull/13984#issuecomment-2571429125 > There are no explicit unit tests, but the sister methods don't have such tests either. These methods are tested indirectly via optimization rules. We can consider whether to add suc
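When a planner swaps the two inputs of a join, as PR #13984 does for sort-merge join, the join type and the equi-join key pairs have to be mirrored as well. A minimal sketch with a hypothetical, simplified `JoinType` enum and `swap_on` helper (not DataFusion's own definitions):

```rust
/// Hypothetical, simplified join types (not DataFusion's enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JoinType {
    Inner,
    Left,
    Right,
    Full,
    LeftSemi,
    RightSemi,
    LeftAnti,
    RightAnti,
}

impl JoinType {
    /// Join type that preserves the result once left and right inputs are exchanged.
    fn swap(self) -> JoinType {
        match self {
            // Symmetric join types are unchanged.
            JoinType::Inner => JoinType::Inner,
            JoinType::Full => JoinType::Full,
            // Side-specific join types flip to the other side.
            JoinType::Left => JoinType::Right,
            JoinType::Right => JoinType::Left,
            JoinType::LeftSemi => JoinType::RightSemi,
            JoinType::RightSemi => JoinType::LeftSemi,
            JoinType::LeftAnti => JoinType::RightAnti,
            JoinType::RightAnti => JoinType::LeftAnti,
        }
    }
}

/// Equi-join keys as (left column, right column) pairs; swapping flips each pair.
fn swap_on(on: &[(String, String)]) -> Vec<(String, String)> {
    on.iter().map(|(l, r)| (r.clone(), l.clone())).collect()
}
```

For join types that output columns from both sides, a complete implementation also has to restore the original column order, typically with a projection on top of the swapped join; the sketch leaves that step out.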

Re: [I] formatting the AST while preserving the source location information from the original query [datafusion-sqlparser-rs]

2025-01-04 Thread via GitHub
lovasoa commented on issue #1634: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1634#issuecomment-2571427562 That could work, but it's not necessarily a lot less work than a Display implementation that respects the Span information, and the result would be wasteful and less

Re: [I] Introduce ProjectionMask To Allow Nested Projection Pushdown [datafusion]

2025-01-04 Thread via GitHub
alamb commented on issue #2581: URL: https://github.com/apache/datafusion/issues/2581#issuecomment-2571424336 > There are also real optimizations available here. For example, suppose I write an Arrow int8 column to Parquet. The Arrow schema is serialized into Parquet metadata so at read tim

Re: [I] formatting the AST while preserving the source location information from the original query [datafusion-sqlparser-rs]

2025-01-04 Thread via GitHub
alamb commented on issue #1634: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1634#issuecomment-2571421058 > I think there is a misunderstanding. My use case is: the user enters a query. I need to parse it, potentially edit the ast, and then stringify it back before feeding

Re: [I] formatting the AST while preserving the source location information from the original query [datafusion-sqlparser-rs]

2025-01-04 Thread via GitHub
lovasoa commented on issue #1634: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1634#issuecomment-2571399681 > Specifically, to preserve the original SQL for error messages you could also keep the actual original SQL string somewhere and on error construct the error message

[PR] build(deps): bump datafusion-substrait from 43.0.0 to 44.0.0 [datafusion-python]

2025-01-04 Thread via GitHub
dependabot[bot] opened a new pull request, #986: URL: https://github.com/apache/datafusion-python/pull/986 Bumps [datafusion-substrait](https://github.com/apache/datafusion) from 43.0.0 to 44.0.0. Commits https://github.com/apache/datafusion/commit/3cc3fca31e6edc2d953e663bfd7f8

[PR] build(deps): bump datafusion-functions-window-common from 43.0.0 to 44.0.0 [datafusion-python]

2025-01-04 Thread via GitHub
dependabot[bot] opened a new pull request, #989: URL: https://github.com/apache/datafusion-python/pull/989 Bumps [datafusion-functions-window-common](https://github.com/apache/datafusion) from 43.0.0 to 44.0.0. Commits https://github.com/apache/datafusion/commit/3cc3fca31e6edc

[PR] build(deps): bump datafusion from 43.0.0 to 44.0.0 [datafusion-python]

2025-01-04 Thread via GitHub
dependabot[bot] opened a new pull request, #985: URL: https://github.com/apache/datafusion-python/pull/985 Bumps [datafusion](https://github.com/apache/datafusion) from 43.0.0 to 44.0.0. Commits https://github.com/apache/datafusion/commit/3cc3fca31e6edc2d953e663bfd7f856bcb70d8c

[PR] build(deps): bump datafusion-proto from 43.0.0 to 44.0.0 [datafusion-python]

2025-01-04 Thread via GitHub
dependabot[bot] opened a new pull request, #990: URL: https://github.com/apache/datafusion-python/pull/990 Bumps [datafusion-proto](https://github.com/apache/datafusion) from 43.0.0 to 44.0.0. Commits https://github.com/apache/datafusion/commit/3cc3fca31e6edc2d953e663bfd7f856bc

[PR] build(deps): bump datafusion-ffi from 43.0.0 to 44.0.0 [datafusion-python]

2025-01-04 Thread via GitHub
dependabot[bot] opened a new pull request, #988: URL: https://github.com/apache/datafusion-python/pull/988 Bumps [datafusion-ffi](https://github.com/apache/datafusion) from 43.0.0 to 44.0.0. Commits https://github.com/apache/datafusion/commit/3cc3fca31e6edc2d953e663bfd7f856bcb7

[PR] build(deps): bump async-trait from 0.1.83 to 0.1.84 [datafusion-python]

2025-01-04 Thread via GitHub
dependabot[bot] opened a new pull request, #987: URL: https://github.com/apache/datafusion-python/pull/987 Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.83 to 0.1.84. Release notes Sourced from async-trait's releases (https://github.com/dtolnay/async-trait/releases).

Re: [I] MySQL Compatibility Issues: mysqldump [datafusion-sqlparser-rs]

2025-01-04 Thread via GitHub
alexkazik commented on issue #302: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/302#issuecomment-2571391596 There is another problem with a mysql dump: ```sql ALTER TABLE `gametype` ADD PRIMARY KEY (`GameTypeID`), ADD KEY `GameTypeName` (`GameTypeName`(20));

Re: [PR] Update release README for datafusion-cli publishing [datafusion]

2025-01-04 Thread via GitHub
alamb commented on PR #13982: URL: https://github.com/apache/datafusion/pull/13982#issuecomment-2571390876 🚀

Re: [I] formatting the AST while preserving the source location information from the original query [datafusion-sqlparser-rs]

2025-01-04 Thread via GitHub
alamb commented on issue #1634: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1634#issuecomment-2571390419 I don't understand the usecase for preserving the original formatting after `Display` the AST. Specifically, to preserve the original SQL for error messages you
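The suggestion above, keeping the original SQL string and cutting the relevant snippet out of it when building an error message, could look roughly like this. A minimal sketch, assuming 1-based line/column positions counted in characters with an exclusive end column; `Pos` and `snippet` are hypothetical and not part of sqlparser-rs, whose own `Span` conventions may differ.

```rust
/// Hypothetical 1-based source position, counted in characters.
#[derive(Debug, Clone, Copy)]
struct Pos {
    line: usize,
    column: usize,
}

/// Returns the text of `sql` from `start` (inclusive) to `end` (exclusive),
/// or `None` if the positions are invalid or fall outside the input.
fn snippet(sql: &str, start: Pos, end: Pos) -> Option<String> {
    if start.line == 0 || start.column == 0 || end.line == 0 || end.column == 0 {
        return None;
    }
    if (start.line, start.column) > (end.line, end.column) {
        return None;
    }
    let lines: Vec<&str> = sql.lines().collect();
    let mut out = String::new();
    for line_no in start.line..=end.line {
        let chars: Vec<char> = lines.get(line_no - 1)?.chars().collect();
        // Character range on this line, converted to 0-based indices.
        let from = if line_no == start.line { start.column - 1 } else { 0 };
        let to = if line_no == end.line { end.column - 1 } else { chars.len() };
        if from > to || to > chars.len() {
            return None;
        }
        out.extend(&chars[from..to]);
        if line_no != end.line {
            out.push('\n');
        }
    }
    Some(out)
}
```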

Re: [PR] Consolidate csv_opener.rs and json_opener.rs into a single example (#… [datafusion]

2025-01-04 Thread via GitHub
alamb commented on PR #13981: URL: https://github.com/apache/datafusion/pull/13981#issuecomment-2571378236 > @alamb Andrew, I've noticed some checks have failed. Could you help me understand if this is related to my changes or something else? I'd be happy to address the issue. Hi @cj

Re: [PR] Feat: Add support for `array_size` [datafusion-comet]

2025-01-04 Thread via GitHub
viirya commented on PR #1214: URL: https://github.com/apache/datafusion-comet/pull/1214#issuecomment-2571377061 There is already one PR for array_size support: https://github.com/apache/datafusion-comet/pull/1122

Re: [I] Release DataFusion `44.0.0` [datafusion]

2025-01-04 Thread via GitHub
alamb commented on issue #13334: URL: https://github.com/apache/datafusion/issues/13334#issuecomment-2571352381 Filed ticket for the next release: - https://github.com/apache/datafusion/issues/14008

[I] Release DataFusion `45.0.0` [datafusion]

2025-01-04 Thread via GitHub
alamb opened a new issue, #14008: URL: https://github.com/apache/datafusion/issues/14008 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [PR] Use partial aggregation schema for spilling to avoid column mismatch in GroupedHashAggregateStream [datafusion]

2025-01-04 Thread via GitHub
2010YOUY01 commented on code in PR #13995: URL: https://github.com/apache/datafusion/pull/13995#discussion_r1903119766 ## datafusion/core/src/dataframe/mod.rs: ## @@ -2743,6 +2753,110 @@ mod tests { Ok(()) } +// test for https://github.com/apache/datafusion/i

Re: [PR] feat(substrait): introduce consume_rel and consume_expression [datafusion]

2025-01-04 Thread via GitHub
alamb merged PR #13963: URL: https://github.com/apache/datafusion/pull/13963

Re: [PR] feat(datafusion-functions-aggregate): add support for lists and other nested types in `min` and `max` [datafusion]

2025-01-04 Thread via GitHub
alamb commented on code in PR #13991: URL: https://github.com/apache/datafusion/pull/13991#discussion_r1903111345 ## datafusion/functions-aggregate/src/min_max.rs: ## @@ -230,7 +235,13 @@ impl AggregateUDFImpl for Max { } fn accumulator(&self, acc_args: AccumulatorAr

Re: [PR] feat(substrait): introduce consume_rel and consume_expression [datafusion]

2025-01-04 Thread via GitHub
alamb commented on PR #13963: URL: https://github.com/apache/datafusion/pull/13963#issuecomment-2571328548 Thanks again @vbarua and @Blizzara

Re: [PR] Optimize CASE expression for "expr or expr" usage. [datafusion]

2025-01-04 Thread via GitHub
alamb merged PR #13953: URL: https://github.com/apache/datafusion/pull/13953

Re: [I] Optimize CASE expression for "expr or expr" usage [datafusion]

2025-01-04 Thread via GitHub
alamb closed issue #11638: Optimize CASE expression for "expr or expr" usage URL: https://github.com/apache/datafusion/issues/11638

Re: [I] Simplify error handling in case.rs [datafusion]

2025-01-04 Thread via GitHub
alamb commented on issue #13990: URL: https://github.com/apache/datafusion/issues/13990#issuecomment-2571325061 I think this is a nice issue for someone who wants to clean up the code and learn how DataFusion works but doesn't require detailed internals understanding

Re: [PR] Optimize CASE expression for "expr or expr" usage. [datafusion]

2025-01-04 Thread via GitHub
alamb commented on code in PR #13953: URL: https://github.com/apache/datafusion/pull/13953#discussion_r1903110723 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -394,6 +401,43 @@ impl CaseExpr { Ok(ColumnarValue::Array(zip(&when_value, &then_value, &else_)
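The `zip(&when_value, &then_value, &else_)` call in the reviewed code appears to pick, row by row, the THEN value where the WHEN condition is true and the ELSE value otherwise. A minimal plain-Rust sketch of those SQL CASE semantics, using `Option` for NULL and a hypothetical `case_when` helper instead of Arrow arrays; a NULL condition is not true, so the ELSE branch is taken.

```rust
/// Row-wise evaluation of `CASE WHEN cond THEN then_v ELSE else_v END`.
/// `None` models SQL NULL; a NULL condition selects the ELSE value.
fn case_when(
    cond: &[Option<bool>],
    then_v: &[Option<i64>],
    else_v: &[Option<i64>],
) -> Vec<Option<i64>> {
    assert!(cond.len() == then_v.len() && cond.len() == else_v.len());
    cond.iter()
        .zip(then_v.iter().zip(else_v))
        // Only a definite TRUE takes the THEN branch.
        .map(|(c, (t, e))| if *c == Some(true) { *t } else { *e })
        .collect()
}
```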

Re: [PR] FIX : Incorrect NULL handling in BETWEEN expression [datafusion]

2025-01-04 Thread via GitHub
alamb commented on code in PR #14007: URL: https://github.com/apache/datafusion/pull/14007#discussion_r1903110630 ## datafusion/sqllogictest/test_files/between.slt: ## @@ -0,0 +1,32 @@ +# Licensed to the Apache Software Foundation (ASF) under one Review Comment: Would it be

Re: [I] datafusion-cli displays error prefix twice [datafusion]

2025-01-04 Thread via GitHub
alamb commented on issue #13979: URL: https://github.com/apache/datafusion/issues/13979#issuecomment-2571324366 @avkirilishin has a very nice PR up to fix this issue: - https://github.com/apache/datafusion/pull/14000

Re: [PR] Update doc example to remove deprecated DELIMITER option for external tables [datafusion]

2025-01-04 Thread via GitHub
alamb merged PR #14002: URL: https://github.com/apache/datafusion/pull/14002

Re: [PR] Chore: update wasm-supported crates [datafusion]

2025-01-04 Thread via GitHub
alamb commented on code in PR #14005: URL: https://github.com/apache/datafusion/pull/14005#discussion_r1903109532 ## datafusion/wasmtest/src/lib.rs: ## @@ -124,4 +132,48 @@ mod test { let task_ctx = session_context.task_ctx(); let _ = collect(physical_plan, tas

Re: [PR] [WIP] fix: unwrapping Err(DataFusionError::Plan) for use in plan_datafusion_err [datafusion]

2025-01-04 Thread via GitHub
avkirilishin closed pull request #14000: [WIP] fix: unwrapping Err(DataFusionError::Plan) for use in plan_datafusion_err URL: https://github.com/apache/datafusion/pull/14000

Re: [PR] Parse Postgres's LOCK TABLE statement [datafusion-sqlparser-rs]

2025-01-04 Thread via GitHub
freshtonic commented on PR #1614: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1614#issuecomment-2571280859 @iffyio I've pushed another attempt at this. Munging the Postgres and MySQL versions together into the same `Statement::LockTables { .. }` variant was painful du

Re: [I] Add H2O.ai Database-like Ops benchmark to `dfbench` [datafusion]

2025-01-04 Thread via GitHub
2010YOUY01 commented on issue #7209: URL: https://github.com/apache/datafusion/issues/7209#issuecomment-2571276627 > Hi @alamb , the draft PR works well and tested group by h2o benchmark, but the join seems have some problems, i also submitted the question: > > [MrPowers/mrpowers-benc

Re: [I] formatting the AST while preserving the source location information from the original query [datafusion-sqlparser-rs]

2025-01-04 Thread via GitHub
freshtonic commented on issue #1634: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1634#issuecomment-2571263201 @lovasoa 👋 I'm interested in this

Re: [I] java.lang.ClassNotFoundException: org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager [datafusion-comet]

2025-01-04 Thread via GitHub
nblagodarnyi commented on issue #864: URL: https://github.com/apache/datafusion-comet/issues/864#issuecomment-2571256859 @ramyadass please carefully read all the comments above. `extraClassPath` should be a local path, not hdfs.

Re: [PR] Improve deserialize_to_struct example [datafusion]

2025-01-04 Thread via GitHub
jonahgao commented on PR #13958: URL: https://github.com/apache/datafusion/pull/13958#issuecomment-2571115244 Thanks @alamb

[PR] FIX : Incorrect NULL handling in BETWEEN expression [datafusion]

2025-01-04 Thread via GitHub
getChan opened a new pull request, #14007: URL: https://github.com/apache/datafusion/pull/14007 ## Which issue does this PR close? Closes #13976. ## Rationale for this change ## What changes are included in this PR? in `BETWEEN` expression,
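For reference, standard SQL defines `x BETWEEN low AND high` as `x >= low AND x <= high` under three-valued logic, so a NULL bound does not always make the result NULL: `1 BETWEEN NULL AND 0` evaluates to `NULL AND false`, which is `false`. A minimal sketch of those semantics in plain Rust, modeling NULL as `None`; the helper names are hypothetical and this is not the change made in the PR.

```rust
/// SQL three-valued AND: FALSE dominates, then NULL (None), then TRUE.
fn sql_and(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// `a >= b`, yielding NULL (None) when either side is NULL.
fn sql_ge(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    Some(a? >= b?)
}

/// `a <= b`, yielding NULL (None) when either side is NULL.
fn sql_le(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    Some(a? <= b?)
}

/// `x BETWEEN low AND high`, expanded to `x >= low AND x <= high`.
fn sql_between(x: Option<i64>, low: Option<i64>, high: Option<i64>) -> Option<bool> {
    sql_and(sql_ge(x, low), sql_le(x, high))
}
```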

Re: [PR] Improve deserialize_to_struct example [datafusion]

2025-01-04 Thread via GitHub
jonahgao merged PR #13958: URL: https://github.com/apache/datafusion/pull/13958

Re: [I] sql result discrepency with sqlite, postgres and duckdb bug #2 [datafusion]

2025-01-04 Thread via GitHub
aweltsch commented on issue #13782: URL: https://github.com/apache/datafusion/issues/13782#issuecomment-2570816352 Thanks for your response @Omega359, I agree that the unexpected result is calculated due to calculations using the `REAL` type. For me the main surprise was that `nullif(1, 2

[PR] [Minor] Refac: make ArraySort public for broader access [datafusion]

2025-01-04 Thread via GitHub
dharanad opened a new pull request, #14006: URL: https://github.com/apache/datafusion/pull/14006 Changes the visibility of the ArraySort struct from super to public. This allows broader access to the struct, enabling its use in other modules and promoting better code reuse. ## Which is

[PR] Feat: Add support for `sort_array` `array_size` & `array_prepend` [datafusion-comet]

2025-01-04 Thread via GitHub
dharanad opened a new pull request, #1214: URL: https://github.com/apache/datafusion-comet/pull/1214 ## Which issue does this PR close? Part of #1042 ## Rationale for this change ## What changes are included in this PR? ## How are these chan