Re: [I] Move `SanityChecker` into `datafusion-physical-optimizer` crate [datafusion]

2025-01-10 Thread via GitHub
mnpw commented on issue #14072: URL: https://github.com/apache/datafusion/issues/14072#issuecomment-2585112045 @cj-zhukov Apologies, I started working on this PR without explicitly assigning it to myself. Would appreciate your review on https://github.com/apache/datafusion/pull/14083.

Re: [I] build broken due to breaking change in winnow crate [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove closed issue #1263: build broken due to breaking change in winnow crate URL: https://github.com/apache/datafusion-comet/issues/1263 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

Re: [PR] chore: [comet-parquet-exec] Unit test fixes, default scan impl to comet_native [datafusion-comet]

2025-01-10 Thread via GitHub
parthchandra commented on PR #1265: URL: https://github.com/apache/datafusion-comet/pull/1265#issuecomment-2584878525 > > @parthchandra could you run `cargo fmt` and `cargo clippy` > > and `make format` .. that is if we want to see tests passing on this PR, which I think they should

Re: [PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub
comphead commented on code in PR #1262: URL: https://github.com/apache/datafusion-comet/pull/1262#discussion_r1911745757 ## spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala: ## @@ -974,7 +974,8 @@ class CometAggregateSuite extends CometTestBase with Adaptive

Re: [PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on code in PR #1262: URL: https://github.com/apache/datafusion-comet/pull/1262#discussion_r1911767513 ## spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala: ## @@ -974,7 +974,8 @@ class CometAggregateSuite extends CometTestBase with Adaptiv

Re: [PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub
codecov-commenter commented on PR #1262: URL: https://github.com/apache/datafusion-comet/pull/1262#issuecomment-2584940287 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1262?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Minor: Make `group_schema` as `PhysicalGroupBy` method [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 merged PR #14064: URL: https://github.com/apache/datafusion/pull/14064

Re: [PR] Minor: Make `group_schema` as `PhysicalGroupBy` method [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 commented on PR #14064: URL: https://github.com/apache/datafusion/pull/14064#issuecomment-2585038720 Thanks @berkaysynnada

Re: [PR] Add comments to physical optimizer tests [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 commented on PR #14075: URL: https://github.com/apache/datafusion/pull/14075#issuecomment-2585039343 Thanks @alamb

[I] Support `MemoryExec` in proto `try_from_physical_plan` [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 opened a new issue, #14082: URL: https://github.com/apache/datafusion/issues/14082 ### Is your feature request related to a problem or challenge? In `try_from_physical_plan` https://github.com/apache/datafusion/blob/d91a7c0f5b93bfb7061dcb6aa8b78dd31b7273b3/datafusion/proto/

Re: [PR] Add comments to physical optimizer tests [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 merged PR #14075: URL: https://github.com/apache/datafusion/pull/14075

Re: [PR] Minor: Move `LimitPushdown` tests to be in the same file as the code [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 merged PR #14076: URL: https://github.com/apache/datafusion/pull/14076

Re: [I] Optimized spill file format [datafusion]

2025-01-10 Thread via GitHub
2010YOUY01 commented on issue #14078: URL: https://github.com/apache/datafusion/issues/14078#issuecomment-2585043689 Although we're currently spilling column-wise record batches, I think this will change to row-wise batches in the future. It would be better to benchmark and optimize spillin

Re: [PR] feat: metadata columns [datafusion]

2025-01-10 Thread via GitHub
chenkovsky commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1911868813 ## datafusion/common/src/dfschema.rs: ## @@ -319,52 +465,56 @@ impl DFSchema { qualifiers.push(qualifier.cloned()); } } -

Re: [PR] added "DEFAULT_CLI_FORMAT_OPTIONS" for cli and sqllogic test [datafusion]

2025-01-10 Thread via GitHub
jonahgao merged PR #14052: URL: https://github.com/apache/datafusion/pull/14052

Re: [PR] added "DEFAULT_CLI_FORMAT_OPTIONS" for cli and sqllogic test [datafusion]

2025-01-10 Thread via GitHub
jonahgao commented on PR #14052: URL: https://github.com/apache/datafusion/pull/14052#issuecomment-2585048618 Thanks @jatin510

Re: [I] Functionality of `array_repeat` udf [datafusion]

2025-01-10 Thread via GitHub
jonahgao closed issue #13872: Functionality of `array_repeat` udf URL: https://github.com/apache/datafusion/issues/13872

Re: [I] count distinct on NaN produces incorrect results [datafusion-comet]

2025-01-10 Thread via GitHub
parthchandra commented on issue #1238: URL: https://github.com/apache/datafusion-comet/issues/1238#issuecomment-2585075506 I think that we will have to explicitly check if both sides of a floating point comparison are NaN to match Spark behavior. By definition NaN is not equal to NaN, so t
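
A minimal Rust sketch of the NaN-aware comparison this comment describes, added for illustration. It is not code from Comet or DataFusion: the helper names `nan_aware_eq` and `count_distinct_nan_aware` are hypothetical, and the sketch only assumes that a distinct count matching Spark must place all NaN values in a single group.

```rust
/// Hypothetical helper (not a Comet or DataFusion API): equality that
/// treats NaN as equal to NaN, unlike the IEEE 754 `==` operator.
fn nan_aware_eq(a: f64, b: f64) -> bool {
    (a.is_nan() && b.is_nan()) || a == b
}

/// Hypothetical helper: counts distinct values using `nan_aware_eq`,
/// so every NaN in the input collapses into one group.
/// Quadratic on purpose; the point is the comparison, not performance.
fn count_distinct_nan_aware(values: &[f64]) -> usize {
    let mut distinct: Vec<f64> = Vec::new();
    for &v in values {
        if !distinct.iter().any(|&d| nan_aware_eq(d, v)) {
            distinct.push(v);
        }
    }
    distinct.len()
}

fn main() {
    let nan = f64::NAN;
    // Plain IEEE 754 comparison: NaN is never equal to itself.
    assert!(nan != nan);
    // The explicit both-sides-NaN check makes the two values compare equal.
    assert!(nan_aware_eq(nan, nan));
    // Two NaNs count once, alongside the distinct values 1.0 and 2.0.
    let n = count_distinct_nan_aware(&[1.0, nan, nan, 2.0, 1.0]);
    println!("distinct values: {n}");
}
```

With plain `==`, each NaN would never match an earlier NaN and would be counted as a separate distinct value, which is the kind of mismatch the issue title reports.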

Re: [PR] feat: metadata columns [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1911887850 ## datafusion/common/src/dfschema.rs: ## @@ -319,52 +465,56 @@ impl DFSchema { qualifiers.push(qualifier.cloned()); } } -

Re: [PR] feat: metadata columns [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1911886029 ## datafusion/common/src/dfschema.rs: ## @@ -442,22 +603,24 @@ impl DFSchema { /// Find all fields that match the given name pub fn fields_with_unqu

Re: [PR] feat: Implement custom RecordBatch serde for shuffle for improved performance [datafusion-comet]

2025-01-10 Thread via GitHub
viirya commented on code in PR #1190: URL: https://github.com/apache/datafusion-comet/pull/1190#discussion_r1911699224 ## native/core/src/execution/shuffle/codec.rs: ## @@ -0,0 +1,695 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [PR] feat: Implement custom RecordBatch serde for shuffle for improved performance [datafusion-comet]

2025-01-10 Thread via GitHub
viirya commented on code in PR #1190: URL: https://github.com/apache/datafusion-comet/pull/1190#discussion_r1911704555 ## native/core/src/execution/shuffle/codec.rs: ## @@ -0,0 +1,695 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub
comphead commented on code in PR #1262: URL: https://github.com/apache/datafusion-comet/pull/1262#discussion_r1911726694 ## spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala: ## @@ -974,7 +974,8 @@ class CometAggregateSuite extends CometTestBase with Adaptive

Re: [I] Distinct aggregates return incorrect results [datafusion-comet]

2025-01-10 Thread via GitHub
comphead commented on issue #1260: URL: https://github.com/apache/datafusion-comet/issues/1260#issuecomment-2584865275 That is a really good find @andygrove

[PR] Unit test fixes, default scan impl to comet_native [datafusion-comet]

2025-01-10 Thread via GitHub
parthchandra opened a new pull request, #1265: URL: https://github.com/apache/datafusion-comet/pull/1265 Notable changes: 1. There are three scan implementations: | Name | Description | Operator

Re: [PR] Unit test fixes, default scan impl to comet_native [datafusion-comet]

2025-01-10 Thread via GitHub
parthchandra commented on PR #1265: URL: https://github.com/apache/datafusion-comet/pull/1265#issuecomment-2584607637 @andygrove @mbutrovich

Re: [PR] [comet-parquet-exec] Unit test fixes, default scan impl to comet_native [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on PR #1265: URL: https://github.com/apache/datafusion-comet/pull/1265#issuecomment-2584713456 @parthchandra could you run `cargo fmt` and `cargo clippy`

Re: [PR] chore: [comet-parquet-exec] Unit test fixes, default scan impl to comet_native [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on PR #1265: URL: https://github.com/apache/datafusion-comet/pull/1265#issuecomment-2584726893 > @parthchandra could you run `cargo fmt` and `cargo clippy` and `make format` .. that is if we want to see tests passing on this PR, which I think they should now? Or d

Re: [PR] feat: Implement custom RecordBatch serde for shuffle for improved performance [datafusion-comet]

2025-01-10 Thread via GitHub
kazuyukitanimura commented on code in PR #1190: URL: https://github.com/apache/datafusion-comet/pull/1190#discussion_r1911623676 ## native/core/benches/shuffle_writer.rs: ## @@ -31,67 +31,54 @@ use std::sync::Arc; use tokio::runtime::Runtime; fn criterion_benchmark(c: &mut C

Re: [PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on code in PR #1262: URL: https://github.com/apache/datafusion-comet/pull/1262#discussion_r1911770443 ## spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala: ## @@ -974,7 +974,8 @@ class CometAggregateSuite extends CometTestBase with Adaptiv

Re: [PR] feat: metadata columns [datafusion]

2025-01-10 Thread via GitHub
chenkovsky commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1911814285 ## datafusion/common/src/dfschema.rs: ## @@ -319,52 +465,56 @@ impl DFSchema { qualifiers.push(qualifier.cloned()); } } -

[PR] Minor: use hashmap for `physical_exprs_contains` and move `PhysicalExprRef` to `physical-expr-common` [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 opened a new pull request, #14081: URL: https://github.com/apache/datafusion/pull/14081 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested

Re: [PR] fix: make get_valid_types handle TypeSignature::Numeric correctly [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 commented on PR #14060: URL: https://github.com/apache/datafusion/pull/14060#issuecomment-2584988703 Thanks @niebayes

Re: [PR] fix: make get_valid_types handle TypeSignature::Numeric correctly [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 merged PR #14060: URL: https://github.com/apache/datafusion/pull/14060

Re: [PR] feat: metadata columns [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1911800158 ## datafusion/common/src/dfschema.rs: ## @@ -319,52 +465,56 @@ impl DFSchema { qualifiers.push(qualifier.cloned()); } } -

Re: [I] Default to some compression when writing Parquet [datafusion-python]

2025-01-10 Thread via GitHub
timsaucer closed issue #978: Default to some compression when writing Parquet URL: https://github.com/apache/datafusion-python/issues/978

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-10 Thread via GitHub
timsaucer commented on PR #981: URL: https://github.com/apache/datafusion-python/pull/981#issuecomment-2585006200 Thank you for another great addition @kosiew !

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-10 Thread via GitHub
timsaucer merged PR #981: URL: https://github.com/apache/datafusion-python/pull/981

Re: [I] formatting the AST while preserving the source location information from the original query [datafusion-sqlparser-rs]

2025-01-10 Thread via GitHub
graup commented on issue #1634: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1634#issuecomment-2584354845 I'm playing around with a different approach for editing parts of the AST that works more like tools like eslint work. The idea is that given accurate source spans, wh

Re: [PR] chore: [comet-parquet-exec] Unit test fixes, default scan impl to comet_native [datafusion-comet]

2025-01-10 Thread via GitHub
parthchandra commented on PR #1265: URL: https://github.com/apache/datafusion-comet/pull/1265#issuecomment-2585079878 Updated the plans for Spark 3.5 and Spark 4.0. However, plan generation for the native_datafusion impl is failing, which will not affect the CI, but which needs to be addresse

[PR] chore: move `SanityChecker` into `physical-optimizer` crate [datafusion]

2025-01-10 Thread via GitHub
mnpw opened a new pull request, #14083: URL: https://github.com/apache/datafusion/pull/14083 ## Which issue does this PR close? Closes #14072. ## Rationale for this change From #14072 > Historically DataFusion was one (very) large crate datafusion, and

Re: [PR] chore: move `SanityChecker` into `physical-optimizer` crate [datafusion]

2025-01-10 Thread via GitHub
mnpw commented on PR #14083: URL: https://github.com/apache/datafusion/pull/14083#issuecomment-2585101517 @alamb for your consideration. I don't like that `datafusion-physical-optimizer` needs to use `datafusion` crate as a dev-dependency. This was required as `SanityChecker` tests w

Re: [PR] Fix clippy for Rust 1.84 [datafusion]

2025-01-10 Thread via GitHub
berkaysynnada merged PR #14065: URL: https://github.com/apache/datafusion/pull/14065

Re: [I] Fix build issues on latest stable Rust toolchain (1.84) [datafusion]

2025-01-10 Thread via GitHub
berkaysynnada closed issue #14061: Fix build issues on latest stable Rust toolchain (1.84) URL: https://github.com/apache/datafusion/issues/14061

Re: [PR] Fix clippy for Rust 1.84 [datafusion]

2025-01-10 Thread via GitHub
berkaysynnada commented on PR #14065: URL: https://github.com/apache/datafusion/pull/14065#issuecomment-2582075524 @jonahgao does linter pass in your local after this change?

Re: [I] Handle null `IntervalCompound` from substrait [datafusion]

2025-01-10 Thread via GitHub
cht42 closed issue #14066: Handle null `IntervalCompound` from substrait URL: https://github.com/apache/datafusion/issues/14066

Re: [PR] handle null `IntervalCompound` from substrait [datafusion]

2025-01-10 Thread via GitHub
cht42 closed pull request #14067: handle null `IntervalCompound` from substrait URL: https://github.com/apache/datafusion/pull/14067

Re: [PR] Fix clippy for Rust 1.84 [datafusion]

2025-01-10 Thread via GitHub
jonahgao commented on PR #14065: URL: https://github.com/apache/datafusion/pull/14065#issuecomment-2582092984 > @jonahgao does linter pass in your local after this change? Yes, it passed. Do you encounter any issues?

Re: [PR] test: Add plan execution during tests for bounded source [datafusion]

2025-01-10 Thread via GitHub
berkaysynnada merged PR #14013: URL: https://github.com/apache/datafusion/pull/14013

Re: [I] Execute plans during plan tests to make sure plans are valid and executable. [datafusion]

2025-01-10 Thread via GitHub
berkaysynnada closed issue #8230: Execute plans during plan tests to make sure plans are valid and executable. URL: https://github.com/apache/datafusion/issues/8230

Re: [PR] test: Add plan execution during tests for bounded source [datafusion]

2025-01-10 Thread via GitHub
berkaysynnada commented on PR #14013: URL: https://github.com/apache/datafusion/pull/14013#issuecomment-2582295394 Thank you @avkirilishin

[PR] Fix clippy for Rust 1.84 [datafusion]

2025-01-10 Thread via GitHub
jonahgao opened a new pull request, #14065: URL: https://github.com/apache/datafusion/pull/14065 ## Which issue does this PR close? Closes #14061. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tes

Re: [I] Fix build issues on latest stable Rust toolchain (1.84) [datafusion]

2025-01-10 Thread via GitHub
jonahgao commented on issue #14061: URL: https://github.com/apache/datafusion/issues/14061#issuecomment-2582008020 Since it blocks many PRs, I created #14065 to fix it. cc @niebayes

Re: [PR] Add support for ClickHouse `FORMAT` on `INSERT` [datafusion-sqlparser-rs]

2025-01-10 Thread via GitHub
bombsimon commented on code in PR #1628: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1628#discussion_r1909983745 ## src/ast/query.rs: ## @@ -2465,14 +2465,25 @@ impl fmt::Display for GroupByExpr { #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]

Re: [PR] chore: Add config for enabling SMJ with join condition [datafusion-comet]

2025-01-10 Thread via GitHub
kazuyukitanimura commented on code in PR #937: URL: https://github.com/apache/datafusion-comet/pull/937#discussion_r1910067146 ## spark/src/test/scala/org/apache/spark/sql/comet/CometPlanStabilitySuite.scala: ## @@ -262,6 +262,7 @@ trait CometPlanStabilitySuite extends DisableA

Re: [PR] chore: Add config for enabling SMJ with join condition [datafusion-comet]

2025-01-10 Thread via GitHub
kazuyukitanimura commented on code in PR #937: URL: https://github.com/apache/datafusion-comet/pull/937#discussion_r1910089857 ## spark/src/test/scala/org/apache/spark/sql/comet/CometPlanStabilitySuite.scala: ## @@ -262,6 +262,7 @@ trait CometPlanStabilitySuite extends DisableA

[PR] Bump `wasm-bindgen` and `wasm-bindgen-futures` [datafusion]

2025-01-10 Thread via GitHub
mbrobbel opened a new pull request, #14068: URL: https://github.com/apache/datafusion/pull/14068 ## Which issue does this PR close? Prevent https://github.com/apache/datafusion/pull/14065#issuecomment-2582412339. ## Rationale for this change Bumps `wasm-bindgen-*` patch

Re: [PR] Fix clippy for Rust 1.84 [datafusion]

2025-01-10 Thread via GitHub
berkaysynnada commented on PR #14065: URL: https://github.com/apache/datafusion/pull/14065#issuecomment-2582438534 > Can you try again after `cargo update wasm-bindgen`? This was fixed in [rustwasm/wasm-bindgen#4284](https://github.com/rustwasm/wasm-bindgen/pull/4284). That works, tha

Re: [PR] Fix clippy for Rust 1.84 [datafusion]

2025-01-10 Thread via GitHub
mbrobbel commented on PR #14065: URL: https://github.com/apache/datafusion/pull/14065#issuecomment-2582467124 > > Can you try again after `cargo update wasm-bindgen`? This was fixed in [rustwasm/wasm-bindgen#4284](https://github.com/rustwasm/wasm-bindgen/pull/4284). > > That works, th

Re: [PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on code in PR #1262: URL: https://github.com/apache/datafusion-comet/pull/1262#discussion_r1911777608 ## spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala: ## @@ -974,7 +974,8 @@ class CometAggregateSuite extends CometTestBase with Adaptiv

[I] Add support for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove opened a new issue, #1267: URL: https://github.com/apache/datafusion-comet/issues/1267 ### What is the problem the feature request solves? Add support for distinct aggregates and enable "distinct" test in CometAggregateSuite ### Describe the potential solution

Re: [PR] docs: Update TPC-H benchmark results [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on PR #1257: URL: https://github.com/apache/datafusion-comet/pull/1257#issuecomment-2584451268 Moving this to draft now that we know that the count(distinct) in q16 was not actually working correctly due to https://github.com/apache/datafusion-comet/issues/1260

[I] Comet possibly preventing AQE optimization [datafusion-comet]

2025-01-10 Thread via GitHub
kazuyukitanimura opened a new issue, #1266: URL: https://github.com/apache/datafusion-comet/issues/1266 ### Describe the bug `SPARK-50258: Fix output column order changed issue after AQE optimization` test fails with Comet on because comet plan does not have `AdaptiveSparkPlanExec` i

Re: [PR] feat: metadata columns [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1911863137 ## datafusion/common/src/dfschema.rs: ## @@ -319,52 +465,56 @@ impl DFSchema { qualifiers.push(qualifier.cloned()); } } -

Re: [PR] feat: metadata columns [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1911863259 ## datafusion/common/src/dfschema.rs: ## @@ -319,52 +465,56 @@ impl DFSchema { qualifiers.push(qualifier.cloned()); } } -

Re: [I] [EPIC] Decouple logical from physical types [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 commented on issue #12622: URL: https://github.com/apache/datafusion/issues/12622#issuecomment-2585131985 Since the logical-types branch can easily diverge from the main branch, even when the sub-tasks are incomplete, would it be better to merge it into the main branch frequently

Re: [PR] Fix clippy for Rust 1.84 [datafusion]

2025-01-10 Thread via GitHub
alamb commented on code in PR #14065: URL: https://github.com/apache/datafusion/pull/14065#discussion_r1910263895 ## datafusion/common/src/pyarrow.rs: ## @@ -138,6 +138,9 @@ mod tests { fn test_py_scalar() { init_python(); +// TODO: remove this attribute

Re: [PR] Bump `ctor` to `0.2.9` [datafusion]

2025-01-10 Thread via GitHub
alamb merged PR #14069: URL: https://github.com/apache/datafusion/pull/14069

Re: [I] Create test pattern for Spans [datafusion-sqlparser-rs]

2025-01-10 Thread via GitHub
graup commented on issue #1563: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1563#issuecomment-2582492517 Hi! Just a small note since I started playing around with this. I assume it's because it's not implemented yet but anyway to keep track: it seems currently that Funct

[PR] Bump `ctor` to `0.2.9` [datafusion]

2025-01-10 Thread via GitHub
mbrobbel opened a new pull request, #14069: URL: https://github.com/apache/datafusion/pull/14069 Fixes https://github.com/apache/datafusion/pull/14065#issuecomment-2582438534. Maybe we should just add a `Cargo.lock` file to have checks in CI to prevent this in the future, following s

[PR] Update ctor rep to latest [datafusion]

2025-01-10 Thread via GitHub
berkaysynnada opened a new pull request, #14070: URL: https://github.com/apache/datafusion/pull/14070 ## Which issue does this PR close? Closes #. ## Rationale for this change I have faced with ``` Checking datafusion-expr v44.0.0 (/Users/berkaysahi

Re: [I] Add optimizer rule to replace inlist with `or` chain for small list [datafusion]

2025-01-10 Thread via GitHub
alamb closed issue #799: Add optimizer rule to replace inlist with `or` chain for small list URL: https://github.com/apache/datafusion/issues/799

Re: [I] Add optimizer rule to replace inlist with `or` chain for small list [datafusion]

2025-01-10 Thread via GitHub
alamb commented on issue #799: URL: https://github.com/apache/datafusion/issues/799#issuecomment-2582597548 This is done now from what I can tell: ```sql > create table foo(x int, a int, b int) as values (1,2,3); 0 row(s) fetched. Elapsed 0.006 seconds. > explain s

Re: [I] Question: is the combination of limit and predicate push-down safe in ParquetExec? [datafusion]

2025-01-10 Thread via GitHub
alamb closed issue #900: Question: is the combination of limit and predicate push-down safe in ParquetExec? URL: https://github.com/apache/datafusion/issues/900

Re: [I] Question: is the combination of limit and predicate push-down safe in ParquetExec? [datafusion]

2025-01-10 Thread via GitHub
alamb commented on issue #900: URL: https://github.com/apache/datafusion/issues/900#issuecomment-2582599182 I believe this is a dupe of the recently fixed issue: - https://github.com/apache/datafusion/issues/13745

Re: [I] [Rust] DataFrame.collect should return RecordBatchReader [datafusion]

2025-01-10 Thread via GitHub
alamb closed issue #97: [Rust] DataFrame.collect should return RecordBatchReader URL: https://github.com/apache/datafusion/issues/97

Re: [I] Track memory usage for each individual operator [datafusion]

2025-01-10 Thread via GitHub
alamb commented on issue #899: URL: https://github.com/apache/datafusion/issues/899#issuecomment-2582600235 This is now handled with the https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/trait.MemoryPool.html

Re: [I] Track memory usage for each individual operator [datafusion]

2025-01-10 Thread via GitHub
alamb closed issue #899: Track memory usage for each individual operator URL: https://github.com/apache/datafusion/issues/899

Re: [I] [EPIC] Improved Externalized / Spilling / Large than Memory Hash Aggregation [datafusion]

2025-01-10 Thread via GitHub
alamb commented on issue #13123: URL: https://github.com/apache/datafusion/issues/13123#issuecomment-2582773039 Here is a PR to optimize the spill format: - https://github.com/apache/datafusion/issues/14078

Re: [PR] feat: Implement custom RecordBatch serde for shuffle for improved performance [datafusion-comet]

2025-01-10 Thread via GitHub
alamb commented on PR #1190: URL: https://github.com/apache/datafusion-comet/pull/1190#issuecomment-2582775122 FYI I filed a ticket in DataFusion to consider adding this code (or something similar) upstream: - https://github.com/apache/datafusion/issues/14078 I think it would help

Re: [I] Regression: `DataFrame::schema` returns incorrect schema for NATURAL JOIN [datafusion]

2025-01-10 Thread via GitHub
jonahgao commented on issue #14058: URL: https://github.com/apache/datafusion/issues/14058#issuecomment-2582776727 I'll try to fix this.

Re: [PR] Refactor into `LexOrdering::collapse`, `LexRequirement::collapse` avoid clone [datafusion]

2025-01-10 Thread via GitHub
alamb merged PR #14038: URL: https://github.com/apache/datafusion/pull/14038

Re: [PR] Refactor into `LexOrdering::collapse`, `LexRequirement::collapse` avoid clone [datafusion]

2025-01-10 Thread via GitHub
alamb commented on PR #14038: URL: https://github.com/apache/datafusion/pull/14038#issuecomment-2582777452 Thanks again @berkaysynnada -- I merged up from main and the CI checks now pass so merging it in

Re: [PR] Bump `wasm-bindgen-*` crates [datafusion]

2025-01-10 Thread via GitHub
jonahgao merged PR #14068: URL: https://github.com/apache/datafusion/pull/14068

Re: [PR] Bump `wasm-bindgen-*` crates [datafusion]

2025-01-10 Thread via GitHub
jonahgao commented on PR #14068: URL: https://github.com/apache/datafusion/pull/14068#issuecomment-2582789511 Thanks @mbrobbel

Re: [PR] feat: rand expression support [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on code in PR #1199: URL: https://github.com/apache/datafusion-comet/pull/1199#discussion_r1910483116 ## native/core/src/execution/jni_api.rs: ## @@ -317,7 +317,7 @@ pub unsafe extern "system" fn Java_org_apache_comet_Native_executePlan( // query pl

Re: [I] Move `SanityChecker` into `datafusion-physical-optimizer` crate [datafusion]

2025-01-10 Thread via GitHub
cj-zhukov commented on issue #14072: URL: https://github.com/apache/datafusion/issues/14072#issuecomment-2582906768 take

Re: [I] Build is broken in main [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on issue #1258: URL: https://github.com/apache/datafusion-comet/issues/1258#issuecomment-2582907784 I am working on a fix

Re: [I] Move `JoinSelection` into `datafusion-physical-optimizer` crate [datafusion]

2025-01-10 Thread via GitHub
cj-zhukov commented on issue #14073: URL: https://github.com/apache/datafusion/issues/14073#issuecomment-2582907973 take

[I] Build is broken in main [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove opened a new issue, #1258: URL: https://github.com/apache/datafusion-comet/issues/1258 ### Describe the bug There was a conflict between some PRs merged recently that has introduced a test failure in CometAggregateSuite ### Steps to reproduce _No response_

Re: [PR] Bump `ctor` to `0.2.9` [datafusion]

2025-01-10 Thread via GitHub
mbrobbel commented on PR #14069: URL: https://github.com/apache/datafusion/pull/14069#issuecomment-2582533271 > I think the reason we don't have a Cargo.lock is that datafusion is meant to be used as a library and thus we wanted to give downstream crates the flexibility for most dependent li

Re: [PR] Bump `ctor` to `0.2.9` [datafusion]

2025-01-10 Thread via GitHub
alamb commented on PR #14069: URL: https://github.com/apache/datafusion/pull/14069#issuecomment-2582555840 > > I think the reason we don't have a Cargo.lock is that datafusion is meant to be used as a library and thus we wanted to give downstream crates the flexibility for most dependent lib

Re: [PR] Minor: Document the rationale for the lack of Cargo.lock [datafusion]

2025-01-10 Thread via GitHub
alamb commented on code in PR #14071: URL: https://github.com/apache/datafusion/pull/14071#discussion_r1910286465 ## README.md: ## @@ -146,3 +146,27 @@ stable API, we also improve the API over time. As a result, we typically deprecate methods before removing them, according to

Re: [PR] Minor: Document the rationale for the lack of Cargo.lock [datafusion]

2025-01-10 Thread via GitHub
alamb commented on code in PR #14071: URL: https://github.com/apache/datafusion/pull/14071#discussion_r1910289206 ## README.md: ## @@ -146,3 +146,27 @@ stable API, we also improve the API over time. As a result, we typically deprecate methods before removing them, according to

[PR] Add a sum statistic [datafusion]

2025-01-10 Thread via GitHub
gatesn opened a new pull request, #14074: URL: https://github.com/apache/datafusion/pull/14074 ## Which issue does this PR close? This PR adds a sum statistic to DataFusion. Future use will include optimizing aggregation functions (sum, avg, count), see https://github.com/apac

Re: [I] [Rust] DataFrame.collect should return RecordBatchReader [datafusion]

2025-01-10 Thread via GitHub
alamb commented on issue #97: URL: https://github.com/apache/datafusion/issues/97#issuecomment-2582600917 Seems like we are done here

[I] Move `SanityChecker` into `datafusion-physical-optimizer` crate [datafusion]

2025-01-10 Thread via GitHub
alamb opened a new issue, #14072: URL: https://github.com/apache/datafusion/issues/14072 ### Is your feature request related to a problem or challenge? - Part of https://github.com/apache/datafusion/issues/11502 - Related to https://github.com/apache/datafusion/issues/13814 H
