Re: [I] [EPIC] Decouple logical from physical types [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 commented on issue #12622: URL: https://github.com/apache/datafusion/issues/12622#issuecomment-2585131985 Since the logical-types branch can easily diverge from the main branch, even when the sub-tasks are incomplete, would it be better to merge it into the main branch frequently

Re: [I] Move `SanityChecker` into `datafusion-physical-optimizer` crate [datafusion]

2025-01-10 Thread via GitHub
mnpw commented on issue #14072: URL: https://github.com/apache/datafusion/issues/14072#issuecomment-2585112045 @cj-zhukov Apologies, I started working on this PR without explicitly assigning it to myself. Would appreciate your review on https://github.com/apache/datafusion/pull/14083.

Re: [PR] chore: move `SanityChecker` into `physical-optimizer` crate [datafusion]

2025-01-10 Thread via GitHub
mnpw commented on PR #14083: URL: https://github.com/apache/datafusion/pull/14083#issuecomment-2585101517 @alamb for your consideration. I don't like that `datafusion-physical-optimizer` needs to use `datafusion` crate as a dev-dependency. This was required as `SanityChecker` tests w

[PR] chore: move `SanityChecker` into `physical-optimizer` crate [datafusion]

2025-01-10 Thread via GitHub
mnpw opened a new pull request, #14083: URL: https://github.com/apache/datafusion/pull/14083 ## Which issue does this PR close? Closes #14072. ## Rationale for this change From #14072 > Historically DataFusion was one (very) large crate datafusion, and

Re: [PR] chore: [comet-parquet-exec] Unit test fixes, default scan impl to comet_native [datafusion-comet]

2025-01-10 Thread via GitHub
parthchandra commented on PR #1265: URL: https://github.com/apache/datafusion-comet/pull/1265#issuecomment-2585079878 Updated the plans for Spark 3.5 and Spark 4.0. However, plan generation for the native_datafusion impl is failing, which will not affect the CI, but which needs to be addresse

Re: [PR] feat: metadata columns [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1911886029 ## datafusion/common/src/dfschema.rs: ## @@ -442,22 +603,24 @@ impl DFSchema { /// Find all fields that match the given name pub fn fields_with_unqu

Re: [PR] feat: metadata columns [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1911887850 ## datafusion/common/src/dfschema.rs: ## @@ -319,52 +465,56 @@ impl DFSchema { qualifiers.push(qualifier.cloned()); } } -

Re: [I] count distinct on NaN produces incorrect results [datafusion-comet]

2025-01-10 Thread via GitHub
parthchandra commented on issue #1238: URL: https://github.com/apache/datafusion-comet/issues/1238#issuecomment-2585075506 I think that we will have to explicitly check if both sides of a floating point comparison are NaN to match Spark behavior. By definition NaN is not equal to NaN, so t
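
The check described in this comment can be sketched in Rust. This is a minimal illustration, not Comet's code: `nan_aware_eq` and `count_distinct` are hypothetical names, and the sketch assumes the goal is to treat two NaN values as equal when deduplicating.

```rust
/// Hypothetical helper (not from Comet): equality that treats two NaNs as equal.
fn nan_aware_eq(a: f64, b: f64) -> bool {
    // IEEE 754 defines NaN != NaN, so the both-NaN case is checked explicitly.
    (a.is_nan() && b.is_nan()) || a == b
}

/// Counts distinct values using the NaN-aware equality above.
/// Quadratic on purpose; this is an illustration, not an efficient implementation.
fn count_distinct(values: &[f64]) -> usize {
    let mut seen: Vec<f64> = Vec::new();
    for &v in values {
        if !seen.iter().any(|&s| nan_aware_eq(s, v)) {
            seen.push(v);
        }
    }
    seen.len()
}
```

With a plain `==` comparison every NaN would count as a new distinct value; with the explicit check, all NaNs collapse into one.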

Re: [PR] added "DEFAULT_CLI_FORMAT_OPTIONS" for cli and sqllogic test [datafusion]

2025-01-10 Thread via GitHub
jonahgao commented on PR #14052: URL: https://github.com/apache/datafusion/pull/14052#issuecomment-2585048618 Thanks @jatin510 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

Re: [I] Functionality of `array_repeat` udf [datafusion]

2025-01-10 Thread via GitHub
jonahgao closed issue #13872: Functionality of `array_repeat` udf URL: https://github.com/apache/datafusion/issues/13872

Re: [PR] added "DEFAULT_CLI_FORMAT_OPTIONS" for cli and sqllogic test [datafusion]

2025-01-10 Thread via GitHub
jonahgao merged PR #14052: URL: https://github.com/apache/datafusion/pull/14052

Re: [PR] feat: metadata columns [datafusion]

2025-01-10 Thread via GitHub
chenkovsky commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1911868813 ## datafusion/common/src/dfschema.rs: ## @@ -319,52 +465,56 @@ impl DFSchema { qualifiers.push(qualifier.cloned()); } } -

Re: [I] Optimized spill file format [datafusion]

2025-01-10 Thread via GitHub
2010YOUY01 commented on issue #14078: URL: https://github.com/apache/datafusion/issues/14078#issuecomment-2585043689 Although we're currently spilling column-wise record batches, I think this will change to row-wise batches in the future. It would be better to benchmark and optimize spillin

[I] Support `MemoryExec` in proto `try_from_physical_plan` [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 opened a new issue, #14082: URL: https://github.com/apache/datafusion/issues/14082 ### Is your feature request related to a problem or challenge? In `try_from_physical_plan` https://github.com/apache/datafusion/blob/d91a7c0f5b93bfb7061dcb6aa8b78dd31b7273b3/datafusion/proto/

Re: [PR] Add comments to physical optimizer tests [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 merged PR #14075: URL: https://github.com/apache/datafusion/pull/14075

Re: [PR] Minor: Move `LimitPushdown` tests to be in the same file as the code [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 merged PR #14076: URL: https://github.com/apache/datafusion/pull/14076

Re: [PR] Add comments to physical optimizer tests [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 commented on PR #14075: URL: https://github.com/apache/datafusion/pull/14075#issuecomment-2585039343 Thanks @alamb

Re: [PR] Minor: Make `group_schema` as `PhysicalGroupBy` method [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 commented on PR #14064: URL: https://github.com/apache/datafusion/pull/14064#issuecomment-2585038720 Thanks @berkaysynnada

Re: [PR] Minor: Make `group_schema` as `PhysicalGroupBy` method [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 merged PR #14064: URL: https://github.com/apache/datafusion/pull/14064

Re: [PR] feat: metadata columns [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1911863259 ## datafusion/common/src/dfschema.rs: ## @@ -319,52 +465,56 @@ impl DFSchema { qualifiers.push(qualifier.cloned()); } } -

Re: [PR] feat: metadata columns [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1911863137 ## datafusion/common/src/dfschema.rs: ## @@ -319,52 +465,56 @@ impl DFSchema { qualifiers.push(qualifier.cloned()); } } -

[PR] Minor: use hashmap for `physical_exprs_contains` and move `PhysicalExprRef` to `physical-expr-common` [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 opened a new pull request, #14081: URL: https://github.com/apache/datafusion/pull/14081 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested

Re: [PR] feat: metadata columns [datafusion]

2025-01-10 Thread via GitHub
chenkovsky commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1911814285 ## datafusion/common/src/dfschema.rs: ## @@ -319,52 +465,56 @@ impl DFSchema { qualifiers.push(qualifier.cloned()); } } -

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-10 Thread via GitHub
timsaucer commented on PR #981: URL: https://github.com/apache/datafusion-python/pull/981#issuecomment-2585006200 Thank you for another great addition @kosiew !

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-10 Thread via GitHub
timsaucer merged PR #981: URL: https://github.com/apache/datafusion-python/pull/981

Re: [I] Default to some compression when writing Parquet [datafusion-python]

2025-01-10 Thread via GitHub
timsaucer closed issue #978: Default to some compression when writing Parquet URL: https://github.com/apache/datafusion-python/issues/978

Re: [PR] feat: metadata columns [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1911800158 ## datafusion/common/src/dfschema.rs: ## @@ -319,52 +465,56 @@ impl DFSchema { qualifiers.push(qualifier.cloned()); } } -

Re: [PR] fix: make get_valid_types handle TypeSignature::Numeric correctly [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 commented on PR #14060: URL: https://github.com/apache/datafusion/pull/14060#issuecomment-2584988703 Thanks @niebayes

Re: [PR] fix: make get_valid_types handle TypeSignature::Numeric correctly [datafusion]

2025-01-10 Thread via GitHub
jayzhan211 merged PR #14060: URL: https://github.com/apache/datafusion/pull/14060

Re: [PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on code in PR #1262: URL: https://github.com/apache/datafusion-comet/pull/1262#discussion_r1911777608 ## spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala: ## @@ -974,7 +974,8 @@ class CometAggregateSuite extends CometTestBase with Adaptiv

Re: [PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on code in PR #1262: URL: https://github.com/apache/datafusion-comet/pull/1262#discussion_r1911770443 ## spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala: ## @@ -974,7 +974,8 @@ class CometAggregateSuite extends CometTestBase with Adaptiv

Re: [PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub
codecov-commenter commented on PR #1262: URL: https://github.com/apache/datafusion-comet/pull/1262#issuecomment-2584940287 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1262?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on code in PR #1262: URL: https://github.com/apache/datafusion-comet/pull/1262#discussion_r1911767513 ## spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala: ## @@ -974,7 +974,8 @@ class CometAggregateSuite extends CometTestBase with Adaptiv

Re: [PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub
comphead commented on code in PR #1262: URL: https://github.com/apache/datafusion-comet/pull/1262#discussion_r1911745757 ## spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala: ## @@ -974,7 +974,8 @@ class CometAggregateSuite extends CometTestBase with Adaptive

Re: [PR] chore: [comet-parquet-exec] Unit test fixes, default scan impl to comet_native [datafusion-comet]

2025-01-10 Thread via GitHub
parthchandra commented on PR #1265: URL: https://github.com/apache/datafusion-comet/pull/1265#issuecomment-2584878525 > > @parthchandra could you run `cargo fmt` and `cargo clippy` > > and `make format` .. that is if we want to see tests passing on this PR, which I think they should

Re: [PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub
comphead commented on code in PR #1262: URL: https://github.com/apache/datafusion-comet/pull/1262#discussion_r1911726694 ## spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala: ## @@ -974,7 +974,8 @@ class CometAggregateSuite extends CometTestBase with Adaptive

Re: [I] Distinct aggregates return incorrect results [datafusion-comet]

2025-01-10 Thread via GitHub
comphead commented on issue #1260: URL: https://github.com/apache/datafusion-comet/issues/1260#issuecomment-2584865275 That is a really good find, @andygrove

Re: [PR] feat: Implement custom RecordBatch serde for shuffle for improved performance [datafusion-comet]

2025-01-10 Thread via GitHub
viirya commented on code in PR #1190: URL: https://github.com/apache/datafusion-comet/pull/1190#discussion_r1911704555 ## native/core/src/execution/shuffle/codec.rs: ## @@ -0,0 +1,695 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [PR] feat: Implement custom RecordBatch serde for shuffle for improved performance [datafusion-comet]

2025-01-10 Thread via GitHub
viirya commented on code in PR #1190: URL: https://github.com/apache/datafusion-comet/pull/1190#discussion_r1911699224 ## native/core/src/execution/shuffle/codec.rs: ## @@ -0,0 +1,695 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [PR] feat: Implement custom RecordBatch serde for shuffle for improved performance [datafusion-comet]

2025-01-10 Thread via GitHub
kazuyukitanimura commented on code in PR #1190: URL: https://github.com/apache/datafusion-comet/pull/1190#discussion_r1911623676 ## native/core/benches/shuffle_writer.rs: ## @@ -31,67 +31,54 @@ use std::sync::Arc; use tokio::runtime::Runtime; fn criterion_benchmark(c: &mut C

Re: [PR] chore: [comet-parquet-exec] Unit test fixes, default scan impl to comet_native [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on PR #1265: URL: https://github.com/apache/datafusion-comet/pull/1265#issuecomment-2584726893 > @parthchandra could you run `cargo fmt` and `cargo clippy` and `make format` .. that is if we want to see tests passing on this PR, which I think they should now? Or d

Re: [PR] [comet-parquet-exec] Unit test fixes, default scan impl to comet_native [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on PR #1265: URL: https://github.com/apache/datafusion-comet/pull/1265#issuecomment-2584713456 @parthchandra could you run `cargo fmt` and `cargo clippy`

[I] Add support for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove opened a new issue, #1267: URL: https://github.com/apache/datafusion-comet/issues/1267 ### What is the problem the feature request solves? Add support for distinct aggregates and enable "distinct" test in CometAggregateSuite ### Describe the potential solution

[I] Comet possibly preventing AQE optimization [datafusion-comet]

2025-01-10 Thread via GitHub
kazuyukitanimura opened a new issue, #1266: URL: https://github.com/apache/datafusion-comet/issues/1266 ### Describe the bug `SPARK-50258: Fix output column order changed issue after AQE optimization` test fails with Comet enabled because the Comet plan does not have `AdaptiveSparkPlanExec` i

Re: [PR] Unit test fixes, default scan impl to comet_native [datafusion-comet]

2025-01-10 Thread via GitHub
parthchandra commented on PR #1265: URL: https://github.com/apache/datafusion-comet/pull/1265#issuecomment-2584607637 @andygrove @mbutrovich

[PR] Unit test fixes, default scan impl to comet_native [datafusion-comet]

2025-01-10 Thread via GitHub
parthchandra opened a new pull request, #1265: URL: https://github.com/apache/datafusion-comet/pull/1265 Notable changes: 1. There are three scan implementations: | Name | Description | Operator

Re: [PR] docs: Update TPC-H benchmark results [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on PR #1257: URL: https://github.com/apache/datafusion-comet/pull/1257#issuecomment-2584451268 Moving this to draft now that we know that the count(distinct) in q16 was not actually working correctly due to https://github.com/apache/datafusion-comet/issues/1260

Re: [I] build broken due to breaking change in winnow crate [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove closed issue #1263: build broken due to breaking change in winnow crate URL: https://github.com/apache/datafusion-comet/issues/1263

Re: [I] formatting the AST while preserving the source location information from the original query [datafusion-sqlparser-rs]

2025-01-10 Thread via GitHub
graup commented on issue #1634: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1634#issuecomment-2584354845 I'm playing around with a different approach for editing parts of the AST that works more like tools like eslint work. The idea is that given accurate source spans, wh
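
The span-based approach mentioned in this comment can be illustrated with a small Rust sketch. It is not taken from the issue or from sqlparser-rs: the `Edit` type and `apply_edits` function are hypothetical, and the sketch assumes spans are byte offsets into the original query rather than line/column locations.

```rust
/// Hypothetical edit: replace the bytes in `start..end` of the source with `text`.
struct Edit {
    start: usize,
    end: usize,
    text: String,
}

/// Applies non-overlapping edits to `source`, leaving everything outside
/// the edited spans (whitespace, comments, casing) untouched.
fn apply_edits(source: &str, mut edits: Vec<Edit>) -> String {
    // Apply from the end of the text so earlier offsets stay valid.
    edits.sort_by(|a, b| b.start.cmp(&a.start));
    let mut out = source.to_string();
    for e in edits {
        // Panics if a span does not fall on UTF-8 character boundaries.
        out.replace_range(e.start..e.end, &e.text);
    }
    out
}
```

Because only the listed spans are rewritten, the rest of the query keeps its original formatting, which is the property that makes this style of editing attractive for lint-like tools.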

Re: [PR] build: force use of winnow 0.6.22 to fix build [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove closed pull request #1264: build: force use of winnow 0.6.22 to fix build URL: https://github.com/apache/datafusion-comet/pull/1264

Re: [PR] build: force use of winnow 0.6.22 to fix build [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on PR #1264: URL: https://github.com/apache/datafusion-comet/pull/1264#issuecomment-2584313818 This doesn't help because the error happens when installing cargo machete

[I] build broken due to breaking change in downstream winnow crate [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove opened a new issue, #1263: URL: https://github.com/apache/datafusion-comet/issues/1263 ### Describe the bug See https://github.com/winnow-rs/winnow/issues/689 ### Steps to reproduce _No response_ ### Expected behavior _No response_ ### Addit

Re: [I] Optimized spill file format [datafusion]

2025-01-10 Thread via GitHub
alamb commented on issue #14078: URL: https://github.com/apache/datafusion/issues/14078#issuecomment-2584278401 As a data point, @totoroyyb reports a 100x faster reading of Arrow IPC data without validation on https://github.com/apache/arrow-rs/issues/6933

[PR] build: force use of winnow 0.6.22 to fix build [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove opened a new pull request, #1264: URL: https://github.com/apache/datafusion-comet/pull/1264 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/1263 ## Rationale for this change ## What changes are included

Re: [I] Distinct aggregates return incorrect results [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on issue #1260: URL: https://github.com/apache/datafusion-comet/issues/1260#issuecomment-2584118507 To support distinct aggregates we will need to add support for ListVector in both native and columnar shuffle to support intermediate aggregate state

Re: [PR] Minor: Document the rationale for the lack of Cargo.lock [datafusion]

2025-01-10 Thread via GitHub
gatesn commented on code in PR #14071: URL: https://github.com/apache/datafusion/pull/14071#discussion_r1911310930 ## README.md: ## @@ -146,3 +146,27 @@ stable API, we also improve the API over time. As a result, we typically deprecate methods before removing them, according t

[I] External Error prefix is repeated multiple times [datafusion]

2025-01-10 Thread via GitHub
gz opened a new issue, #14080: URL: https://github.com/apache/datafusion/issues/14080 ### Describe the bug I'm getting errors in the form of DataFusionError::External(External(External(External(External(External(xyz when throwing them from https://docs.rs/datafusion/latest/dataf
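
One way a repeated prefix like the one reported here can arise is when every layer re-wraps an error that is already wrapped. The Rust sketch below shows the pattern with a toy enum. It is not DataFusion's error type and not a diagnosis of this issue: `ToyError`, `wrap_naive`, and `wrap_once` are hypothetical names used only for illustration.

```rust
use std::error::Error;
use std::fmt;

/// Toy stand-in for an error enum with a boxed "external" variant
/// (an assumption for illustration, not DataFusion's type).
#[derive(Debug)]
enum ToyError {
    External(Box<dyn Error + Send + Sync>),
}

impl fmt::Display for ToyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ToyError::External(inner) => write!(f, "External error: {inner}"),
        }
    }
}

impl Error for ToyError {}

/// Always wraps, so each layer that calls it adds another prefix.
fn wrap_naive(e: Box<dyn Error + Send + Sync>) -> ToyError {
    ToyError::External(e)
}

/// Wraps only if the boxed error is not already a `ToyError`.
fn wrap_once(e: Box<dyn Error + Send + Sync>) -> ToyError {
    match e.downcast::<ToyError>() {
        Ok(already) => *already,
        Err(other) => ToyError::External(other),
    }
}
```

Calling `wrap_naive` twice on the same error yields a doubled prefix, while `wrap_once` recovers the already-wrapped value by downcasting and leaves a single prefix.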

[PR] feat: add support for `LogicalPlan::DML(...)` serde [datafusion]

2025-01-10 Thread via GitHub
milenkovicm opened a new pull request, #14079: URL: https://github.com/apache/datafusion/pull/14079 ## Which issue does this PR close? Closes #13616. ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] Minor: Document output schema of LogicalPlan::Aggregate and LogicalPl… [datafusion]

2025-01-10 Thread via GitHub
alamb commented on PR #14047: URL: https://github.com/apache/datafusion/pull/14047#issuecomment-2583784597 Thank you @2010YOUY01 and @jonahgao

Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-10 Thread via GitHub
alamb commented on PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192#issuecomment-2583783434 > @alamb Here is the follow on PR for the custom encoding to replace Arrow IPC: Thank you -- to follow up here I also filed a ticket to consider adding a better spill forma

Re: [PR] Add support for the Snowflake MINUS set operator [datafusion-sqlparser-rs]

2025-01-10 Thread via GitHub
alamb commented on PR #1652: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1652#issuecomment-2583777409 🚀

Re: [I] datafusion-substrait API docs on docs.rs are broken [datafusion]

2025-01-10 Thread via GitHub
alamb commented on issue #13853: URL: https://github.com/apache/datafusion/issues/13853#issuecomment-2583769175 > FYI - the similar changes for `substrait-validator` attached to [substrait-io/substrait-validator#355](https://github.com/substrait-io/substrait-validator/issues/355) does seem

[PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove opened a new pull request, #1262: URL: https://github.com/apache/datafusion-comet/pull/1262 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/1260 ## Rationale for this change As a short term fix until we

Re: [I] Extension Types [datafusion]

2025-01-10 Thread via GitHub
paleolimbot commented on issue #12644: URL: https://github.com/apache/datafusion/issues/12644#issuecomment-2583556751 I'm interested in this as well; however, I'm new to DataFusion development and I am not sure I have a handle on what the barriers are here (e.g., Is support for this blocked

Re: [I] datafusion-substrait API docs on docs.rs are broken [datafusion]

2025-01-10 Thread via GitHub
wackywendell commented on issue #13853: URL: https://github.com/apache/datafusion/issues/13853#issuecomment-2583541476 FYI - the similar changes for `substrait-validator` attached to https://github.com/substrait-io/substrait-validator/issues/355 does seem to have fixed [the docs](https://d

Re: [I] Build is broken in main [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove closed issue #1258: Build is broken in main URL: https://github.com/apache/datafusion-comet/issues/1258

Re: [PR] build: Fix test failure caused by merging conflicting PRs [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove merged PR #1259: URL: https://github.com/apache/datafusion-comet/pull/1259

[PR] feat: Add support for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove opened a new pull request, #1261: URL: https://github.com/apache/datafusion-comet/pull/1261 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/1260 ## Rationale for this change Bug fix. ## What chan

Re: [PR] Add support for ClickHouse `FORMAT` on `INSERT` [datafusion-sqlparser-rs]

2025-01-10 Thread via GitHub
iffyio merged PR #1628: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1628

Re: [PR] Add support for ClickHouse `FORMAT` on `INSERT` [datafusion-sqlparser-rs]

2025-01-10 Thread via GitHub
bombsimon commented on PR #1628: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1628#issuecomment-2583288583 Sorry, I messed up my rebase and lost some commits and force pushed 🙈 But we'll squash on merge anyway so probably nothing to fix now.

Re: [I] Move `SanityChecker` into `datafusion-physical-optimizer` crate [datafusion]

2025-01-10 Thread via GitHub
alamb commented on issue #14072: URL: https://github.com/apache/datafusion/issues/14072#issuecomment-2583265249 Thanks @cj-zhukov !

Re: [PR] chore: Use Spark's ParquetFilters [datafusion-comet]

2025-01-10 Thread via GitHub
huaxingao closed pull request #492: chore: Use Spark's ParquetFilters URL: https://github.com/apache/datafusion-comet/pull/492

Re: [PR] chore: Use Spark's ParquetFilters [datafusion-comet]

2025-01-10 Thread via GitHub
huaxingao commented on PR #492: URL: https://github.com/apache/datafusion-comet/pull/492#issuecomment-2583273768 I will close this for now.

[I] Distinct aggregates return incorrect results [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove opened a new issue, #1260: URL: https://github.com/apache/datafusion-comet/issues/1260 ### Describe the bug When translating aggregate expressions to DataFusion we ignore whether the aggregate is distinct or not, resulting in incorrect behavior. The existing tests see
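
The effect of dropping the distinct flag during translation can be shown with a toy Rust example. It does not reflect Comet's or DataFusion's actual types: `CountAgg` and `eval_count` are hypothetical, and the sketch only assumes that the aggregate description carries a boolean `distinct` flag.

```rust
use std::collections::HashSet;

/// Toy description of a COUNT aggregate (hypothetical, not Comet's representation).
struct CountAgg {
    distinct: bool,
}

/// Evaluates the aggregate while honoring the `distinct` flag.
/// Ignoring the flag would always take the `else` branch and overcount duplicates.
fn eval_count(agg: &CountAgg, values: &[i64]) -> usize {
    if agg.distinct {
        values.iter().collect::<HashSet<_>>().len()
    } else {
        values.len()
    }
}
```

A test that only uses inputs without duplicate values would pass either way, since both branches return the same count for such data.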

Re: [PR] Improve performance of `find_in_set` function [datafusion]

2025-01-10 Thread via GitHub
comphead commented on PR #14020: URL: https://github.com/apache/datafusion/pull/14020#issuecomment-2583243383 > NOTE: One notable thing I want to point out here. find_in_set(str, str_list) doesn't work if str or str_list is Scalar::Utf8View (string view literal), which is true for both t

Re: [PR] Add support for ClickHouse `FORMAT` on `INSERT` [datafusion-sqlparser-rs]

2025-01-10 Thread via GitHub
iffyio commented on code in PR #1628: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1628#discussion_r1910640466 ## src/parser/mod.rs: ## @@ -12012,10 +12033,26 @@ impl<'a> Parser<'a> { replace_into, priority, inse

Re: [PR] test: show a mismatch for initcap between Spark and DataFusion [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on PR #1051: URL: https://github.com/apache/datafusion-comet/pull/1051#issuecomment-2583194044 I tested this PR locally and the test fails. We will likely need to implement a custom version that matches Spark's logic. ``` Spark C

Re: [PR] Correctly look for end delimiter dollar quoted string [datafusion-sqlparser-rs]

2025-01-10 Thread via GitHub
iffyio commented on code in PR #1650: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1650#discussion_r1910623511 ## src/tokenizer.rs: ## @@ -1554,46 +1554,29 @@ impl<'a> Tokenizer<'a> { if matches!(chars.peek(), Some('$')) && !self.dialect.supports_do

Re: [PR] chore: Use Spark's ParquetFilters [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on PR #492: URL: https://github.com/apache/datafusion-comet/pull/492#issuecomment-2583155468 @huaxingao @parthchandra I'm moving this to draft for now since it seems inactive. Should we close this PR?

Re: [PR] build: Fix test failure caused by merging conflicting PRs [datafusion-comet]

2025-01-10 Thread via GitHub
codecov-commenter commented on PR #1259: URL: https://github.com/apache/datafusion-comet/pull/1259#issuecomment-2583136532 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1259?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: speed up ConcurrentHashMap#computeIfAbsent of JDK8 [datafusion-comet]

2025-01-10 Thread via GitHub
mbutrovich commented on PR #1245: URL: https://github.com/apache/datafusion-comet/pull/1245#issuecomment-2583017470 Thanks for raising this issue. I definitely learned something new about JDK8's performance issue with ConcurrentHashMap, as your microbenchmark demonstrates. However, it's no

[I] Support sort merge join with a join condition [datafusion-comet]

2025-01-10 Thread via GitHub
viirya opened a new issue, #398: URL: https://github.com/apache/datafusion-comet/issues/398 ### What is the problem the feature request solves? Currently SMJ with join condition is not supported by Comet and falls back to Spark. The feature was added into DataFusion but we've not inco

Re: [I] Support sort merge join with a join condition [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on issue #398: URL: https://github.com/apache/datafusion-comet/issues/398#issuecomment-2583009271 Re-opening this issue because we only enable this feature in tests currently due to poor performance in benchmarks

Re: [PR] build(deps): bump com.google.protobuf:protobuf-java from 3.19.6 to 3.25.5 [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on PR #954: URL: https://github.com/apache/datafusion-comet/pull/954#issuecomment-2582978130 @dependabot rebase

[PR] build: Fix test failure caused by merging conflicting PRs [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove opened a new pull request, #1259: URL: https://github.com/apache/datafusion-comet/pull/1259 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [I] Move `JoinSelection` into `datafusion-physical-optimizer` crate [datafusion]

2025-01-10 Thread via GitHub
cj-zhukov commented on issue #14073: URL: https://github.com/apache/datafusion/issues/14073#issuecomment-2582907973 take

Re: [I] Build is broken in main [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on issue #1258: URL: https://github.com/apache/datafusion-comet/issues/1258#issuecomment-2582907784 I am working on a fix

[I] Build is broken in main [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove opened a new issue, #1258: URL: https://github.com/apache/datafusion-comet/issues/1258 ### Describe the bug There was a conflict between some PRs merged recently that has introduced a test failure in CometAggregateSuite ### Steps to reproduce _No response_

Re: [I] Move `SanityChecker` into `datafusion-physical-optimizer` crate [datafusion]

2025-01-10 Thread via GitHub
cj-zhukov commented on issue #14072: URL: https://github.com/apache/datafusion/issues/14072#issuecomment-2582906768 take

Re: [PR] feat: rand expression support [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on code in PR #1199: URL: https://github.com/apache/datafusion-comet/pull/1199#discussion_r1910483116 ## native/core/src/execution/jni_api.rs: ## @@ -317,7 +317,7 @@ pub unsafe extern "system" fn Java_org_apache_comet_Native_executePlan( // query pl

Re: [PR] build: bump spark version to 3.3.4, 3.4.4, 3.5.4 for spark-3.3, spark-3.4 and spark-3.5 [datafusion-comet]

2025-01-10 Thread via GitHub
andygrove commented on PR #1243: URL: https://github.com/apache/datafusion-comet/pull/1243#issuecomment-2582876610 > @andygrove, should this pull request introduce the diff files for bumping to the latest version? Yes, we would need those. I would recommend creating one PR per major S

Re: [PR] Add example for using a separate threadpool for CPU bound work [datafusion]

2025-01-10 Thread via GitHub
alamb commented on PR #13424: URL: https://github.com/apache/datafusion/pull/13424#issuecomment-2582869825 This PR appears to have stalled -- it seems we are not ready to commit to this kind of wrapping in the main DataFusion crate but we also don't have any plausible alternative. T

[I] Optimized spill file format [datafusion]

2025-01-10 Thread via GitHub
alamb opened a new issue, #14078: URL: https://github.com/apache/datafusion/issues/14078 ### Is your feature request related to a problem or challenge? DataFusion spills data to local disk for processing datasets that do not fit in available memory, as illustrated in this comment:

Re: [PR] feat: support `INSERT INTO [TABLE] FUNCTION` of Clickhouse [datafusion-sqlparser-rs]

2025-01-10 Thread via GitHub
iffyio merged PR #1633: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1633

Re: [PR] Update ctor dep to latest [datafusion]

2025-01-10 Thread via GitHub
alamb commented on PR #14070: URL: https://github.com/apache/datafusion/pull/14070#issuecomment-2582824575 Let's keep the files consistent

Re: [PR] Update ctor dep to latest [datafusion]

2025-01-10 Thread via GitHub
alamb merged PR #14070: URL: https://github.com/apache/datafusion/pull/14070
