Re: [PR] Add design process section to the docs [datafusion]

2025-06-16 Thread via GitHub
alamb commented on code in PR #16397: URL: https://github.com/apache/datafusion/pull/16397#discussion_r2149689081 ## docs/source/contributor-guide/index.md: ## @@ -108,6 +108,26 @@ Features above) prior to acceptance include: [extensions list]: ../library-user-guide/extensions.

Re: [I] Blog post about parquet vs custom file formats [datafusion]

2025-06-16 Thread via GitHub
alamb commented on issue #16149: URL: https://github.com/apache/datafusion/issues/16149#issuecomment-2976129703 > I am also curious: **Why would uncompressed Parquet be considered an optimization over Snappy-compressed Parquet?** Is the decompression overhead of Snappy significant enough to

Re: [I] Improve performance of `datafusion-cli` when reading from remote storage [datafusion]

2025-06-16 Thread via GitHub
alamb commented on issue #16365: URL: https://github.com/apache/datafusion/issues/16365#issuecomment-2976153704 > Hi [@alamb](https://github.com/alamb), I would like to take up on this. I am new to the project, but this seems like a good way to start. It would certainly be nice, thoug

Re: [I] Support reading multiple parquet files via `datafusion-cli` [datafusion]

2025-06-16 Thread via GitHub
alamb commented on issue #16303: URL: https://github.com/apache/datafusion/issues/16303#issuecomment-2976163501 > Regarding the location of the code, if it is in datafusion proper rather than the CLI, it would be available in datafusion python, and any other projects that want to offer func

Re: [PR] Add note in upgrade guide about changes to `Expr::Scalar` in 48.0.0 [datafusion]

2025-06-16 Thread via GitHub
alamb commented on PR #16360: URL: https://github.com/apache/datafusion/pull/16360#issuecomment-2976166340 Thanks again @comphead -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] Add note in upgrade guide about changes to `Expr::Scalar` in 48.0.0 [datafusion]

2025-06-16 Thread via GitHub
alamb merged PR #16360: URL: https://github.com/apache/datafusion/pull/16360

Re: [PR] Use dedicated NullEquality enum instead of null_equals_null boolean [datafusion]

2025-06-16 Thread via GitHub
alamb commented on code in PR #16419: URL: https://github.com/apache/datafusion/pull/16419#discussion_r2149709930 ## datafusion/core/tests/execution/infinite_cancel.rs: ## @@ -467,7 +467,7 @@ async fn test_infinite_join_cancel( &JoinType::Inner, None,

Re: [I] [DISCUSSION] JOIN "task force" / project team [datafusion]

2025-06-16 Thread via GitHub
alamb commented on issue #15885: URL: https://github.com/apache/datafusion/issues/15885#issuecomment-2976179273 > FYI, I just followed the latest paper from TUM, "Improving Unnesting of Complex Queries", and will learn about the current code in DF and read the latest PRs in DF about unnesti

Re: [PR] Draft: Use upstream arrow `coalesce` kernel in DataFusion [datafusion]

2025-06-16 Thread via GitHub
alamb commented on PR #16249: URL: https://github.com/apache/datafusion/pull/16249#issuecomment-2976183640 > This is for me the result of not coalescing larger batches: Nice

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-16 Thread via GitHub
adriangb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2976187450 > Sounds reasonable. Perhaps just keep it this way and see if we somehow can separate it to 2 different passes later. > > > > It seems like the branch https://gith

[PR] Use dedicated NullEquality enum instead of null_equals_null boolean [datafusion]

2025-06-16 Thread via GitHub
tobixdev opened a new pull request, #16419: URL: https://github.com/apache/datafusion/pull/16419 ## Which issue does this PR close? This PR is the first step mentioned in #15891 that enables joins for Graph-Like Data (e.g., SPARQL). This replaces the `null_equals_null` boolean with

Re: [I] Optimize `NestedLoopJoinExec` Memory Usage [datafusion]

2025-06-16 Thread via GitHub
UBarney commented on issue #16364: URL: https://github.com/apache/datafusion/issues/16364#issuecomment-2975520489 > By joining only one left row with the right batch at a time? However, when `right_side_ordered == True`, we need to maintain the right_side order. https://github.com/apach

Re: [I] [DISCUSSION] JOIN "task force" / project team [datafusion]

2025-06-16 Thread via GitHub
xudong963 commented on issue #15885: URL: https://github.com/apache/datafusion/issues/15885#issuecomment-2976206269 > Since I think DataBend is open source https://github.com/databendlabs/databend perhaps you can point us at the relevant code in that codebase that would be a good source of

Re: [I] Interuptable queries in jupyter notebooks [datafusion-python]

2025-06-16 Thread via GitHub
timsaucer closed issue #1136: Interuptable queries in jupyter notebooks URL: https://github.com/apache/datafusion-python/issues/1136

Re: [PR] chore: use DF scalar functions for StartsWith, EndsWith, Contains, DF LikeExpr [datafusion-comet]

2025-06-16 Thread via GitHub
mbutrovich commented on PR #1887: URL: https://github.com/apache/datafusion-comet/pull/1887#issuecomment-2976285522 There's one Spark SQL test that isn't binding attribute references correctly. I'll try to figure that out and maybe I'll get lucky and it'll fix the TPC-H correctness failure

Re: [PR] Add Interruptible Query Execution in Jupyter via KeyboardInterrupt Support [datafusion-python]

2025-06-16 Thread via GitHub
timsaucer merged PR #1141: URL: https://github.com/apache/datafusion-python/pull/1141

Re: [PR] Update PMC management instructions to follow new ASF process [datafusion]

2025-06-16 Thread via GitHub
xudong963 merged PR #16417: URL: https://github.com/apache/datafusion/pull/16417

Re: [PR] Blog: Optimizing SQL and DataFrames [datafusion-site]

2025-06-16 Thread via GitHub
alamb commented on PR #74: URL: https://github.com/apache/datafusion-site/pull/74#issuecomment-2975871112 Thanks again @timsaucer @akurmustafa and @kevinjqliu

[PR] chore(deps): bump rust_decimal from 1.37.1 to 1.37.2 [datafusion]

2025-06-16 Thread via GitHub
dependabot[bot] opened a new pull request, #16422: URL: https://github.com/apache/datafusion/pull/16422 Bumps [rust_decimal](https://github.com/paupino/rust-decimal) from 1.37.1 to 1.37.2. Release notes Sourced from https://github.com/paupino/rust-decimal/releases rust_decimal's

Re: [PR] Blog: Optimizing SQL and DataFrames [datafusion-site]

2025-06-16 Thread via GitHub
alamb merged PR #74: URL: https://github.com/apache/datafusion-site/pull/74

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-16 Thread via GitHub
Dandandan commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2975351398 > > I think two different phases sounds good. > > I don't really like complicating FilterPushdown with post / pre "phases". Why not creating the `PushDownDynamicFilters` one you

[PR] Add license header check to CI [datafusion-sqlparser-rs]

2025-06-16 Thread via GitHub
alamb opened a new pull request, #1888: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1888 I [hit a small snag creating a 0.57.0 release](https://github.com/apache/datafusion-sqlparser-rs/issues/1837#issuecomment-2976015123) due to a file not having the required apache licens

Re: [I] Release sqlparser-rs version `0.57.0` around 2024-06-15 [datafusion-sqlparser-rs]

2025-06-16 Thread via GitHub
alamb commented on issue #1837: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1837#issuecomment-2976046084 PRs: - https://github.com/apache/datafusion-sqlparser-rs/pull/1887 - https://github.com/apache/datafusion-sqlparser-rs/pull/1888

Re: [I] Evaluate filter pushdown against the physical schema for performance and correctness [datafusion]

2025-06-16 Thread via GitHub
alamb commented on issue #15780: URL: https://github.com/apache/datafusion/issues/15780#issuecomment-2976060769 > variability comes from reading the data, parsing parquet, etc. and once it's in memory as arrow converting from Int32 to Int64 is tri This makes a lot of sense to me

Re: [PR] Add license header check to CI [datafusion-sqlparser-rs]

2025-06-16 Thread via GitHub
alamb commented on PR #1888: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1888#issuecomment-2976045858 The new CI test fails with the same message I got when creating the release candidate, as expected (I expect it to fail until https://github.com/apache/datafusion-sqlparser

Re: [PR] Add compression option to SpillManager [datafusion]

2025-06-16 Thread via GitHub
ding-young commented on code in PR #16268: URL: https://github.com/apache/datafusion/pull/16268#discussion_r2149653552 ## datafusion/common/src/config.rs: ## @@ -330,6 +386,13 @@ config_namespace! { /// the new schema verification step. pub skip_physical_aggreg

Re: [PR] Extend exception handling [datafusion-sqlparser-rs]

2025-06-16 Thread via GitHub
iffyio commented on code in PR #1884: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1884#discussion_r2149643500 ## src/ast/mod.rs: ## @@ -2982,6 +2982,63 @@ impl From for Statement { } } +/// An exception representing exception handling with the `EXCEPTION

[PR] chore(deps): bump libc from 0.2.172 to 0.2.173 [datafusion]

2025-06-16 Thread via GitHub
dependabot[bot] opened a new pull request, #16421: URL: https://github.com/apache/datafusion/pull/16421 Bumps [libc](https://github.com/rust-lang/libc) from 0.2.172 to 0.2.173. Release notes Sourced from [libc's releases](https://github.com/rust-lang/libc/releases). 0.2.173

[PR] chore(deps): bump prost-build from 0.13.5 to 0.14.0 in the proto group [datafusion]

2025-06-16 Thread via GitHub
dependabot[bot] opened a new pull request, #16420: URL: https://github.com/apache/datafusion/pull/16420 Bumps the proto group with 1 update: [prost-build](https://github.com/tokio-rs/prost). Updates `prost-build` from 0.13.5 to 0.14.0 Changelog Sourced from https://github.co

Re: [PR] explicitly create temp path [datafusion-ballista]

2025-06-16 Thread via GitHub
milenkovicm commented on PR #1273: URL: https://github.com/apache/datafusion-ballista/pull/1273#issuecomment-2975614172 thanks for your contribution @Huy1Ng, can you please enable disabled tests as well (search for `#[cfg(not(windows))]`), one of those tests is https://github.com/a

[PR] feat: support fixed size list for array reverse [datafusion]

2025-06-16 Thread via GitHub
chenkovsky opened a new pull request, #16423: URL: https://github.com/apache/datafusion/pull/16423 ## Which issue does this PR close? ## Rationale for this change array_reverse doesn't support fixed size list now. ## What changes are included in this PR? suppor

Re: [I] Access Data from S3 in DeltaLake format using Ballista on Kubernetes [datafusion-ballista]

2025-06-16 Thread via GitHub
janbraunsdorff commented on issue #1268: URL: https://github.com/apache/datafusion-ballista/issues/1268#issuecomment-2976438384 @milenkovicm thank you very much for your help, it works well.

Re: [PR] feat: mapping sql Char/Text/String default to Utf8View [datafusion]

2025-06-16 Thread via GitHub
xudong963 commented on PR #16290: URL: https://github.com/apache/datafusion/pull/16290#issuecomment-2976464829 Some tests are failing

Re: [PR] fix: Enable WASM compilation by making sqlparser's recursive-protection optional [datafusion]

2025-06-16 Thread via GitHub
jonmmease commented on code in PR #16418: URL: https://github.com/apache/datafusion/pull/16418#discussion_r2150164907 ## datafusion/core/Cargo.toml: ## @@ -79,6 +81,9 @@ recursive_protection = [ "datafusion-physical-optimizer/recursive_protection", "datafusion-sql/recu

Re: [PR] chore: Implement date_trunc as ScalarUDFImpl [datafusion-comet]

2025-06-16 Thread via GitHub
comphead merged PR #1880: URL: https://github.com/apache/datafusion-comet/pull/1880

Re: [PR] Extend exception handling [datafusion-sqlparser-rs]

2025-06-16 Thread via GitHub
bombsimon commented on code in PR #1884: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1884#discussion_r2150350430 ## src/ast/mod.rs: ## @@ -2982,6 +2982,63 @@ impl From for Statement { } } +/// An exception representing exception handling with the `EXCEPT

Re: [PR] Extend exception handling [datafusion-sqlparser-rs]

2025-06-16 Thread via GitHub
bombsimon commented on code in PR #1884: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1884#discussion_r2150372100 ## src/ast/mod.rs: ## @@ -2982,6 +2982,63 @@ impl From for Statement { } } +/// An exception representing exception handling with the `EXCEPT

Re: [PR] Extend exception handling [datafusion-sqlparser-rs]

2025-06-16 Thread via GitHub
bombsimon commented on code in PR #1884: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1884#discussion_r2150371186 ## src/ast/mod.rs: ## @@ -3670,17 +3727,24 @@ pub enum Statement { /// END; /// ``` statements: Vec, -/// Statemen

Re: [PR] Extend exception handling [datafusion-sqlparser-rs]

2025-06-16 Thread via GitHub
bombsimon commented on code in PR #1884: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1884#discussion_r2150374474 ## tests/sqlparser_common.rs: ## @@ -8593,7 +8593,7 @@ fn lateral_function() { fn parse_start_transaction() { let dialects = all_dialects_excep

Re: [PR] Extend exception handling [datafusion-sqlparser-rs]

2025-06-16 Thread via GitHub
bombsimon commented on code in PR #1884: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1884#discussion_r2150373947 ## src/parser/mod.rs: ## @@ -15096,12 +15096,16 @@ impl<'a> Parser<'a> { transaction: Some(BeginTransactionKind::Transaction),

Re: [PR] feat: support fixed size list for array reverse [datafusion]

2025-06-16 Thread via GitHub
jonathanc-n commented on code in PR #16423: URL: https://github.com/apache/datafusion/pull/16423#discussion_r2150374823 ## datafusion/functions-nested/src/reverse.rs: ## @@ -175,3 +182,45 @@ fn general_array_reverse>( Some(nulls.into()), )?)) } + +fn fixed_size_ar

Re: [PR] Add license header to display_utils.rs and pretty_print.rs [datafusion-sqlparser-rs]

2025-06-16 Thread via GitHub
alamb commented on PR #1887: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1887#issuecomment-2977356858 Thanks @iffyio -- I am now preparing the release candidate

Re: [PR] feat: support fixed size list for array reverse [datafusion]

2025-06-16 Thread via GitHub
jonathanc-n commented on code in PR #16423: URL: https://github.com/apache/datafusion/pull/16423#discussion_r2150444753 ## datafusion/functions-nested/src/reverse.rs: ## @@ -175,3 +182,45 @@ fn general_array_reverse>( Some(nulls.into()), )?)) } + +fn fixed_size_ar

Re: [I] Release sqlparser-rs version `0.57.0` around 2024-06-15 [datafusion-sqlparser-rs]

2025-06-16 Thread via GitHub
alamb commented on issue #1837: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1837#issuecomment-2977367952 Ok, we have a release candidate and I have started a vote thread: - https://lists.apache.org/thread/5xslm7vy7bkw9k0bmzojf1g1130nw6vx

Re: [PR] feat: support fixed size list for array reverse [datafusion]

2025-06-16 Thread via GitHub
jonathanc-n commented on code in PR #16423: URL: https://github.com/apache/datafusion/pull/16423#discussion_r2150397306 ## datafusion/functions-nested/src/reverse.rs: ## @@ -175,3 +182,45 @@ fn general_array_reverse>( Some(nulls.into()), )?)) } + +fn fixed_size_ar

Re: [PR] feat: support fixed size list for array reverse [datafusion]

2025-06-16 Thread via GitHub
jonathanc-n commented on code in PR #16423: URL: https://github.com/apache/datafusion/pull/16423#discussion_r2150385634 ## datafusion/functions-nested/src/reverse.rs: ## @@ -175,3 +182,45 @@ fn general_array_reverse>( Some(nulls.into()), )?)) } + +fn fixed_size_ar

[PR] minor: Avoid rewriting join to unsupported join [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove opened a new pull request, #1888: URL: https://github.com/apache/datafusion-comet/pull/1888 ## Which issue does this PR close? Related to https://github.com/apache/datafusion-comet/issues/457 ## Rationale for this change Comet does not support ha

Re: [PR] Prune files during streams and avoid additional pruning if there are no dynamic filters [datafusion]

2025-06-16 Thread via GitHub
adriangb commented on code in PR #16424: URL: https://github.com/apache/datafusion/pull/16424#discussion_r2150409636 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -524,6 +512,91 @@ fn should_enable_page_index( .unwrap_or(false) } +/// Prune based on parti

Re: [PR] Use dedicated NullEquality enum instead of null_equals_null boolean [datafusion]

2025-06-16 Thread via GitHub
tobixdev commented on code in PR #16419: URL: https://github.com/apache/datafusion/pull/16419#discussion_r2150797422 ## datafusion/common/src/null_equality.rs: ## @@ -0,0 +1,34 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

Re: [PR] chore: generate basic spark function tests [datafusion]

2025-06-16 Thread via GitHub
shehabgamin commented on code in PR #16409: URL: https://github.com/apache/datafusion/pull/16409#discussion_r2150803520 ## datafusion/sqllogictest/test_files/spark/array/array.slt: ## @@ -0,0 +1,22 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contr

Re: [I] Spark ColumnarToRowExec cannot pass CometBuffer safety check [datafusion-comet]

2025-06-16 Thread via GitHub
viirya commented on issue #1059: URL: https://github.com/apache/datafusion-comet/issues/1059#issuecomment-2978021382 Yes

Re: [I] Spark ColumnarToRowExec cannot pass CometBuffer safety check [datafusion-comet]

2025-06-16 Thread via GitHub
viirya closed issue #1059: Spark ColumnarToRowExec cannot pass CometBuffer safety check URL: https://github.com/apache/datafusion-comet/issues/1059

[I] Implement Single Join [datafusion]

2025-06-16 Thread via GitHub
jonathanc-n opened a new issue, #16425: URL: https://github.com/apache/datafusion/issues/16425 ### Is your feature request related to a problem or challenge? This is part of #13181 for looking into different joins. ## What is a Single Join Single joins are similar to a regula

Re: [I] Implement Single Join [datafusion]

2025-06-16 Thread via GitHub
jonathanc-n commented on issue #16425: URL: https://github.com/apache/datafusion/issues/16425#issuecomment-2978925656 Thoughts @Dandandan?

Re: [PR] Blog: Optimizing SQL and DataFrames [datafusion-site]

2025-06-16 Thread via GitHub
alamb commented on PR #74: URL: https://github.com/apache/datafusion-site/pull/74#issuecomment-2975930224 Blogs are posted: - https://datafusion.staged.apache.org/blog/2025/06/15/optimizing-sql-dataframes-part-one/ - https://datafusion.staged.apache.org/blog/2025/06/15/optimizing-sql-

Re: [PR] fix: Enable WASM compilation by making sqlparser's recursive-protection optional [datafusion]

2025-06-16 Thread via GitHub
alamb commented on code in PR #16418: URL: https://github.com/apache/datafusion/pull/16418#discussion_r2149552742 ## datafusion/core/Cargo.toml: ## @@ -79,6 +81,9 @@ recursive_protection = [ "datafusion-physical-optimizer/recursive_protection", "datafusion-sql/recursiv

Re: [PR] Prepare for `0.57.0` release [datafusion-sqlparser-rs]

2025-06-16 Thread via GitHub
alamb commented on PR #1885: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1885#issuecomment-2976007978 Thanks @iffyio -- I'll get a release candidate out today

Re: [I] Release sqlparser-rs version `0.57.0` around 2024-06-15 [datafusion-sqlparser-rs]

2025-06-16 Thread via GitHub
alamb commented on issue #1837: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1837#issuecomment-2976015123 I tried to create an RC1 but got some errors on the creation script: ``` [1]: https://github.com/apache/datafusion-sqlparser-rs/tree/0f2208d293d4eb8c9ca750288

Re: [PR] Add support for cluster by expressions [datafusion-sqlparser-rs]

2025-06-16 Thread via GitHub
iffyio merged PR #1883: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1883

Re: [I] Add support for snowflake cluster by expressions [datafusion-sqlparser-rs]

2025-06-16 Thread via GitHub
iffyio closed issue #1882: Add support for snowflake cluster by expressions URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1882

[PR] Add license header to display_utils.rs and pretty_print.rs [datafusion-sqlparser-rs]

2025-06-16 Thread via GitHub
alamb opened a new pull request, #1887: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1887 - Part of https://github.com/apache/datafusion-sqlparser-rs/issues/1837 When I tried to make a release candidate for 0.57.0 I got some errors about files not having the Apache Licen

Re: [PR] feat: mapping sql Char/Text/String default to Utf8View [datafusion]

2025-06-16 Thread via GitHub
zhuqi-lucas commented on PR #16290: URL: https://github.com/apache/datafusion/pull/16290#issuecomment-2976695923 > Some tests are failing Fixed in latest PR, thank you @xudong963!

Re: [PR] feat: pass ignore_nulls flag to first and last [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove commented on PR #1866: URL: https://github.com/apache/datafusion-comet/pull/1866#issuecomment-2976849328 @rluvaton Here is a test that fails in main and passes with your changes in this PR. Could you add this to `CometAggregateSuite` as part of this PR? ```scala test("

[PR] Prune files during streams and avoid additional pruning if there are no dynamic filters [datafusion]

2025-06-16 Thread via GitHub
adriangb opened a new pull request, #16424: URL: https://github.com/apache/datafusion/pull/16424 https://github.com/apache/datafusion/pull/16014#issuecomment-2977125894

Re: [PR] Prune files during streams and avoid additional pruning if there are no dynamic filters [datafusion]

2025-06-16 Thread via GitHub
adriangb commented on code in PR #16424: URL: https://github.com/apache/datafusion/pull/16424#discussion_r2150394841 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -524,6 +512,91 @@ fn should_enable_page_index( .unwrap_or(false) } +/// Prune based on parti

Re: [PR] Prune files during streams and avoid additional pruning if there are no dynamic filters [datafusion]

2025-06-16 Thread via GitHub
adriangb commented on PR #16424: URL: https://github.com/apache/datafusion/pull/16424#issuecomment-2977267452 cc @alamb I think this resolves the concern about perf overhead of this late pruning when there are no dynamic filters; it's a tossup of what happens when there are dynamic filters,

Re: [I] NoSuchMethodError: java.lang.Object org.apache.spark.executor.TaskMetrics.withExternalAccums(scala.Function1) [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove commented on issue #1576: URL: https://github.com/apache/datafusion-comet/issues/1576#issuecomment-2977847545 This issue was likely resolved by https://github.com/apache/datafusion-comet/pull/693 so will close for now. @mkgada feel free to reopen if this is still an issue

Re: [I] NoSuchMethodError: java.lang.Object org.apache.spark.executor.TaskMetrics.withExternalAccums(scala.Function1) [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove closed issue #1576: NoSuchMethodError: java.lang.Object org.apache.spark.executor.TaskMetrics.withExternalAccums(scala.Function1) URL: https://github.com/apache/datafusion-comet/issues/1576

Re: [I] NoSuchMethodError with Spark 3.5.3 (EMR 7.6) [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove commented on issue #1451: URL: https://github.com/apache/datafusion-comet/issues/1451#issuecomment-2977837094 Closing this issue since it is inactive.

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-16 Thread via GitHub
mbutrovich commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-2977839997 Thank you and @adamreeve for driving so much of the modular encryption work! I'll take a look at this branch this week and see how this might get Comet supporting modular encrypti

Re: [I] NoSuchMethodError with Spark 3.5.3 (EMR 7.6) [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove closed issue #1451: NoSuchMethodError with Spark 3.5.3 (EMR 7.6) URL: https://github.com/apache/datafusion-comet/issues/1451

Re: [I] Few tests fail on windows [datafusion-ballista]

2025-06-16 Thread via GitHub
milenkovicm closed issue #1117: Few tests fail on windows URL: https://github.com/apache/datafusion-ballista/issues/1117

Re: [I] Fix rat check errors during release process [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove closed issue #1678: Fix rat check errors during release process URL: https://github.com/apache/datafusion-comet/issues/1678

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-16 Thread via GitHub
adriangb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2977884089 I'd be curious to see perf impact after we merge https://github.com/apache/datafusion/pull/16424 as well

Re: [PR] explicitly create temp path [datafusion-ballista]

2025-06-16 Thread via GitHub
milenkovicm commented on PR #1273: URL: https://github.com/apache/datafusion-ballista/pull/1273#issuecomment-2976759444 thanks @Huy1Ng PR looks good. can you please remove lint action as well to make this PR pass checks? ``` jobs: lint: name: Lint C++, Pyth

Re: [PR] explicitly create temp path [datafusion-ballista]

2025-06-16 Thread via GitHub
Huy1Ng commented on PR #1273: URL: https://github.com/apache/datafusion-ballista/pull/1273#issuecomment-2976809476 sure, I wonder why we need to lint arrow at all

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-16 Thread via GitHub
Dandandan commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2150648173 ## datafusion/physical-optimizer/src/filter_pushdown.rs: ## @@ -362,17 +363,25 @@ use itertools::izip; /// [`ProjectionExec`]: datafusion_physical_plan::project

Re: [I] panic `StructBuilder and field_builder with index 0 (Utf8) are of unequal lengths: (1 != 0)` when running with delta lake extension and `spark.comet.exec.shuffle.fallbackToColumnar` is true [d

2025-06-16 Thread via GitHub
rluvaton commented on issue #1867: URL: https://github.com/apache/datafusion-comet/issues/1867#issuecomment-2977653930 testing again with main

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-16 Thread via GitHub
Dandandan commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2150649026 ## datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs: ## @@ -118,7 +131,7 @@ fn test_filter_collapse() { let plan = Arc::new(FilterExec::try_n

Re: [I] panic `StructBuilder and field_builder with index 0 (Utf8) are of unequal lengths: (1 != 0)` when running with delta lake extension and `spark.comet.exec.shuffle.fallbackToColumnar` is true [d

2025-06-16 Thread via GitHub
rluvaton closed issue #1867: panic `StructBuilder and field_builder with index 0 (Utf8) are of unequal lengths: (1 != 0)` when running with delta lake extension and `spark.comet.exec.shuffle.fallbackToColumnar` is true URL: https://github.com/apache/datafusion-comet/issues/1867

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-16 Thread via GitHub
Dandandan commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2977736578 > > Sounds reasonable. Perhaps just keep it this way and see if we somehow can separate it to 2 different passes later. > > It seems like the branch [pydantic#30](https://github

Re: [I] panic `StructBuilder and field_builder with index 0 (Utf8) are of unequal lengths: (1 != 0)` when running with delta lake extension and `spark.comet.exec.shuffle.fallbackToColumnar` is true [d

2025-06-16 Thread via GitHub
rluvaton commented on issue #1867: URL: https://github.com/apache/datafusion-comet/issues/1867#issuecomment-2977724434 It's working on main, thanks

Re: [I] [complex types] Improve "unsupported schema" message [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove commented on issue #1612: URL: https://github.com/apache/datafusion-comet/issues/1612#issuecomment-2977650465 Fixed in https://github.com/apache/datafusion-comet/pull/1667

Re: [I] RecordBatch might have logical row mapping on physical arrays [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove commented on issue #974: URL: https://github.com/apache/datafusion-comet/issues/974#issuecomment-2977768904 @parthchandra @huaxingao is this still a valid issue that we need to fix?

Re: [I] abs returns incorrect value in some cases [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove commented on issue #666: URL: https://github.com/apache/datafusion-comet/issues/666#issuecomment-2977762845 Closing this bug because we disabled the feature. Filed https://github.com/apache/datafusion-comet/issues/1890 to re-implement.

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-16 Thread via GitHub
adriangb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2977763190 > I think it's great! I added one suggestion to minimize the changes to the explain output and maybe phrase the "pre" and "post" more as "normal" filter pushdown and "dynamic" filte

Re: [I] abs returns incorrect value in some cases [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove closed issue #666: abs returns incorrect value in some cases URL: https://github.com/apache/datafusion-comet/issues/666

Re: [I] Reported OOM with high cardinality distrinct aggregates [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove closed issue #677: Reported OOM with high cardinality distrinct aggregates URL: https://github.com/apache/datafusion-comet/issues/677

Re: [I] Improve performance of `datafusion-cli` when reading from remote storage [datafusion]

2025-06-16 Thread via GitHub
swaingotnochill commented on issue #16365: URL: https://github.com/apache/datafusion/issues/16365#issuecomment-2976659219 I will try to study and dig more on this. It would be great if you can give starting pointers when I am stuck :)

Re: [PR] Improved experience when remote object store URL does not end in / [datafusion]

2025-06-16 Thread via GitHub
xiedeyantu commented on PR #16386: URL: https://github.com/apache/datafusion/pull/16386#issuecomment-2976974834 @alamb Could you help review this PR?

Re: [I] Add support for SMJ with RightSemi join [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove commented on issue #1725: URL: https://github.com/apache/datafusion-comet/issues/1725#issuecomment-2976996526 Thanks for looking at this @dharanad. You are correct, Spark does not support this. We should probably remove `RightSemi` from the `operator.proto` file to avoid confusio

Re: [PR] Prune files during streams and avoid additional pruning if there are no dynamic filters [datafusion]

2025-06-16 Thread via GitHub
adriangb commented on code in PR #16424: URL: https://github.com/apache/datafusion/pull/16424#discussion_r2150409636 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -524,6 +512,91 @@ fn should_enable_page_index( .unwrap_or(false) } +/// Prune based on parti

Re: [PR] Add license header to display_utils.rs and pretty_print.rs [datafusion-sqlparser-rs]

2025-06-16 Thread via GitHub
iffyio merged PR #1887: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1887

Re: [PR] chore: Enable tests for casting timestamp to numeric types [datafusion-comet]

2025-06-16 Thread via GitHub
codecov-commenter commented on PR #1891: URL: https://github.com/apache/datafusion-comet/pull/1891#issuecomment-2977904786 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1891?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Fix rat check errors during release process [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove commented on issue #1678: URL: https://github.com/apache/datafusion-comet/issues/1678#issuecomment-2977832506 fixed in https://github.com/apache/datafusion-comet/pull/1689

Re: [I] Queries with exchange reuse sometimes fail in Comet [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove closed issue #1798: Queries with exchange reuse sometimes fail in Comet URL: https://github.com/apache/datafusion-comet/issues/1798

Re: [PR] explicitly create temp path [datafusion-ballista]

2025-06-16 Thread via GitHub
milenkovicm commented on PR #1273: URL: https://github.com/apache/datafusion-ballista/pull/1273#issuecomment-2976823439 we should not, we need to cleanup github actions (https://github.com/apache/datafusion-ballista/issues/1128) but no takers so far
