Re: [PR] Snowflake: support trailing options in `CREATE TABLE` [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
yoavcloud commented on code in PR #1931: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1931#discussion_r2199758669 ## tests/sqlparser_snowflake.rs: ## @@ -995,6 +995,21 @@ fn test_snowflake_create_iceberg_table_without_location() { ); } +#[test] +fn test_s

Re: [PR] Improve display format of BoundedWindowAggExec [datafusion]

2025-07-10 Thread via GitHub
Chen-Yuan-Lai commented on PR #16645: URL: https://github.com/apache/datafusion/pull/16645#issuecomment-3060795405 > > I suspect updating the results was faster than it might have previously been thanks to the work @blaginin @Chen-Yuan-Lai have done to migrate most of our plan tests to `ins

[I] Bug: `make_date(year, month, day)` reports error if one of the fields is NULL [datafusion]

2025-07-10 Thread via GitHub
xudong963 opened a new issue, #16746: URL: https://github.com/apache/datafusion/issues/16746 I tested other databases, such as PG and duckdb, when one of the fields is NULL, they won't report error, but return NULL, which makes more sense to me. Let me know wdyt -- This is an autom

Re: [PR] fix: add `order_requirement` & `dist_requirement` to `OutputRequirementExec` display [datafusion]

2025-07-10 Thread via GitHub
Loaki07 commented on PR #16726: URL: https://github.com/apache/datafusion/pull/16726#issuecomment-3060509206 I ran both `cargo test -p datafusion-sqllogictest --test sqllogictests` and `cargo test`, fixed them both. Please review.

Re: [PR] Improve dictionary null handling in hashing and expand aggregate test coverage for nulls [datafusion]

2025-07-10 Thread via GitHub
kosiew commented on PR #16466: URL: https://github.com/apache/datafusion/pull/16466#issuecomment-3060493758 @blaginin Thanks for the review.

Re: [PR] Improve dictionary null handling in hashing and expand aggregate test coverage for nulls [datafusion]

2025-07-10 Thread via GitHub
kosiew commented on code in PR #16466: URL: https://github.com/apache/datafusion/pull/16466#discussion_r2199572126 ## datafusion/core/tests/sql/aggregates.rs: ## @@ -441,3 +776,1110 @@ async fn count_distinct_dictionary_mixed_values() -> Result<()> { Ok(()) } + +/// Com

Re: [PR] chore: make more clarity for internal errors [datafusion]

2025-07-10 Thread via GitHub
comphead commented on PR #16741: URL: https://github.com/apache/datafusion/pull/16741#issuecomment-3060444252 Thanks @2010YOUY01 for the review

Re: [PR] chore: make more clarity for internal errors [datafusion]

2025-07-10 Thread via GitHub
comphead merged PR #16741: URL: https://github.com/apache/datafusion/pull/16741

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-07-10 Thread via GitHub
2010YOUY01 commented on code in PR #16443: URL: https://github.com/apache/datafusion/pull/16443#discussion_r2199484385 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices( probe_indices: UInt32Array, filte

Re: [I] Add date_format support [datafusion-comet]

2025-07-10 Thread via GitHub
jatin510 commented on issue #2014: URL: https://github.com/apache/datafusion-comet/issues/2014#issuecomment-3060316382 i would love to work on this

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-07-10 Thread via GitHub
2010YOUY01 commented on code in PR #16443: URL: https://github.com/apache/datafusion/pull/16443#discussion_r2199423074 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -828,13 +833,127 @@ impl NestedLoopJoinStream { handle_state!(self.process

[PR] Fix invalid intervals in `satisfy_greater` [datafusion]

2025-07-10 Thread via GitHub
liamzwbao opened a new pull request, #16745: URL: https://github.com/apache/datafusion/pull/16745 ## Which issue does this PR close? - Closes #16736. ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] Per file filter evaluation [datafusion]

2025-07-10 Thread via GitHub
adriangb commented on code in PR #15057: URL: https://github.com/apache/datafusion/pull/15057#discussion_r2199354920 ## datafusion-examples/examples/default_column_values.rs: ## @@ -0,0 +1,366 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
ryanschneider commented on code in PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2199321302 ## src/dialect/mod.rs: ## @@ -1076,6 +1088,15 @@ pub trait Dialect: Debug + Any { fn supports_comma_separated_drop_column_list(&self) -> boo

Re: [I] Filtering and counting afterwards causes overflow panic in interval arithmetics [datafusion]

2025-07-10 Thread via GitHub
liamzwbao commented on issue #16736: URL: https://github.com/apache/datafusion/issues/16736#issuecomment-3059941385 take

Re: [PR] feat: support multi-threaded writing of Parquet files with modular encryption [datafusion]

2025-07-10 Thread via GitHub
corwinjoy commented on PR #16738: URL: https://github.com/apache/datafusion/pull/16738#issuecomment-3059890834 This pull request introduces several updates across multiple components of the codebase, focusing on dependency management, feature enhancements, and code cleanup. The most signifi

Re: [I] Optimized spill file format [datafusion]

2025-07-10 Thread via GitHub
ding-young commented on issue #14078: URL: https://github.com/apache/datafusion/issues/14078#issuecomment-3059889330 > So shall we close this issue as complete now? Yes, I think so. Of course, there's still room to seek further performance optimizations, but for now: - Validati

Re: [PR] feat: support multi-threaded writing of Parquet files with modular encryption [datafusion]

2025-07-10 Thread via GitHub
adamreeve commented on PR #16738: URL: https://github.com/apache/datafusion/pull/16738#issuecomment-3059787257 This approach looks good to me! Do the existing tests hit the parallel write code path? We probably want to make sure there are encryption tests for both the parallel and serial co

Re: [PR] feat: Add JNI-based Hadoop FileSystem support for S3 and other Hadoop-compatible stores [datafusion-comet]

2025-07-10 Thread via GitHub
parthchandra commented on PR #1992: URL: https://github.com/apache/datafusion-comet/pull/1992#issuecomment-3059721586 Update on this: 1) fs-hdfs includes its [own version of libhdfs](https://github.com/datafusion-contrib/fs-hdfs/tree/hadoop-3.1.4/c_src/libhdfs) as part of its build so

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
ryanschneider commented on PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#issuecomment-3059681356 Rebased against main to resolve conflicts.

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
ryanschneider commented on code in PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2199066365 ## src/dialect/mod.rs: ## @@ -1076,6 +1088,15 @@ pub trait Dialect: Debug + Any { fn supports_comma_separated_drop_column_list(&self) -> boo

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
ryanschneider commented on code in PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2199063242 ## src/dialect/mod.rs: ## @@ -1076,6 +1088,15 @@ pub trait Dialect: Debug + Any { fn supports_comma_separated_drop_column_list(&self) -> boo

Re: [PR] chore: Improve process for generating dynamic content into documentation [datafusion-comet]

2025-07-10 Thread via GitHub
codecov-commenter commented on PR #2017: URL: https://github.com/apache/datafusion-comet/pull/2017#issuecomment-3059650824 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2017?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] chore: Remove docs templates [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove opened a new pull request, #2017: URL: https://github.com/apache/datafusion-comet/pull/2017 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/2016 ## Rationale for this change Keep one copy of the content

Re: [PR] Fix in list round trip in df proto [datafusion]

2025-07-10 Thread via GitHub
XiangpengHao commented on PR #16744: URL: https://github.com/apache/datafusion/pull/16744#issuecomment-3059497366 OK, I ran a bisect and found the code has been broken since #15769. I guess it's because the filter was moved to a different sub-structure. @adriangb can you take a look if this is

Re: [PR] minor: Refactor to move some shuffle-related logic from `QueryPlanSerde` to `CometExecRule` [datafusion-comet]

2025-07-10 Thread via GitHub
codecov-commenter commented on PR #2015: URL: https://github.com/apache/datafusion-comet/pull/2015#issuecomment-3059459541 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2015?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[I] Improve documentation publishing to avoid maintaining separate template files [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove opened a new issue, #2016: URL: https://github.com/apache/datafusion-comet/issues/2016 ### What is the problem the feature request solves? For the configuration guide and the compatibility guide, we have both source templates and generated final markdown files checked into G

[PR] Refactor shuffle supported [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove opened a new pull request, #2015: URL: https://github.com/apache/datafusion-comet/pull/2015 ## Which issue does this PR close? N/A ## Rationale for this change Refactor to move code where it belongs. ## What changes are included in this PR

[I] Add date_format support [datafusion-comet]

2025-07-10 Thread via GitHub
kazuyukitanimura opened a new issue, #2014: URL: https://github.com/apache/datafusion-comet/issues/2014 ### What is the problem the feature request solves? date_format is not supported https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.d

Re: [PR] Ci cache [datafusion]

2025-07-10 Thread via GitHub
blaginin commented on PR #16709: URL: https://github.com/apache/datafusion/pull/16709#issuecomment-3059328323 linux build test: 5m → 2m check substrait features: 11m → 9m cargo test (amd64): 18m → 16m cargo examples: 17m → 11m clippy: 6m → 5m really like linux build test im

Re: [PR] [Draft]Add SQL logic tests for Run-End Encoded (REE) [datafusion]

2025-07-10 Thread via GitHub
rich-t-kid-datadog commented on PR #16715: URL: https://github.com/apache/datafusion/pull/16715#issuecomment-3059324594 TBD: add the `In` operator to the test; it should fit in with the other string operations, and it's frequently used at DD

[I] Add from_utc_timestamp support [datafusion-comet]

2025-07-10 Thread via GitHub
kazuyukitanimura opened a new issue, #2013: URL: https://github.com/apache/datafusion-comet/issues/2013 ### What is the problem the feature request solves? from_utc_timestamp is not supported now https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.

Re: [I] TPC-H Q16 fails during deserialization [datafusion]

2025-07-10 Thread via GitHub
XiangpengHao commented on issue #16665: URL: https://github.com/apache/datafusion/issues/16665#issuecomment-3059272448 Filed a fix, but need to dig deeper to understand why it doesn't trigger the bug in df47

Re: [PR] chore: Introduce ANSI support for remainder operation [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove commented on PR #1971: URL: https://github.com/apache/datafusion-comet/pull/1971#issuecomment-3059275876 @parthchandra @mbutrovich would you also like to review?

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
ryanschneider commented on code in PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2198853680 ## src/dialect/mod.rs: ## @@ -1076,6 +1088,15 @@ pub trait Dialect: Debug + Any { fn supports_comma_separated_drop_column_list(&self) -> boo

Re: [PR] Improve dictionary null handling in hashing and expand aggregate test coverage for nulls [datafusion]

2025-07-10 Thread via GitHub
blaginin commented on PR #16466: URL: https://github.com/apache/datafusion/pull/16466#issuecomment-3059259592 overall looks good and thank you for working on that!

[PR] Fix in list round trip in df proto [datafusion]

2025-07-10 Thread via GitHub
XiangpengHao opened a new pull request, #16744: URL: https://github.com/apache/datafusion/pull/16744 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/16665. ## Rationale for this change ## What changes are included in

Re: [PR] minor: Refactor to reduce duplicate serde code [datafusion-comet]

2025-07-10 Thread via GitHub
mbutrovich merged PR #2011: URL: https://github.com/apache/datafusion-comet/pull/2011

Re: [PR] Improve dictionary null handling in hashing and expand aggregate test coverage for nulls [datafusion]

2025-07-10 Thread via GitHub
blaginin commented on code in PR #16466: URL: https://github.com/apache/datafusion/pull/16466#discussion_r2198832751 ## datafusion/core/tests/sql/aggregates.rs: ## @@ -441,3 +776,1110 @@ async fn count_distinct_dictionary_mixed_values() -> Result<()> { Ok(()) } + +/// C

Re: [PR] Improve dictionary null handling in hashing and expand aggregate test coverage for nulls [datafusion]

2025-07-10 Thread via GitHub
blaginin commented on code in PR #16466: URL: https://github.com/apache/datafusion/pull/16466#discussion_r2198824772 ## datafusion/core/tests/sql/aggregates.rs: ## @@ -441,3 +776,1110 @@ async fn count_distinct_dictionary_mixed_values() -> Result<()> { Ok(()) } + +/// C

Re: [I] TPC-H Q16 fails during deserialization [datafusion]

2025-07-10 Thread via GitHub
XiangpengHao commented on issue #16665: URL: https://github.com/apache/datafusion/issues/16665#issuecomment-3059200639 Taking a look as LiquidCache also ran into this

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
ryanschneider commented on code in PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2198802030 ## src/dialect/mod.rs: ## @@ -1076,6 +1088,15 @@ pub trait Dialect: Debug + Any { fn supports_comma_separated_drop_column_list(&self) -> boo

Re: [PR] feat: Add a configuration to make parquet encryption optional [datafusion]

2025-07-10 Thread via GitHub
corwinjoy commented on code in PR #16649: URL: https://github.com/apache/datafusion/pull/16649#discussion_r2196418502 ## datafusion/core/Cargo.toml: ## @@ -61,13 +61,21 @@ default = [ "unicode_expressions", "compression", "parquet", +"parquet_encryption", Rev

Re: [PR] feat: Add a configuration to make parquet encryption optional [datafusion]

2025-07-10 Thread via GitHub
corwinjoy commented on code in PR #16649: URL: https://github.com/apache/datafusion/pull/16649#discussion_r2198712011 ## datafusion/common/src/encryption.rs: ## @@ -0,0 +1,76 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreem

Re: [PR] Per file filter evaluation [datafusion]

2025-07-10 Thread via GitHub
adriangb commented on code in PR #15057: URL: https://github.com/apache/datafusion/pull/15057#discussion_r2198638535 ## datafusion-examples/examples/default_column_values.rs: ## @@ -0,0 +1,366 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

Re: [PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-10 Thread via GitHub
NGA-TRAN commented on code in PR #16742: URL: https://github.com/apache/datafusion/pull/16742#discussion_r2198593774 ## datafusion/proto/tests/cases/roundtrip_physical_plan.rs: ## @@ -1780,3 +1780,111 @@ async fn test_tpch_part_in_list_query_with_real_parquet_data() -> Result<(

[PR] Benchmark for char expression [datafusion]

2025-07-10 Thread via GitHub
ajita-asthana opened a new pull request, #16743: URL: https://github.com/apache/datafusion/pull/16743 ## Which issue does this PR close? - Closes #16009 ## Rationale for this change Add a benchmark to measure char performance ## What changes are included in this P

[PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-10 Thread via GitHub
NGA-TRAN opened a new pull request, #16742: URL: https://github.com/apache/datafusion/pull/16742 ## Which issue does this PR close? These are tests per @alamb's suggestion at https://github.com/apache/datafusion/pull/16662#pullrequestreview-2994410907: `I also wonder if we s

Re: [PR] chore: Drop support for RightSemi and RightAnti join types [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove merged PR #1935: URL: https://github.com/apache/datafusion-comet/pull/1935

Re: [I] Add support for SMJ with RightSemi join [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove closed issue #1725: Add support for SMJ with RightSemi join URL: https://github.com/apache/datafusion-comet/issues/1725

[PR] docs: Add guide showing comparison between Comet and Gluten [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove opened a new pull request, #2012: URL: https://github.com/apache/datafusion-comet/pull/2012 ## Which issue does this PR close? N/A ## Rationale for this change We are often asked how Comet compares to Gluten. This documentation should help to an

Re: [PR] DataFusion 47.0.0 blog post [datafusion-site]

2025-07-10 Thread via GitHub
geoffreyclaude commented on code in PR #83: URL: https://github.com/apache/datafusion-site/pull/83#discussion_r2198332815 ## content/blog/2025-07-10-datafusion-47.0.0.md: ## @@ -0,0 +1,256 @@ +--- +layout: post +title: Apache DataFusion 47.0.0 Released +date: 2025-07-10 +author:

Re: [PR] chore: Implement BloomFilterMightContain as a ScalarUDFImpl [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove merged PR #1954: URL: https://github.com/apache/datafusion-comet/pull/1954

Re: [PR] minor: Refactor to reduce duplicate serde code [datafusion-comet]

2025-07-10 Thread via GitHub
Copilot commented on code in PR #2011: URL: https://github.com/apache/datafusion-comet/pull/2011#discussion_r2198329182 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -56,6 +56,7 @@ import org.apache.comet.objectstore.NativeConfig import org.apache.c

Re: [I] Support uneven partition inputs HashJoinExec in Partitioned mode [datafusion]

2025-07-10 Thread via GitHub
timsaucer commented on issue #16740: URL: https://github.com/apache/datafusion/issues/16740#issuecomment-3058355404 A workaround, or more likely the correct use, is that I should have a pre-defined number of output partitions for these rather than one per unique value of the partition key.

Re: [PR] POC: Test DataFusion with experimental Parquet Filter Pushdown (try 4) [datafusion]

2025-07-10 Thread via GitHub
XiangpengHao commented on PR #16711: URL: https://github.com/apache/datafusion/pull/16711#issuecomment-3058345438 > Thus I think we should proceed trying to get https://github.com/apache/arrow-rs/pull/7850 merged. Great! I plan to take another look in a few days (being occupied by ot

Re: [PR] minor: Refactor to reduce duplicate serde code [datafusion-comet]

2025-07-10 Thread via GitHub
codecov-commenter commented on PR #2011: URL: https://github.com/apache/datafusion-comet/pull/2011#issuecomment-3058330148 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2011?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] minor: Refactor to reduce duplicate serde code [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove opened a new pull request, #2011: URL: https://github.com/apache/datafusion-comet/pull/2011 ## Which issue does this PR close? N/A ## Rationale for this change Simplify serde code and avoid duplicate code ## What changes are included in th

Re: [PR] Refactor filter pushdown APIs to enable joins to pass through filters [datafusion]

2025-07-10 Thread via GitHub
adriangb commented on PR #16732: URL: https://github.com/apache/datafusion/pull/16732#issuecomment-3058224108 > Note that I added a HashJoinExec implementation to motivate this PR but remove it in [5940cca](https://github.com/apache/datafusion/pull/16732/commits/5940cca7c8ca9620781664425fb4

Re: [PR] feat: randn expression support [datafusion-comet]

2025-07-10 Thread via GitHub
codecov-commenter commented on PR #2010: URL: https://github.com/apache/datafusion-comet/pull/2010#issuecomment-3058219290 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2010?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: Drop support for RightSemi and RightAnti join types [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove commented on PR #1935: URL: https://github.com/apache/datafusion-comet/pull/1935#issuecomment-3058210238 @dharanad I'm not sure why all the CI tests were cancelled. I just merged the latest from main and triggered the tests again.

Re: [PR] Add support for `+` char in Snowflake stage names [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
iffyio merged PR #1935: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1935

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-07-10 Thread via GitHub
jonathanc-n commented on PR #16443: URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3058063741 @2010YOUY01 Special types need to only return the matching rows, so only one side needs to return rows while the other side can return a null array and not be projected in the fi

[PR] chore: make more clarity for internal errors [datafusion]

2025-07-10 Thread via GitHub
comphead opened a new pull request, #16741: URL: https://github.com/apache/datafusion/pull/16741 ## Which issue does this PR close? Adding a Datafusion issue tracker URL for internal errors - Closes #. ## Rationale for this change ## What changes ar

Re: [I] Enable comments on datafusion-site via giscus [datafusion-site]

2025-07-10 Thread via GitHub
kevinjqliu commented on issue #80: URL: https://github.com/apache/datafusion-site/issues/80#issuecomment-3058031572 @alamb last item to make this work > The giscus app is installed, otherwise visitors will not be able to comment and react. can you check if you have permission t

Re: [PR] Snowflake trailing options in CREATE TABLE [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
iffyio commented on code in PR #1931: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1931#discussion_r2198043261 ## src/ast/helpers/stmt_create_table.rs: ## @@ -383,6 +383,16 @@ impl CreateTableBuilder { self } +/// Returns true if information o

Re: [PR] Add support for Redshift SELECT * EXCLUDE [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
iffyio commented on code in PR #1936: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1936#discussion_r2198107622 ## tests/sqlparser_common.rs: ## @@ -15982,3 +15992,64 @@ fn parse_create_procedure_with_parameter_modes() { _ => unreachable!(), } } + +

Re: [PR] Snowflake Reserved SQL Keywords as Implicit Table Alias [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
iffyio commented on code in PR #1934: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1934#discussion_r2198077430 ## src/dialect/snowflake.rs: ## @@ -345,6 +345,85 @@ impl Dialect for SnowflakeDialect { } } +fn is_table_alias(&self, kw: &Keyword,

Re: [PR] Add support for granting privileges to procedures and functions in Snowflake [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
iffyio merged PR #1930: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1930

[I] Support uneven partition inputs HashJoinExec in Partitioned mode [datafusion]

2025-07-10 Thread via GitHub
timsaucer opened a new issue, #16740: URL: https://github.com/apache/datafusion/issues/16740 ### Is your feature request related to a problem or challenge? I have a case where I have two table providers. They produce partitioned data with a partition hash. I want to be able to do effi

Re: [I] Investigate performance for TPC-H q9 query [datafusion-comet]

2025-07-10 Thread via GitHub
comphead commented on issue #2006: URL: https://github.com/apache/datafusion-comet/issues/2006#issuecomment-3057917895 Hey @dharanad I'm thinking of running the bench locally which is described https://datafusion.apache.org/comet/contributor-guide/benchmarking_macos.html and using Samply

Re: [I] Investigate performance for TPC-H q17 query [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove commented on issue #2008: URL: https://github.com/apache/datafusion-comet/issues/2008#issuecomment-3057914312 # Comet 0.9.0 Plan ``` == Physical Plan == AdaptiveSparkPlan isFinalPlan=false +- !CometHashAggregate [sum#146, isEmpty#147], Final, [sum(l_extendedprice#2

Re: [PR] Add support for Snowflake identifier function [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
iffyio merged PR #1929: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1929

Re: [PR] Fix for Postgres regex and like binary operators [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
iffyio commented on code in PR #1928: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1928#discussion_r2198022760 ## tests/sqlparser_postgres.rs: ## @@ -2207,19 +2223,31 @@ fn parse_pg_like_match_ops() { ]; for (str_op, op) in pg_like_match_ops { -

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
iffyio commented on code in PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2198011580 ## src/dialect/mod.rs: ## @@ -1076,6 +1088,15 @@ pub trait Dialect: Debug + Any { fn supports_comma_separated_drop_column_list(&self) -> bool {

Re: [PR] DRAFT: Update arrow/parquet to 56.0.0 [datafusion]

2025-07-10 Thread via GitHub
zhuqi-lucas commented on PR #16690: URL: https://github.com/apache/datafusion/pull/16690#issuecomment-3057792817 @alamb @Dandandan I submitted a PR based on this PR, to see if fast_gc can also help for the sort_tpch or sort_tpch10 benchmark: https://github.com/apache/datafusion/pull

Re: [PR] [Draft]Add SQL logic tests for Run-End Encoded (REE) [datafusion]

2025-07-10 Thread via GitHub
rich-t-kid-datadog commented on code in PR #16715: URL: https://github.com/apache/datafusion/pull/16715#discussion_r2197952864 ## datafusion/sqllogictest/test_files/run_end_encoding.slt: ## @@ -0,0 +1,340 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or mor

[PR] Draft: Test fast gc for sort string view [datafusion]

2025-07-10 Thread via GitHub
zhuqi-lucas opened a new pull request, #16739: URL: https://github.com/apache/datafusion/pull/16739 ## Which issue does this PR close? Test fast gc for sort string view, used to get benchmark result for: sort_tpch Q3 and Q11 which is sort string view ## Rationale for

Re: [PR] fix: [iceberg] Add LogicalTypeAnnotation in ParquetColumnSpec [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove commented on PR #2000: URL: https://github.com/apache/datafusion-comet/pull/2000#issuecomment-3057756699 @huaxingao Could you add some unit tests?

Re: [PR] fix: [iceberg] Add LogicalTypeAnnotation in ParquetColumnSpec [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove commented on code in PR #2000: URL: https://github.com/apache/datafusion-comet/pull/2000#discussion_r2197931502 ## common/src/main/java/org/apache/comet/parquet/Utils.java: ## @@ -290,15 +290,137 @@ public static ColumnDescriptor buildColumnDescriptor(ParquetColumnSpe

Re: [PR] fix: [iceberg] Add LogicalTypeAnnotation in ParquetColumnSpec [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove commented on code in PR #2000: URL: https://github.com/apache/datafusion-comet/pull/2000#discussion_r2197933479 ## common/src/main/java/org/apache/comet/parquet/Utils.java: ## @@ -290,15 +290,137 @@ public static ColumnDescriptor buildColumnDescriptor(ParquetColumnSpe

Re: [PR] fix : cast_operands_to_double_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove commented on code in PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996#discussion_r2197906691 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2744,6 +2744,16 @@ class CometExpressionSuite extends CometTestBase with Adaptiv

Re: [PR] chore: Implement BloomFilterMightContain as a ScalarUDFImpl [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove commented on PR #1954: URL: https://github.com/apache/datafusion-comet/pull/1954#issuecomment-3057691022 There was a merge conflict due to merging some other PRs this morning, so I took the liberty of fixing the conflict. This is the next PR in the merge queue. -- This is an au

Re: [PR] Feat: support map_from_arrays [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove merged PR #1932: URL: https://github.com/apache/datafusion-comet/pull/1932 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

Re: [PR] perf: Optimize `AvgDecimalGroupsAccumulator` [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove merged PR #1893: URL: https://github.com/apache/datafusion-comet/pull/1893 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

Re: [PR] fix : cast_operands_to_double_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove commented on code in PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996#discussion_r2197876746 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -677,7 +677,14 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] fix : cast_operands_to_double_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove commented on code in PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996#discussion_r2197874159 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2744,6 +2744,16 @@ class CometExpressionSuite extends CometTestBase with Adaptiv

[PR] Add support for Redshift SELECT * EXCLUDE [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
yoavcloud opened a new pull request, #1936: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1936 Redshift supports placing the `EXCLUDE` option at the end of the projection list, not necessarily after the wildcard. For example: `SELECT *, c1 EXCLUDE c2 FROM test` (exclude column

Re: [PR] Refactor filter pushdown APIs to enable joins to pass through filters [datafusion]

2025-07-10 Thread via GitHub
adriangb commented on PR #16732: URL: https://github.com/apache/datafusion/pull/16732#issuecomment-3057497800 Thank you for the review @alamb I incorporated your suggestion for restructuring the enum / struct into a struct with a discriminant field. It is much better. I also re

Re: [I] Add support for SortAggregateExec [datafusion-comet]

2025-07-10 Thread via GitHub
kazantsev-maksim commented on issue #1994: URL: https://github.com/apache/datafusion-comet/issues/1994#issuecomment-3057482879 I can try implementing this, @andygrove -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

Re: [PR] POC: Test DataFusion with experimental Parquet Filter Pushdown (try 4) [datafusion]

2025-07-10 Thread via GitHub
zhuqi-lucas commented on PR #16711: URL: https://github.com/apache/datafusion/pull/16711#issuecomment-3057355716 > I ran the new clickbench_pushdown benchmark and TLDR is the new pushdown decoder look like they make a measurable difference 🎉 > > Thus I think we should proceed trying t

Re: [PR] Refactor filter pushdown APIs to enable joins to pass through filters [datafusion]

2025-07-10 Thread via GitHub
adriangb commented on code in PR #16732: URL: https://github.com/apache/datafusion/pull/16732#discussion_r2197638542 ## datafusion/physical-plan/src/filter.rs: ## @@ -481,32 +481,29 @@ impl ExecutionPlan for FilterExec { _config: &ConfigOptions, ) -> Result>> {

Re: [PR] Refactor filter pushdown APIs to enable joins to pass through filters [datafusion]

2025-07-10 Thread via GitHub
adriangb commented on code in PR #16732: URL: https://github.com/apache/datafusion/pull/16732#discussion_r2197624064 ## datafusion/physical-plan/src/filter.rs: ## @@ -481,32 +481,29 @@ impl ExecutionPlan for FilterExec { _config: &ConfigOptions, ) -> Result>> {

Re: [PR] Add `clickbench_pushdown` benchmark [datafusion]

2025-07-10 Thread via GitHub
alamb commented on PR #16731: URL: https://github.com/apache/datafusion/pull/16731#issuecomment-3057246306 I tested this benchmark with our filter pushdown work here, and I think it is useful https://github.com/apache/datafusion/pull/16711#issuecomment-3057240913 Thank you @zhu

Re: [PR] POC: Test DataFusion with experimental Parquet Filter Pushdown (try 4) [datafusion]

2025-07-10 Thread via GitHub
alamb commented on PR #16711: URL: https://github.com/apache/datafusion/pull/16711#issuecomment-3057240913 I ran the new clickbench_pushdown benchmark and TLDR is the new pushdown decoder look like they make a measurable difference 🎉 Thus I think we should proceed trying to get http

[PR] feat: randn expression support [datafusion-comet]

2025-07-10 Thread via GitHub
akupchinskiy opened a new pull request, #2010: URL: https://github.com/apache/datafusion-comet/pull/2010 ## Which issue does this PR close? Closes #. ## Rationale for this change Added support for spark randn expression (standard gaussian random variable) #

Re: [PR] Refactor filter pushdown APIs to enable joins to pass through filters [datafusion]

2025-07-10 Thread via GitHub
alamb commented on code in PR #16732: URL: https://github.com/apache/datafusion/pull/16732#discussion_r2197549066 ## datafusion/physical-plan/src/filter_pushdown.rs: ## @@ -95,13 +95,110 @@ pub enum PredicateSupport { } impl PredicateSupport { +/// Return the wrapped exp

Re: [PR] Refactor filter pushdown APIs to enable joins to pass through filters [datafusion]

2025-07-10 Thread via GitHub
alamb commented on code in PR #16732: URL: https://github.com/apache/datafusion/pull/16732#discussion_r2197504180 ## datafusion/physical-plan/src/filter.rs: ## @@ -481,32 +481,29 @@ impl ExecutionPlan for FilterExec { _config: &ConfigOptions, ) -> Result>> {

Re: [PR] Initial commit [datafusion]

2025-07-10 Thread via GitHub
rok commented on PR #16738: URL: https://github.com/apache/datafusion/pull/16738#issuecomment-3057127577 Note: this includes some unrelated changes so as to be able to use changes in https://github.com/apache/arrow-rs/pull/7818. Also note: `row_group_index` at write time is not handled co
