Re: [I] Could not resolve Feature 'ghcr.io/devcontainers-contrib/features/protoc:1' [datafusion-ballista]

2025-07-10 Thread via GitHub
milenkovicm closed issue #1277: Could not resolve Feature 'ghcr.io/devcontainers-contrib/features/protoc:1' URL: https://github.com/apache/datafusion-ballista/issues/1277 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

Re: [PR] fix: devcontainer protoc:1 feature url [datafusion-ballista]

2025-07-10 Thread via GitHub
milenkovicm merged PR #1278: URL: https://github.com/apache/datafusion-ballista/pull/1278

Re: [I] [EPIC] Complete `datafusion-spark` Spark Compatible Functions [datafusion]

2025-07-10 Thread via GitHub
shehabgamin commented on issue #15914: URL: https://github.com/apache/datafusion/issues/15914#issuecomment-3055982838 > An update here is that we are waiting for one or two more good example PRs and then we'll turn the community on porting > > If anyone wants to take a look / help out

Re: [PR] Improve dictionary null handling in hashing and expand aggregate test coverage for nulls [datafusion]

2025-07-10 Thread via GitHub
kosiew commented on PR #16466: URL: https://github.com/apache/datafusion/pull/16466#issuecomment-3056158061 @jonathanc-n Can I trouble you to review this?
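A minimal sketch of the subtlety behind this PR's title: a dictionary-encoded value is logically NULL either when its key is NULL or when the key points at a NULL entry in the values array, so hashing and aggregation should treat both cases the same way. Plain `Option` slices stand in for Arrow arrays, and the function names are hypothetical; this is not DataFusion's hashing code.

```rust
use std::collections::HashSet;

/// Resolves row `row` of a dictionary column to its logical value.
/// Both a NULL key and a key pointing at a NULL value are logically NULL.
fn logical_value<'a>(keys: &[Option<usize>], values: &[Option<&'a str>], row: usize) -> Option<&'a str> {
    keys[row].and_then(|k| values[k])
}

/// COUNT(DISTINCT col) over a dictionary column, ignoring logical NULLs
/// regardless of whether the NULL came from the keys or the values.
fn count_distinct(keys: &[Option<usize>], values: &[Option<&str>]) -> usize {
    (0..keys.len())
        .filter_map(|row| logical_value(keys, values, row))
        .collect::<HashSet<_>>()
        .len()
}
```

With `values = [Some("a"), None, Some("b")]` and `keys = [Some(0), None, Some(1), Some(2), Some(0)]`, rows 1 and 2 are both NULL, and the distinct count is 2.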

[PR] Initial commit [datafusion]

2025-07-10 Thread via GitHub
rok opened a new pull request, #16738: URL: https://github.com/apache/datafusion/pull/16738 ## Which issue does this PR close? - Closes #16737. ## Rationale for this change #16351 added modular encryption reading and writing. This builds on top of #16351 and uses https://

Re: [PR] Initial commit [datafusion]

2025-07-10 Thread via GitHub
rok commented on PR #16738: URL: https://github.com/apache/datafusion/pull/16738#issuecomment-3057127577 Note: this includes some unrelated changes so as to be able to use changes in https://github.com/apache/arrow-rs/pull/7818. Also note: `row_group_index` at write time is not handled co

Re: [I] Bloom filters are unused for certain where clause patterns [datafusion]

2025-07-10 Thread via GitHub
alamb commented on issue #16697: URL: https://github.com/apache/datafusion/issues/16697#issuecomment-3057055273 Thank you for this well written ticket I agree that for a predicate like ```sql WHERE ((col1 = 'category_1' AND col2 = 'type_1') OR (col1 = 'category_2' AND col
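A minimal sketch of why an OR-of-ANDs predicate like this can still use Bloom filters for pruning. The `BloomFilter` type here is a hypothetical stand-in backed by an exact `HashSet`; a real Bloom filter may report false positives but never false negatives, so pruning that is correct for the exact set is also correct for a real filter. A row group can be skipped only if every disjunct is ruled out, and a conjunction of equalities is ruled out as soon as any one of its literals is definitely absent.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a per-column Bloom filter.
struct BloomFilter {
    values: HashSet<String>,
}

impl BloomFilter {
    fn might_contain(&self, value: &str) -> bool {
        self.values.contains(value)
    }
}

/// A predicate in disjunctive normal form: OR of ANDs of `column = literal`.
type Dnf<'a> = Vec<Vec<(&'a str, &'a str)>>;

/// Returns true if no row in the row group can satisfy `predicate`.
fn can_prune(filters: &HashMap<&str, BloomFilter>, predicate: &Dnf<'_>) -> bool {
    predicate.iter().all(|conjunction| {
        // One definitely-absent literal makes the whole conjunction false.
        // Columns without a filter cannot prove absence.
        conjunction.iter().any(|(column, literal)| {
            filters
                .get(column)
                .map_or(false, |filter| !filter.might_contain(literal))
        })
    })
}
```

For example, with filters containing only `col1 = 'category_1'` and `col2 = 'type_1'`, the predicate from the issue cannot be pruned (its first branch may match), while `(col1 = 'category_2' AND col2 = 'type_1') OR (col1 = 'category_3' AND col2 = 'type_1')` can.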

Re: [PR] Refactor filter pushdown APIs to enable joins to pass through filters [datafusion]

2025-07-10 Thread via GitHub
adriangb commented on PR #16732: URL: https://github.com/apache/datafusion/pull/16732#issuecomment-3057497800 Thank you for the review @alamb I incorporated your suggestion for restructuring the enum / struct into a struct with a discriminant field. It is much better. I also re

Re: [PR] perf: Optimize `AvgDecimalGroupsAccumulator` [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove merged PR #1893: URL: https://github.com/apache/datafusion-comet/pull/1893

Re: [PR] fix : cast_operands_to_double_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove commented on code in PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996#discussion_r2197876746 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -677,7 +677,14 @@ object QueryPlanSerde extends Logging with CometExprShim {
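The PR title suggests casting operands to double to avoid integer overflow. A minimal sketch of that trade-off, written in Rust rather than the PR's Scala, with hypothetical function names: the integer path has to detect overflow, while the floating-point path stays in range (the product of any two `i64` values fits in `f64`'s range) but can silently round results above 2^53.

```rust
/// Integer multiplication that reports overflow instead of wrapping.
fn mul_i64_checked(a: i64, b: i64) -> Option<i64> {
    a.checked_mul(b)
}

/// The same multiplication after casting both operands to f64.
/// It cannot overflow for i64 inputs, but large values may lose precision.
fn mul_as_f64(a: i64, b: i64) -> f64 {
    (a as f64) * (b as f64)
}
```

For instance, `mul_i64_checked(i64::MAX, 2)` returns `None`, whereas `mul_as_f64(i64::MAX, 2)` returns a finite value; but `2^53 + 1` already rounds to `2^53` once cast to `f64`.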

Re: [PR] Feat: support map_from_arrays [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove merged PR #1932: URL: https://github.com/apache/datafusion-comet/pull/1932

Re: [PR] chore: Implement BloomFilterMightContain as a ScalarUDFImpl [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove commented on PR #1954: URL: https://github.com/apache/datafusion-comet/pull/1954#issuecomment-3057691022 There was a merge conflict due to merging some other PRs this morning, so I took the liberty of fixing the conflict. This is the next PR in the merge queue.

Re: [PR] fix : cast_operands_to_double_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove commented on code in PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996#discussion_r2197906691 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2744,6 +2744,16 @@ class CometExpressionSuite extends CometTestBase with Adaptiv

[I] Bug: `make_date(year, month, day)` reports an error if one of the fields is NULL [datafusion]

2025-07-10 Thread via GitHub
xudong963 opened a new issue, #16746: URL: https://github.com/apache/datafusion/issues/16746 I tested other databases, such as PG and DuckDB: when one of the fields is NULL, they don't report an error but return NULL, which makes more sense to me. Let me know wdyt
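A minimal sketch of the NULL-propagating behavior the issue asks for, in plain Rust with `Option` standing in for nullable fields. `make_date_sketch` and its `(year, month, day)` return type are hypothetical; this is not DataFusion's `make_date` implementation. A NULL input yields NULL, while a non-NULL but impossible date still yields an error.

```rust
/// Gregorian leap-year rule.
fn is_leap_year(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// Number of days in `month`, or None if the month is out of range.
fn days_in_month(year: i32, month: u32) -> Option<u32> {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => Some(31),
        4 | 6 | 9 | 11 => Some(30),
        2 => Some(if is_leap_year(year) { 29 } else { 28 }),
        _ => None,
    }
}

/// Hypothetical NULL-propagating make_date: any NULL field gives NULL
/// (`Ok(None)`); a non-NULL but invalid date gives an error.
fn make_date_sketch(
    year: Option<i32>,
    month: Option<u32>,
    day: Option<u32>,
) -> Result<Option<(i32, u32, u32)>, String> {
    // Return NULL instead of erroring, as the issue reports PG and DuckDB do.
    let (Some(y), Some(m), Some(d)) = (year, month, day) else {
        return Ok(None);
    };
    match days_in_month(y, m) {
        Some(max_day) if (1..=max_day).contains(&d) => Ok(Some((y, m, d))),
        _ => Err(format!("invalid date: {y}-{m}-{d}")),
    }
}
```

Under this sketch, `make_date_sketch(Some(2024), None, Some(1))` returns `Ok(None)`, while `make_date_sketch(Some(2023), Some(2), Some(29))` is still an error.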

[PR] Add Configurable RecordBatch Splitting for Large Input Batches [datafusion]

2025-07-10 Thread via GitHub
kosiew opened a new pull request, #16734: URL: https://github.com/apache/datafusion/pull/16734 ## Which issue does this PR close? - Closes #16717. ## Rationale for this change Large `RecordBatch`es produced by data sources can degrade performance and limit parallelism in
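A minimal sketch of the splitting step, assuming a hypothetical `max_rows` setting: compute `(offset, length)` ranges of at most `max_rows` rows each. Each range could then be passed to arrow-rs `RecordBatch::slice(offset, length)`, which slices without copying the underlying buffers. The actual configuration name and where DataFusion applies the split are not shown in the PR excerpt.

```rust
/// Splits `num_rows` rows into `(offset, length)` ranges of at most
/// `max_rows` rows each. `max_rows` is a hypothetical config value.
fn split_ranges(num_rows: usize, max_rows: usize) -> Vec<(usize, usize)> {
    // step_by(0) would panic, so reject a zero limit explicitly.
    assert!(max_rows > 0, "max_rows must be positive");
    (0..num_rows)
        .step_by(max_rows)
        .map(|offset| (offset, max_rows.min(num_rows - offset)))
        .collect()
}
```

For example, a 10-row batch with `max_rows = 4` yields `[(0, 4), (4, 4), (8, 2)]`; the last slice simply carries the remainder.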

Re: [I] Add support for SortAggregateExec [datafusion-comet]

2025-07-10 Thread via GitHub
kazantsev-maksim commented on issue #1994: URL: https://github.com/apache/datafusion-comet/issues/1994#issuecomment-3057482879 I can try implementing this @andygrove

[PR] Add support for Redshift SELECT * EXCLUDE [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
yoavcloud opened a new pull request, #1936: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1936 Redshift supports placing the `EXCLUDE` option at the end of the projection list, not necessarily after the wildcard. For example: `SELECT *, c1 EXCLUDE c2 FROM test` (exclude column

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
iffyio commented on code in PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2198011580 ## src/dialect/mod.rs: ## @@ -1076,6 +1088,15 @@ pub trait Dialect: Debug + Any { fn supports_comma_separated_drop_column_list(&self) -> bool {

Re: [PR] Fix for Postgres regex and like binary operators [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
iffyio commented on code in PR #1928: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1928#discussion_r2198022760 ## tests/sqlparser_postgres.rs: ## @@ -2207,19 +2223,31 @@ fn parse_pg_like_match_ops() { ]; for (str_op, op) in pg_like_match_ops { -

Re: [PR] Add support for Snowflake identifier function [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
iffyio merged PR #1929: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1929

Re: [PR] Refactor filter pushdown APIs to enable joins to pass through filters [datafusion]

2025-07-10 Thread via GitHub
alamb commented on code in PR #16732: URL: https://github.com/apache/datafusion/pull/16732#discussion_r2197504180 ## datafusion/physical-plan/src/filter.rs: ## @@ -481,32 +481,29 @@ impl ExecutionPlan for FilterExec { _config: &ConfigOptions, ) -> Result>> {

Re: [PR] Refactor filter pushdown APIs to enable joins to pass through filters [datafusion]

2025-07-10 Thread via GitHub
alamb commented on code in PR #16732: URL: https://github.com/apache/datafusion/pull/16732#discussion_r2197549066 ## datafusion/physical-plan/src/filter_pushdown.rs: ## @@ -95,13 +95,110 @@ pub enum PredicateSupport { } impl PredicateSupport { +/// Return the wrapped exp

[PR] feat: randn expression support [datafusion-comet]

2025-07-10 Thread via GitHub
akupchinskiy opened a new pull request, #2010: URL: https://github.com/apache/datafusion-comet/pull/2010 ## Which issue does this PR close? Closes #. ## Rationale for this change Added support for spark randn expression (standard gaussian random variable) #
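Spark's `randn` draws samples from the standard normal distribution N(0, 1). A minimal sketch of one way to produce such samples from uniform draws, using the Box-Muller transform over a small xorshift64 generator. Both choices are assumptions for illustration; Spark's own implementation uses a different generator and Gaussian method, so these values will not match Spark's output for the same seed.

```rust
/// Minimal xorshift64 generator (illustrative only, not cryptographic).
struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        Self { state: if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed } }
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }

    /// Uniform f64 in the half-open interval (0, 1].
    fn next_unit(&mut self) -> f64 {
        // The top 53 bits give evenly spaced values; +1 excludes zero.
        ((self.next_u64() >> 11) + 1) as f64 / (1u64 << 53) as f64
    }

    /// One standard normal sample via the Box-Muller transform.
    fn next_gaussian(&mut self) -> f64 {
        let u1 = self.next_unit(); // in (0, 1], so ln(u1) is finite
        let u2 = self.next_unit();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}
```

The same seed always produces the same sequence, which matters for a deterministic per-partition seeding scheme like Spark's.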

Re: [PR] POC: Test DataFusion with experimental Parquet Filter Pushdown (try 4) [datafusion]

2025-07-10 Thread via GitHub
alamb commented on PR #16711: URL: https://github.com/apache/datafusion/pull/16711#issuecomment-3057240913 I ran the new clickbench_pushdown benchmark and TL;DR: the new pushdown decoders look like they make a measurable difference 🎉 Thus I think we should proceed trying to get http

Re: [PR] Add `clickbench_pushdown` benchmark [datafusion]

2025-07-10 Thread via GitHub
alamb commented on PR #16731: URL: https://github.com/apache/datafusion/pull/16731#issuecomment-3057246306 I tested this benchmark with our filter pushdown work here, and I think it is useful https://github.com/apache/datafusion/pull/16711#issuecomment-3057240913 Thank you @zhu

Re: [PR] Refactor filter pushdown APIs to enable joins to pass through filters [datafusion]

2025-07-10 Thread via GitHub
adriangb commented on code in PR #16732: URL: https://github.com/apache/datafusion/pull/16732#discussion_r2197624064 ## datafusion/physical-plan/src/filter.rs: ## @@ -481,32 +481,29 @@ impl ExecutionPlan for FilterExec { _config: &ConfigOptions, ) -> Result>> {

Re: [PR] Add support for Redshift SELECT * EXCLUDE [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
iffyio commented on code in PR #1936: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1936#discussion_r2198107622 ## tests/sqlparser_common.rs: ## @@ -15982,3 +15992,64 @@ fn parse_create_procedure_with_parameter_modes() { _ => unreachable!(), } } + +

Re: [PR] Snowflake trailing options in CREATE TABLE [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
iffyio commented on code in PR #1931: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1931#discussion_r2198043261 ## src/ast/helpers/stmt_create_table.rs: ## @@ -383,6 +383,16 @@ impl CreateTableBuilder { self } +/// Returns true if information o

Re: [I] Enable comments on datafusion-site via giscus [datafusion-site]

2025-07-10 Thread via GitHub
kevinjqliu commented on issue #80: URL: https://github.com/apache/datafusion-site/issues/80#issuecomment-3058031572 @alamb last item to make this work > The giscus app is installed, otherwise visitors will not be able to comment and react. can you check if you have permission t

[PR] chore: make more clarity for internal errors [datafusion]

2025-07-10 Thread via GitHub
comphead opened a new pull request, #16741: URL: https://github.com/apache/datafusion/pull/16741 ## Which issue does this PR close? Adding a Datafusion issue tracker URL for internal errors - Closes #. ## Rationale for this change ## What changes ar

Re: [PR] chore: Drop support for RightSemi and RightAnti join types [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove commented on PR #1935: URL: https://github.com/apache/datafusion-comet/pull/1935#issuecomment-3058210238 @dharanad I'm not sure why all the CI tests were cancelled. I just merged the latest from main and triggered the tests again.

Re: [PR] feat: randn expression support [datafusion-comet]

2025-07-10 Thread via GitHub
codecov-commenter commented on PR #2010: URL: https://github.com/apache/datafusion-comet/pull/2010#issuecomment-3058219290 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2010?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Refactor filter pushdown APIs to enable joins to pass through filters [datafusion]

2025-07-10 Thread via GitHub
adriangb commented on PR #16732: URL: https://github.com/apache/datafusion/pull/16732#issuecomment-3058224108 > Note that I added a HashJoinExec implementation to motivate this PR but remove it in [5940cca](https://github.com/apache/datafusion/pull/16732/commits/5940cca7c8ca9620781664425fb4

[PR] minor: Refactor to reduce duplicate serde code [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove opened a new pull request, #2011: URL: https://github.com/apache/datafusion-comet/pull/2011 ## Which issue does this PR close? N/A ## Rationale for this change Simplify serde code and avoid duplicate code ## What changes are included in th

Re: [PR] minor: Refactor to reduce duplicate serde code [datafusion-comet]

2025-07-10 Thread via GitHub
codecov-commenter commented on PR #2011: URL: https://github.com/apache/datafusion-comet/pull/2011#issuecomment-3058330148 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2011?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] POC: Test DataFusion with experimental Parquet Filter Pushdown (try 4) [datafusion]

2025-07-10 Thread via GitHub
XiangpengHao commented on PR #16711: URL: https://github.com/apache/datafusion/pull/16711#issuecomment-3058345438 > Thus I think we should proceed trying to get https://github.com/apache/arrow-rs/pull/7850 merged. Great! I plan to take another look in a few days (being occupied by ot

Re: [I] Support uneven partition inputs HashJoinExec in Partitioned mode [datafusion]

2025-07-10 Thread via GitHub
timsaucer commented on issue #16740: URL: https://github.com/apache/datafusion/issues/16740#issuecomment-3058355404 As a workaround, or more likely the correct usage, I should have a pre-defined number of output partitions for these rather than one per unique value of the partition key.
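A minimal sketch of the approach described in this comment: route each row to one of a fixed number of output partitions by hashing its key, rather than creating one partition per distinct key. The standard library's `DefaultHasher` is used purely for illustration (DataFusion's repartitioning uses its own hashing), and the function names are hypothetical. For a partitioned hash join, both inputs must use the same hash function and partition count so that matching keys meet in the same partition.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Maps a key to one of `num_partitions` partitions.
fn partition_for<K: Hash>(key: &K, num_partitions: usize) -> usize {
    assert!(num_partitions > 0, "need at least one partition");
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % num_partitions as u64) as usize
}

/// Groups keys into exactly `num_partitions` buckets, regardless of how
/// many distinct keys there are.
fn repartition<K: Hash + Clone>(keys: &[K], num_partitions: usize) -> Vec<Vec<K>> {
    let mut parts = vec![Vec::new(); num_partitions];
    for key in keys {
        parts[partition_for(key, num_partitions)].push(key.clone());
    }
    parts
}
```

With 1,000 distinct keys and `num_partitions = 4`, the output always has exactly four partitions, and any given key lands in the same partition every time.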

Re: [PR] Refactor filter pushdown APIs to enable joins to pass through filters [datafusion]

2025-07-10 Thread via GitHub
adriangb commented on code in PR #16732: URL: https://github.com/apache/datafusion/pull/16732#discussion_r2197638542 ## datafusion/physical-plan/src/filter.rs: ## @@ -481,32 +481,29 @@ impl ExecutionPlan for FilterExec { _config: &ConfigOptions, ) -> Result>> {

Re: [I] Investigate performance for TPC-H q17 query [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove commented on issue #2008: URL: https://github.com/apache/datafusion-comet/issues/2008#issuecomment-3057914312 # Comet 0.9.0 Plan ``` == Physical Plan == AdaptiveSparkPlan isFinalPlan=false +- !CometHashAggregate [sum#146, isEmpty#147], Final, [sum(l_extendedprice#2

Re: [I] Investigate performance for TPC-H q9 query [datafusion-comet]

2025-07-10 Thread via GitHub
comphead commented on issue #2006: URL: https://github.com/apache/datafusion-comet/issues/2006#issuecomment-3057917895 Hey @dharanad, I'm thinking of running the benchmark locally as described in https://datafusion.apache.org/comet/contributor-guide/benchmarking_macos.html and using Samply

[I] Support uneven partition inputs HashJoinExec in Partitioned mode [datafusion]

2025-07-10 Thread via GitHub
timsaucer opened a new issue, #16740: URL: https://github.com/apache/datafusion/issues/16740 ### Is your feature request related to a problem or challenge? I have a case where I have two table providers. They produce partitioned data with a partition hash. I want to be able to do effi

Re: [PR] Add support for granting privileges to procedures and functions in Snowflake [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
iffyio merged PR #1930: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1930

Re: [PR] Snowflake Reserved SQL Keywords as Implicit Table Alias [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
iffyio commented on code in PR #1934: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1934#discussion_r2198077430 ## src/dialect/snowflake.rs: ## @@ -345,6 +345,85 @@ impl Dialect for SnowflakeDialect { } } +fn is_table_alias(&self, kw: &Keyword,

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-07-10 Thread via GitHub
jonathanc-n commented on PR #16443: URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3058063741 @2010YOUY01 Special types need to only return the matching rows, so only one side needs to return rows while the other side can return a null array and not be projected in the fi

Re: [PR] Add support for `+` char in Snowflake stage names [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
iffyio merged PR #1935: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1935

[I] Support multi-threaded writing of encrypted Parquet files [datafusion]

2025-07-10 Thread via GitHub
rok opened a new issue, #16737: URL: https://github.com/apache/datafusion/issues/16737 ### Is your feature request related to a problem or challenge? #16351 added modular encryption reading and writing. We should enable multi-threaded encrypted writing. ### Describe the solutio

Re: [PR] chore(deps): bump clap from 4.5.40 to 4.5.41 [datafusion]

2025-07-10 Thread via GitHub
xudong963 merged PR #16735: URL: https://github.com/apache/datafusion/pull/16735

Re: [PR] fix: [iceberg] Add LogicalTypeAnnotation in ParquetColumnSpec [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove commented on code in PR #2000: URL: https://github.com/apache/datafusion-comet/pull/2000#discussion_r2197933479 ## common/src/main/java/org/apache/comet/parquet/Utils.java: ## @@ -290,15 +290,137 @@ public static ColumnDescriptor buildColumnDescriptor(ParquetColumnSpe

Re: [PR] fix: [iceberg] Add LogicalTypeAnnotation in ParquetColumnSpec [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove commented on code in PR #2000: URL: https://github.com/apache/datafusion-comet/pull/2000#discussion_r2197931502 ## common/src/main/java/org/apache/comet/parquet/Utils.java: ## @@ -290,15 +290,137 @@ public static ColumnDescriptor buildColumnDescriptor(ParquetColumnSpe

Re: [PR] fix: [iceberg] Add LogicalTypeAnnotation in ParquetColumnSpec [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove commented on PR #2000: URL: https://github.com/apache/datafusion-comet/pull/2000#issuecomment-3057756699 @huaxingao Could you add some unit tests?

[PR] Draft: Test fast gc for sort string view [datafusion]

2025-07-10 Thread via GitHub
zhuqi-lucas opened a new pull request, #16739: URL: https://github.com/apache/datafusion/pull/16739 ## Which issue does this PR close? Test fast gc for sorting string views, used to get benchmark results for sort_tpch Q3 and Q11, which sort string views ## Rationale for

Re: [PR] [Draft]Add SQL logic tests for Run-End Encoded (REE) [datafusion]

2025-07-10 Thread via GitHub
rich-t-kid-datadog commented on code in PR #16715: URL: https://github.com/apache/datafusion/pull/16715#discussion_r2197952864 ## datafusion/sqllogictest/test_files/run_end_encoding.slt: ## @@ -0,0 +1,340 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or mor

Re: [PR] DRAFT: Update arrow/parquet to 56.0.0 [datafusion]

2025-07-10 Thread via GitHub
zhuqi-lucas commented on PR #16690: URL: https://github.com/apache/datafusion/pull/16690#issuecomment-3057792817 @alamb @Dandandan I submitted a PR based on this PR to see whether fast_gc can also help the sort_tpch or sort_tpch10 benchmarks: https://github.com/apache/datafusion/pull

[I] Filtering and counting afterwards causes overflow panic in interval arithmetics [datafusion]

2025-07-10 Thread via GitHub
90degs2infty opened a new issue, #16736: URL: https://github.com/apache/datafusion/issues/16736 ### Describe the bug I'm trying to implement a "poor-man's" `any` function to check for rows matching a predicate in a dataframe: ```rust async fn any(df: DataFrame, predicate: Ex

Re: [PR] [Draft]Add SQL logic tests for Run-End Encoded (REE) [datafusion]

2025-07-10 Thread via GitHub
fmonjalet commented on code in PR #16715: URL: https://github.com/apache/datafusion/pull/16715#discussion_r2197141005 ## datafusion/sqllogictest/test_files/run_end_encoding.slt: ## @@ -0,0 +1,340 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contrib

Re: [PR] fix : cast_operands_to_double_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove commented on code in PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996#discussion_r2197874159 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2744,6 +2744,16 @@ class CometExpressionSuite extends CometTestBase with Adaptiv

Re: [PR] POC: Test DataFusion with experimental Parquet Filter Pushdown (try 4) [datafusion]

2025-07-10 Thread via GitHub
zhuqi-lucas commented on PR #16711: URL: https://github.com/apache/datafusion/pull/16711#issuecomment-3057355716 > I ran the new clickbench_pushdown benchmark and TLDR is the new pushdown decoder look like they make a measurable difference 🎉 > > Thus I think we should proceed trying t

Re: [PR] minor: Refactor to reduce duplicate serde code [datafusion-comet]

2025-07-10 Thread via GitHub
Copilot commented on code in PR #2011: URL: https://github.com/apache/datafusion-comet/pull/2011#discussion_r2198329182 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -56,6 +56,7 @@ import org.apache.comet.objectstore.NativeConfig import org.apache.c

Re: [PR] chore: Implement BloomFilterMightContain as a ScalarUDFImpl [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove merged PR #1954: URL: https://github.com/apache/datafusion-comet/pull/1954

Re: [PR] DataFusion 47.0.0 blog post [datafusion-site]

2025-07-10 Thread via GitHub
geoffreyclaude commented on code in PR #83: URL: https://github.com/apache/datafusion-site/pull/83#discussion_r2198332815 ## content/blog/2025-07-10-datafusion-47.0.0.md: ## @@ -0,0 +1,256 @@ +--- +layout: post +title: Apache DataFusion 47.0.0 Released +date: 2025-07-10 +author:

[PR] chore(deps): bump clap from 4.5.40 to 4.5.41 [datafusion]

2025-07-10 Thread via GitHub
dependabot[bot] opened a new pull request, #16735: URL: https://github.com/apache/datafusion/pull/16735 Bumps [clap](https://github.com/clap-rs/clap) from 4.5.40 to 4.5.41. Changelog: Sourced from [clap's changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md). [4

Re: [PR] perf: Optimize hash joins with an empty build side [datafusion]

2025-07-10 Thread via GitHub
nuno-faria commented on code in PR #16716: URL: https://github.com/apache/datafusion/pull/16716#discussion_r2196943824 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -1498,6 +1498,23 @@ impl HashJoinStream { let timer = self.join_metrics.join_time.timer();
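A minimal sketch of the reasoning such an optimization relies on, assuming (as in DataFusion's `HashJoinExec`) that the left input is the build side. The `JoinKind` enum is a hypothetical mirror of DataFusion's `JoinType` variant names, not the real type, and the PR excerpt does not show exactly which cases it handles. With an empty build side, only join types that emit unmatched probe-side (right) rows can produce any output.

```rust
/// Hypothetical mirror of join type names; not DataFusion's `JoinType`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum JoinKind {
    Inner,
    Left,
    Right,
    Full,
    LeftSemi,
    LeftAnti,
    RightSemi,
    RightAnti,
    LeftMark,
}

/// Returns true if a join whose build (left) side is empty must produce
/// no rows, so reading the probe side could be skipped entirely.
fn empty_build_side_yields_nothing(kind: JoinKind) -> bool {
    match kind {
        // These emit only build-side rows or matched rows, and there are none.
        JoinKind::Inner
        | JoinKind::Left
        | JoinKind::LeftSemi
        | JoinKind::LeftAnti
        | JoinKind::RightSemi
        | JoinKind::LeftMark => true,
        // These emit unmatched probe-side rows (NULL-padded for Right/Full),
        // so every probe row still appears in the output.
        JoinKind::Right | JoinKind::Full | JoinKind::RightAnti => false,
    }
}
```

An exhaustive `match` (no `_` arm) means adding a new join type forces a decision here instead of silently defaulting.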

Re: [PR] fix : cast_operands_to_double_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-10 Thread via GitHub
coderfender commented on PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996#issuecomment-3056256288 Added unit test (#1477)

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-07-10 Thread via GitHub
2010YOUY01 commented on PR #16443: URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3056490809 > I have addressed all of your comments. @2010YOUY01 please take another look > > > I recommend to doc more high-level ideas to key functions, to make this module easier to

Re: [PR] fix: add `order_requirement` & `dist_requirement` to `OutputRequirementExec` display [datafusion]

2025-07-10 Thread via GitHub
Loaki07 commented on PR #16726: URL: https://github.com/apache/datafusion/pull/16726#issuecomment-3060509206 I ran both `cargo test -p datafusion-sqllogictest --test sqllogictests` and `cargo test` and fixed the failures in both. Please review.

Re: [PR] Improve display format of BoundedWindowAggExec [datafusion]

2025-07-10 Thread via GitHub
Chen-Yuan-Lai commented on PR #16645: URL: https://github.com/apache/datafusion/pull/16645#issuecomment-3060795405 > > I suspect updating the results was faster than it might have previously been thanks to the work @blaginin @Chen-Yuan-Lai have done to migrate most of our plan tests to `ins

Re: [PR] Snowflake: support trailing options in `CREATE TABLE` [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
yoavcloud commented on code in PR #1931: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1931#discussion_r2199758669 ## tests/sqlparser_snowflake.rs: ## @@ -995,6 +995,21 @@ fn test_snowflake_create_iceberg_table_without_location() { ); } +#[test] +fn test_s

Re: [PR] chore: Drop support for RightSemi and RightAnti join types [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove merged PR #1935: URL: https://github.com/apache/datafusion-comet/pull/1935

Re: [I] Add support for SMJ with RightSemi join [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove closed issue #1725: Add support for SMJ with RightSemi join URL: https://github.com/apache/datafusion-comet/issues/1725

[PR] docs: Add guide showing comparison between Comet and Gluten [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove opened a new pull request, #2012: URL: https://github.com/apache/datafusion-comet/pull/2012 ## Which issue does this PR close? N/A ## Rationale for this change We are often asked how Comet compares to Gluten. This documentation should help to an

Re: [PR] Per file filter evaluation [datafusion]

2025-07-10 Thread via GitHub
adriangb commented on code in PR #15057: URL: https://github.com/apache/datafusion/pull/15057#discussion_r2198638535 ## datafusion-examples/examples/default_column_values.rs: ## @@ -0,0 +1,366 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

Re: [PR] Improve dictionary null handling in hashing and expand aggregate test coverage for nulls [datafusion]

2025-07-10 Thread via GitHub
blaginin commented on code in PR #16466: URL: https://github.com/apache/datafusion/pull/16466#discussion_r2198832751 ## datafusion/core/tests/sql/aggregates.rs: ## @@ -441,3 +776,1110 @@ async fn count_distinct_dictionary_mixed_values() -> Result<()> { Ok(()) } + +/// C

Re: [PR] Improve dictionary null handling in hashing and expand aggregate test coverage for nulls [datafusion]

2025-07-10 Thread via GitHub
blaginin commented on PR #16466: URL: https://github.com/apache/datafusion/pull/16466#issuecomment-3059259592 overall looks good and thank you for working on that!

Re: [PR] minor: Refactor to reduce duplicate serde code [datafusion-comet]

2025-07-10 Thread via GitHub
mbutrovich merged PR #2011: URL: https://github.com/apache/datafusion-comet/pull/2011

[PR] Fix in list round trip in df proto [datafusion]

2025-07-10 Thread via GitHub
XiangpengHao opened a new pull request, #16744: URL: https://github.com/apache/datafusion/pull/16744 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/16665. ## Rationale for this change ## What changes are included in

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
ryanschneider commented on code in PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2198853680 ## src/dialect/mod.rs: ## @@ -1076,6 +1088,15 @@ pub trait Dialect: Debug + Any { fn supports_comma_separated_drop_column_list(&self) -> boo

Re: [I] TPC-H Q16 fails during deserialization [datafusion]

2025-07-10 Thread via GitHub
XiangpengHao commented on issue #16665: URL: https://github.com/apache/datafusion/issues/16665#issuecomment-3059272448 Filed a fix, but need to dig deeper to understand why it doesn't trigger the bug in df47

Re: [PR] chore: Introduce ANSI support for remainder operation [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove commented on PR #1971: URL: https://github.com/apache/datafusion-comet/pull/1971#issuecomment-3059275876 @parthchandra @mbutrovich would you also like to review?
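A minimal sketch of the Spark behavior an ANSI-aware remainder has to match, as we understand Spark's semantics: with ANSI mode off, integer remainder by zero returns NULL; with ANSI mode on, it raises an error. The function name and the error string are hypothetical, and other ANSI overflow cases are out of scope here.

```rust
/// Spark-style integer remainder; `None` models SQL NULL.
/// The error string is illustrative, not Comet's actual error type.
fn spark_remainder(a: Option<i32>, b: Option<i32>, ansi_mode: bool) -> Result<Option<i32>, String> {
    let (Some(a), Some(b)) = (a, b) else {
        return Ok(None); // NULL in, NULL out
    };
    if b == 0 {
        return if ansi_mode {
            Err("DIVIDE_BY_ZERO".to_string())
        } else {
            Ok(None)
        };
    }
    // Rust's `%` panics on i32::MIN % -1; wrapping_rem returns 0 instead,
    // which matches JVM semantics. The result takes the dividend's sign.
    Ok(Some(a.wrapping_rem(b)))
}
```

So `spark_remainder(Some(5), Some(0), false)` is `Ok(None)`, the same call with `ansi_mode = true` is an error, and `-7 % 3` gives `-1` as on the JVM.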

[I] Add from_utc_timestamp support [datafusion-comet]

2025-07-10 Thread via GitHub
kazuyukitanimura opened a new issue, #2013: URL: https://github.com/apache/datafusion-comet/issues/2013 ### What is the problem the feature request solves? from_utc_timestamp is not supported now https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.

Re: [PR] [Draft]Add SQL logic tests for Run-End Encoded (REE) [datafusion]

2025-07-10 Thread via GitHub
rich-t-kid-datadog commented on PR #16715: URL: https://github.com/apache/datafusion/pull/16715#issuecomment-3059324594 TBD: add the `In` operator to the test; it should fit in with the other string operations, and it's frequently used at DD

Re: [PR] Ci cache [datafusion]

2025-07-10 Thread via GitHub
blaginin commented on PR #16709: URL: https://github.com/apache/datafusion/pull/16709#issuecomment-3059328323 linux build test: 5m → 2m check substrait features: 11m → 9m cargo test (amd64): 18m → 16m cargo examples: 17m → 11m clippy: 6m → 5m really like linux build test im

[I] Add date_format support [datafusion-comet]

2025-07-10 Thread via GitHub
kazuyukitanimura opened a new issue, #2014: URL: https://github.com/apache/datafusion-comet/issues/2014 ### What is the problem the feature request solves? date_format is not supported https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.d

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-07-10 Thread via GitHub
2010YOUY01 commented on code in PR #16443: URL: https://github.com/apache/datafusion/pull/16443#discussion_r2199423074 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -828,13 +833,127 @@ impl NestedLoopJoinStream { handle_state!(self.process

Re: [I] Add date_format support [datafusion-comet]

2025-07-10 Thread via GitHub
jatin510 commented on issue #2014: URL: https://github.com/apache/datafusion-comet/issues/2014#issuecomment-3060316382 I would love to work on this

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-07-10 Thread via GitHub
2010YOUY01 commented on code in PR #16443: URL: https://github.com/apache/datafusion/pull/16443#discussion_r2199484385 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices( probe_indices: UInt32Array, filte

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-10 Thread via GitHub
ryanschneider commented on code in PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2198802030 ## src/dialect/mod.rs: ## @@ -1076,6 +1088,15 @@ pub trait Dialect: Debug + Any { fn supports_comma_separated_drop_column_list(&self) -> boo

Re: [I] TPC-H Q16 fails during deserialization [datafusion]

2025-07-10 Thread via GitHub
XiangpengHao commented on issue #16665: URL: https://github.com/apache/datafusion/issues/16665#issuecomment-3059200639 Taking a look, as LiquidCache also ran into this

Re: [PR] Improve dictionary null handling in hashing and expand aggregate test coverage for nulls [datafusion]

2025-07-10 Thread via GitHub
blaginin commented on code in PR #16466: URL: https://github.com/apache/datafusion/pull/16466#discussion_r2198824772 ## datafusion/core/tests/sql/aggregates.rs: ## @@ -441,3 +776,1110 @@ async fn count_distinct_dictionary_mixed_values() -> Result<()> { Ok(()) } + +/// C

Re: [PR] feat: Add JNI-based Hadoop FileSystem support for S3 and other Hadoop-compatible stores [datafusion-comet]

2025-07-10 Thread via GitHub
parthchandra commented on PR #1992: URL: https://github.com/apache/datafusion-comet/pull/1992#issuecomment-3059721586 Update on this: 1) fs-hdfs includes its [own version of libhdfs](https://github.com/datafusion-contrib/fs-hdfs/tree/hadoop-3.1.4/c_src/libhdfs) as part of its build so

Re: [PR] feat: support multi-threaded writing of Parquet files with modular encryption [datafusion]

2025-07-10 Thread via GitHub
adamreeve commented on PR #16738: URL: https://github.com/apache/datafusion/pull/16738#issuecomment-3059787257 This approach looks good to me! Do the existing tests hit the parallel write code path? We probably want to make sure there are encryption tests for both the parallel and serial co

[PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-10 Thread via GitHub
NGA-TRAN opened a new pull request, #16742: URL: https://github.com/apache/datafusion/pull/16742 ## Which issue does this PR close? These are tests per @alamb's suggestion at https://github.com/apache/datafusion/pull/16662#pullrequestreview-2994410907: `I also wonder if we s

[PR] Benchmark for char expression [datafusion]

2025-07-10 Thread via GitHub
ajita-asthana opened a new pull request, #16743: URL: https://github.com/apache/datafusion/pull/16743 ## Which issue does this PR close? - Closes #16009 ## Rationale for this change Add a benchmark to measure char performance ## What changes are included in this P

Re: [PR] feat: support multi-threaded writing of Parquet files with modular encryption [datafusion]

2025-07-10 Thread via GitHub
corwinjoy commented on PR #16738: URL: https://github.com/apache/datafusion/pull/16738#issuecomment-3059890834 This pull request introduces several updates across multiple components of the codebase, focusing on dependency management, feature enhancements, and code cleanup. The most signifi

Re: [I] Optimized spill file format [datafusion]

2025-07-10 Thread via GitHub
ding-young commented on issue #14078: URL: https://github.com/apache/datafusion/issues/14078#issuecomment-3059889330 > So shall we close this issue as complete now? Yes, I think so. Of course, there's still room to seek further performance optimizations, but for now: - Validati

Re: [I] Filtering and counting afterwards causes overflow panic in interval arithmetics [datafusion]

2025-07-10 Thread via GitHub
liamzwbao commented on issue #16736: URL: https://github.com/apache/datafusion/issues/16736#issuecomment-3059941385 take

Re: [PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-10 Thread via GitHub
NGA-TRAN commented on code in PR #16742: URL: https://github.com/apache/datafusion/pull/16742#discussion_r2198593774 ## datafusion/proto/tests/cases/roundtrip_physical_plan.rs: ## @@ -1780,3 +1780,111 @@ async fn test_tpch_part_in_list_query_with_real_parquet_data() -> Result<(

[PR] Fix invalid intervals in `satisfy_greater` [datafusion]

2025-07-10 Thread via GitHub
liamzwbao opened a new pull request, #16745: URL: https://github.com/apache/datafusion/pull/16745 ## Which issue does this PR close? - Closes #16736. ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] feat: Add a configuration to make parquet encryption optional [datafusion]

2025-07-10 Thread via GitHub
corwinjoy commented on code in PR #16649: URL: https://github.com/apache/datafusion/pull/16649#discussion_r2196418502 ## datafusion/core/Cargo.toml: ## @@ -61,13 +61,21 @@ default = [ "unicode_expressions", "compression", "parquet", +"parquet_encryption", Rev

Re: [PR] feat: Add a configuration to make parquet encryption optional [datafusion]

2025-07-10 Thread via GitHub
corwinjoy commented on code in PR #16649: URL: https://github.com/apache/datafusion/pull/16649#discussion_r2198712011 ## datafusion/common/src/encryption.rs: ## @@ -0,0 +1,76 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreem

[PR] Refactor shuffle supported [datafusion-comet]

2025-07-10 Thread via GitHub
andygrove opened a new pull request, #2015: URL: https://github.com/apache/datafusion-comet/pull/2015 ## Which issue does this PR close? N/A ## Rationale for this change Refactor to move code where it belongs. ## What changes are included in this PR
