Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
UBarney commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2144201354 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -178,6 +187,18 @@ pub struct NestedLoopJoinExec { metrics: ExecutionPlanMetricsSet, ///

Re: [PR] fix: Remove `null_equals_null` todo in `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
UBarney commented on PR #16390: URL: https://github.com/apache/datafusion/pull/16390#issuecomment-2969058654 > however @UBarney was able to point out that a on clause already included in the join filter 🤦. I mean that a non-equality condition (e.g. `<=>`) in `on` will be included in the joi

Re: [I] Panic in FFI UDWF when using wrapping lead function [datafusion-python]

2025-06-12 Thread via GitHub
kosiew commented on issue #1144: URL: https://github.com/apache/datafusion-python/issues/1144#issuecomment-2969068899 hi @timsaucer I can see some errors when I pytest _test_window_udf.py in #1145 but I want to be sure it is the same error as you are reporting in this issue. Ca

Re: [I] RFC: What 3 level naming system should we use for catalog providers? [datafusion-python]

2025-06-12 Thread via GitHub
kosiew commented on issue #1142: URL: https://github.com/apache/datafusion-python/issues/1142#issuecomment-2969114420 ## Context & Problem Statement - **Current state** - **Datafusion core repo:** uses `catalog/schema/table` - **Datafusion Python repo:** uses `ca

Re: [PR] fix: Remove `null_equals_null` todo in `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
jonathanc-n closed pull request #16390: fix: Remove `null_equals_null` todo in `NestedLoopJoin` URL: https://github.com/apache/datafusion/pull/16390 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [I] [Epic] Pipeline breaking cancellation support and improvement [datafusion]

2025-06-12 Thread via GitHub
zhuqi-lucas commented on issue #16353: URL: https://github.com/apache/datafusion/issues/16353#issuecomment-2968831295 > Made some progress on the problem statement already. I gave the AI the facts, it turned it into something I would actually enjoy reading. I'm going to work on the way thin

Re: [PR] Minor: add testing case for add YieldStreamExec and polish docs [datafusion]

2025-06-12 Thread via GitHub
zhuqi-lucas commented on code in PR #16369: URL: https://github.com/apache/datafusion/pull/16369#discussion_r2144101877 ## datafusion/physical-optimizer/Cargo.toml: ## @@ -49,6 +49,7 @@ datafusion-physical-plan = { workspace = true } itertools = { workspace = true } log = { wo

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
UBarney commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2144121144 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -178,6 +187,18 @@ pub struct NestedLoopJoinExec { metrics: ExecutionPlanMetricsSet, ///

Re: [I] row-wise min and max [datafusion]

2025-06-12 Thread via GitHub
kosiew commented on issue #16366: URL: https://github.com/apache/datafusion/issues/16366#issuecomment-2968958912 hi @drtconway , pmin sounds a lot like https://datafusion.apache.org/user-guide/sql/scalar_functions.html#least and pmax like https://datafusion.apache.org/use

[PR] fix: Remove `null_equals_null` todo in `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
jonathanc-n opened a new pull request, #16390: URL: https://github.com/apache/datafusion/pull/16390 ## Which issue does this PR close? - Closes #. ## Rationale for this change I had created #16210 to add `null_equals_null` join support however @UBarney was able to po

Re: [I] Panic in FFI UDWF when using wrapping lead function [datafusion-python]

2025-06-12 Thread via GitHub
kosiew commented on issue #1144: URL: https://github.com/apache/datafusion-python/issues/1144#issuecomment-2969025577 hi @timsaucer , Can you share the `examples/datafusion-ffi-example/src/window_udf.rs`? I don't believe it's in `main` yet.

Re: [I] Panic in FFI UDWF when using wrapping lead function [datafusion-python]

2025-06-12 Thread via GitHub
kosiew commented on issue #1144: URL: https://github.com/apache/datafusion-python/issues/1144#issuecomment-2969030143 Never mind, I got the file from #1145

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
UBarney commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2144133973 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -178,6 +187,18 @@ pub struct NestedLoopJoinExec { metrics: ExecutionPlanMetricsSet, ///

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
jonathanc-n commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2144167426 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -178,6 +187,18 @@ pub struct NestedLoopJoinExec { metrics: ExecutionPlanMetricsSet,

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
jonathanc-n commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2144169706 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -178,6 +187,18 @@ pub struct NestedLoopJoinExec { metrics: ExecutionPlanMetricsSet,

Re: [I] row-wise min and max [datafusion]

2025-06-12 Thread via GitHub
kosiew commented on issue #16366: URL: https://github.com/apache/datafusion/issues/16366#issuecomment-2968970691 ```sql DataFusion CLI v48.0.0 > -- Define sample data CREATE TABLE t1 (a INT, b INT, c INT) AS VALUES (4, NULL, NULL), (1, 2, 3), (3, 1, 2), (1, NULL,

Re: [PR] POC: Reduce `Arc` cloning on hashmap build side [datafusion]

2025-06-12 Thread via GitHub
Dandandan commented on PR #16380: URL: https://github.com/apache/datafusion/pull/16380#issuecomment-2969139729 > I've noticed that it is possible for `interleave` to perform worse than `take` despite the `Arc` clones from `take`. This happens twice as well for `equal_row_arr` and `build_bat

Re: [I] row-wise min and max [datafusion]

2025-06-12 Thread via GitHub
drtconway commented on issue #16366: URL: https://github.com/apache/datafusion/issues/16366#issuecomment-2969144940 Ah yes! Thank you! But they're not in the Rust part of the documentation (https://datafusion.apache.org/user-guide/expressions.html), which is why I didn't know they we

Re: [I] Optimize `NestedLoopJoinExec` Memory Usage [datafusion]

2025-06-12 Thread via GitHub
UBarney commented on issue #16364: URL: https://github.com/apache/datafusion/issues/16364#issuecomment-2968778612 > limiting the intermediate result to ~1 batch size is enough to keep the performance. Do you mean we should also limit num_row of [`left_side, right_side`](http

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
jonathanc-n commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2144282145 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -178,6 +187,18 @@ pub struct NestedLoopJoinExec { metrics: ExecutionPlanMetricsSet,

Re: [I] row-wise min and max [datafusion]

2025-06-12 Thread via GitHub
drtconway commented on issue #16366: URL: https://github.com/apache/datafusion/issues/16366#issuecomment-2969217442 Sure!

Re: [I] row-wise min and max [datafusion]

2025-06-12 Thread via GitHub
drtconway closed issue #16366: row-wise min and max URL: https://github.com/apache/datafusion/issues/16366

Re: [I] row-wise min and max [datafusion]

2025-06-12 Thread via GitHub
kosiew commented on issue #16366: URL: https://github.com/apache/datafusion/issues/16366#issuecomment-2969200984 @drtconway You're welcome. Can you close this issue?

Re: [I] Enable merge queue in github to avoid commit confliction. [datafusion]

2025-06-12 Thread via GitHub
crepererum commented on issue #6880: URL: https://github.com/apache/datafusion/issues/6880#issuecomment-2966444088 > Based on ASF Slack, I believe MQ aren't currently supported in `.asf.yaml` because there's no API support ([github.com/orgs/community/discussions/50893](https://github.com/or

[PR] feat: Upgrade to official DataFusion 48.0.0 release [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove opened a new pull request, #1877: URL: https://github.com/apache/datafusion-comet/pull/1877 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [I] Can't publish datafusion-spark crate due to error [datafusion]

2025-06-12 Thread via GitHub
alamb commented on issue #16383: URL: https://github.com/apache/datafusion/issues/16383#issuecomment-2966455712 I manually updated the cargo file (applied the patch above) and ran `cargo publish` to get this to publish at 48.0.0: - https://crates.io/crates/datafusion-spark/48.0.0

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-06-12 Thread via GitHub
alamb commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-2966448392 Nice @zhuqi-lucas -- BTW I am not sure how easy it will be to use the parquet APIs to do this (specifically write arbitrary bytes to the inner writer) so it may take some fiddlin

[PR] feat: Add support for glob patterns in CREATE EXTERNAL TABLE commands [datafusion]

2025-06-12 Thread via GitHub
a-agmon opened a new pull request, #16387: URL: https://github.com/apache/datafusion/pull/16387 Partly closes #16303 The purpose of this PR is to enable using CREATE command with glob pattern and a URL scheme - i.e., ``` CREATE EXTERNAL TABLE ee3 STORED AS CSV LOCATION

Re: [PR] Minor: add testing case for add YieldStreamExec and polish docs [datafusion]

2025-06-12 Thread via GitHub
alamb commented on code in PR #16369: URL: https://github.com/apache/datafusion/pull/16369#discussion_r2142595175 ## datafusion/physical-optimizer/src/insert_yield_exec.rs: ## @@ -32,9 +34,10 @@ use datafusion_physical_plan::yield_stream::YieldStreamExec; use datafusion_physica

Re: [PR] chore: Stop Running Spark SQL tests for Spark 3.5.4 and 3.5.5 [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove commented on code in PR #1870: URL: https://github.com/apache/datafusion-comet/pull/1870#discussion_r2142666410 ## .github/workflows/pr_build_linux.yml: ## @@ -74,14 +74,14 @@ jobs: maven_opts: "-Pspark-3.4 -Pscala-2.12" scan_impl: "native_com

Re: [PR] chore: Enable Spark SQL tests for `native_iceberg_compat` [datafusion-comet]

2025-06-12 Thread via GitHub
codecov-commenter commented on PR #1876: URL: https://github.com/apache/datafusion-comet/pull/1876#issuecomment-2966738000 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1876?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] chore: Enable Spark SQL tests for `native_iceberg_compat` [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove opened a new pull request, #1876: URL: https://github.com/apache/datafusion-comet/pull/1876 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/1542 ## Rationale for this change Disable the one remaining f

Re: [I] [Spark SQL] Fix InsertSuite failure when using native_iceberg_compat with Spark 3.4.3 [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove commented on issue #1875: URL: https://github.com/apache/datafusion-comet/issues/1875#issuecomment-2966675923 @mbutrovich Could you work on this one?

Re: [I] [Spark SQL] Fix InsertSuite failure when using native_iceberg_compat with Spark 3.4.3 [datafusion-comet]

2025-06-12 Thread via GitHub
mbutrovich commented on issue #1875: URL: https://github.com/apache/datafusion-comet/issues/1875#issuecomment-2966682640 > [@mbutrovich](https://github.com/mbutrovich) Could you work on this one? Sounds good!

Re: [PR] feat: Upgrade to official DataFusion 48.0.0 release [datafusion-comet]

2025-06-12 Thread via GitHub
codecov-commenter commented on PR #1877: URL: https://github.com/apache/datafusion-comet/pull/1877#issuecomment-2966837862 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1877?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] bug: remove busy-wait while sort is ongoing [datafusion]

2025-06-12 Thread via GitHub
pepijnve commented on PR #16322: URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2966838912 I don't think it's necessary TBH. I applied this patch (which I think is what @berkaysynnada meant) and the test then fails in the way it's intended to. ``` Index: datafusi

Re: [PR] Simplify predicates in filter [datafusion]

2025-06-12 Thread via GitHub
xudong963 commented on code in PR #16362: URL: https://github.com/apache/datafusion/pull/16362#discussion_r2142234067 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -778,6 +779,16 @@ impl OptimizerRule for PushDownFilter { return Ok(Transformed::no(plan));

Re: [PR] feat: Support tpch and tpch10 benchmark for csv format [datafusion]

2025-06-12 Thread via GitHub
alamb commented on PR #16373: URL: https://github.com/apache/datafusion/pull/16373#issuecomment-2965078477 Nice -- thank you @zhuqi-lucas and @2010YOUY01

[I] Add support for snowflake cluster by expressions [datafusion-sqlparser-rs]

2025-06-12 Thread via GitHub
osipovartem opened a new issue, #1882: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1882 ```sql CLUSTER BY (to_date(ts), coalesce(foo, 'bar')) ``` Failed with `ParserError("Expected: ), found: (` https://docs.snowflake.com/en/user-guide/tables-clustering-keys#def

Re: [PR] feat: Support tpch and tpch10 benchmark for csv format [datafusion]

2025-06-12 Thread via GitHub
zhuqi-lucas commented on PR #16373: URL: https://github.com/apache/datafusion/pull/16373#issuecomment-2965000675 Thank you @2010YOUY01 for review!

[PR] [datafusion-spark] Example of using Spark compatible function library [datafusion]

2025-06-12 Thread via GitHub
alamb opened a new pull request, #16384: URL: https://github.com/apache/datafusion/pull/16384 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/15915 - Closes https://github.com/apache/datafusion/issues/16383 ## Rationale for this

Re: [PR] refactor(joins::utils): Replace OnceAsync/OnceFut with tokio's OnceCell [datafusion]

2025-06-12 Thread via GitHub
github-actions[bot] commented on PR #15431: URL: https://github.com/apache/datafusion/pull/15431#issuecomment-2964785636 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] chore(deps): bump object_store from 0.12.1 to 0.12.2 [datafusion]

2025-06-12 Thread via GitHub
xudong963 merged PR #16368: URL: https://github.com/apache/datafusion/pull/16368

Re: [I] Snowflake: ALTER ICEBERG TABLE [datafusion-sqlparser-rs]

2025-06-12 Thread via GitHub
osipovartem closed issue #1868: Snowflake: ALTER ICEBERG TABLE URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1868

[PR] Chore: implement hour func as ScalarUDFImpl [datafusion-comet]

2025-06-12 Thread via GitHub
trompa opened a new pull request, #1874: URL: https://github.com/apache/datafusion-comet/pull/1874 ## Which issue does this PR close? Part of #1819 ## Rationale for this change Part of #1819 ## What changes are included in this PR? Implement hour as Scalar

Re: [PR] bug: remove busy-wait while sort is ongoing [datafusion]

2025-06-12 Thread via GitHub
pepijnve commented on PR #16322: URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2965650342 > if the pending rotation somehow breaks, since SortPreservingMergeStream never yields I'm not sure I understand what you mean @berkaysynnada. Looking at just the initial pha

Re: [I] Support RightMark join for `SortMergeJoin` [datafusion]

2025-06-12 Thread via GitHub
jonathanc-n closed issue #16226: Support RightMark join for `SortMergeJoin` URL: https://github.com/apache/datafusion/issues/16226

[PR] Simplify expressions passed to table functions [datafusion]

2025-06-12 Thread via GitHub
simonvandel opened a new pull request, #16388: URL: https://github.com/apache/datafusion/pull/16388 ## Which issue does this PR close? Fixes https://github.com/apache/datafusion/issues/14958 ## Rationale for this change Table functions don't need to special case `

Re: [I] Can't publish datafusion-spark crate due to error [datafusion]

2025-06-12 Thread via GitHub
xudong963 commented on issue #16383: URL: https://github.com/apache/datafusion/issues/16383#issuecomment-2967031389 Nice, just published datafusion-sqllogictest by fixing the files locally

Re: [I] Can't publish datafusion-spark crate due to error [datafusion]

2025-06-12 Thread via GitHub
alamb commented on issue #16383: URL: https://github.com/apache/datafusion/issues/16383#issuecomment-2967050602 🎉

Re: [I] Can't publish datafusion-spark crate due to error [datafusion]

2025-06-12 Thread via GitHub
alamb closed issue #16383: Can't publish datafusion-spark crate due to error URL: https://github.com/apache/datafusion/issues/16383

Re: [PR] Blog: Optimizing SQL and DataFrames [datafusion-site]

2025-06-12 Thread via GitHub
kevinjqliu commented on code in PR #74: URL: https://github.com/apache/datafusion-site/pull/74#discussion_r2143107153 ## content/blog/2025-06-15-optimizing-sql-dataframes-part-one.md: ## @@ -0,0 +1,250 @@ +--- +layout: post +title: Optimizing SQL (and DataFrames) in DataFusion,

[PR] feat: Implement doCanonicilize for CometShuffleExchangeExec [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove opened a new pull request, #1878: URL: https://github.com/apache/datafusion-comet/pull/1878 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [I] [Spark SQL] Fix InsertSuite failure when using native_iceberg_compat with Spark 3.4.3 [datafusion-comet]

2025-06-12 Thread via GitHub
mbutrovich commented on issue #1875: URL: https://github.com/apache/datafusion-comet/issues/1875#issuecomment-2967624631 Spark 3.4 and 3.5 handle struct conversion in this test case differently. 3.4 inserts a `cast` expression in the Project operator, while 3.5 uses a `named_struct` expres

Re: [PR] fix: cast_struct_to_struct aligns to Spark behavior [datafusion-comet]

2025-06-12 Thread via GitHub
codecov-commenter commented on PR #1879: URL: https://github.com/apache/datafusion-comet/pull/1879#issuecomment-2967951510 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1879?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-06-12 Thread via GitHub
adriangb commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-2967987004 Very excited about this!

Re: [I] Utf8View and BinaryView (i.e., StringView in Arrow, colloquially German-style strings) support [datafusion-comet]

2025-06-12 Thread via GitHub
mbutrovich commented on issue #1403: URL: https://github.com/apache/datafusion-comet/issues/1403#issuecomment-2967992880 I'll be picking this up again now that we dropped Java 8 support and bumped our Arrow Java version.

Re: [PR] doc: Add SQL examples for SEMI + ANTI Joins [datafusion]

2025-06-12 Thread via GitHub
alamb merged PR #16316: URL: https://github.com/apache/datafusion/pull/16316

Re: [I] Document semi join, anti semi join and more supported join types [datafusion]

2025-06-12 Thread via GitHub
alamb closed issue #16245: Document semi join, anti semi join and more supported join types URL: https://github.com/apache/datafusion/issues/16245

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-12 Thread via GitHub
corwinjoy commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-2968354595 @alamb One piece I would like to solicit feedback on is whether there is a way to leverage the existing tests to more thoroughly vet encryption. What I mean by that is that we uncovere

Re: [PR] feat: Add experimental auto mode for `COMET_PARQUET_SCAN_IMPL` [datafusion-comet]

2025-06-12 Thread via GitHub
parthchandra commented on code in PR #1747: URL: https://github.com/apache/datafusion-comet/pull/1747#discussion_r2143805849 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -105,8 +105,49 @@ case class CometScanRule(session: SparkSession) extends Rule[

Re: [PR] feat: Implement `doCanonicalize` for `CometShuffleExchangeExec` [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove closed pull request #1878: feat: Implement `doCanonicalize` for `CometShuffleExchangeExec` URL: https://github.com/apache/datafusion-comet/pull/1878

Re: [I] [Spark SQL] Enable all tests in DynamicPartitionPruningSuite [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove commented on issue #1839: URL: https://github.com/apache/datafusion-comet/issues/1839#issuecomment-2967483655 All of these tests will need to be ignored until we support DPP (cc @coderfender). This is not a correctness issue but a performance issue due to not reusing exchanges.

Re: [I] [Spark SQL] Enable all tests in DynamicPartitionPruningSuite [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove commented on issue #1839: URL: https://github.com/apache/datafusion-comet/issues/1839#issuecomment-296748 The "canocilization and exchange reuse" test is expected to fail and should be ignored until Comet supports DPP. The two exchanges are different. One contains a `CometSca

Re: [PR] Blog: Optimizing SQL and DataFrames [datafusion-site]

2025-06-12 Thread via GitHub
kevinjqliu commented on PR #74: URL: https://github.com/apache/datafusion-site/pull/74#issuecomment-2967429852 Also, a nit: any link to `https://github.com/apache/datafusion/blob/main/` runs the risk of going stale later. For example, if a file path was moved to a different l

Re: [I] Spark SQL test failures in native_iceberg_compat mode [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove closed issue #1542: Spark SQL test failures in native_iceberg_compat mode URL: https://github.com/apache/datafusion-comet/issues/1542

Re: [PR] chore: Enable Spark SQL tests for `native_iceberg_compat` [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove merged PR #1876: URL: https://github.com/apache/datafusion-comet/pull/1876

[PR] fix: cast_struct_to_struct aligns to Spark behavior [datafusion-comet]

2025-06-12 Thread via GitHub
mbutrovich opened a new pull request, #1879: URL: https://github.com/apache/datafusion-comet/pull/1879 ## Which issue does this PR close? Closes #1875. ## Rationale for this change ## What changes are included in this PR? - `cast_struct_to_s

Re: [PR] feat: Implement `doCanonicalize` for `CometShuffleExchangeExec` [datafusion-comet]

2025-06-12 Thread via GitHub
codecov-commenter commented on PR #1878: URL: https://github.com/apache/datafusion-comet/pull/1878#issuecomment-2967573528 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1878?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
jonathanc-n commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2143227797 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -178,6 +187,18 @@ pub struct NestedLoopJoinExec { metrics: ExecutionPlanMetricsSet,

Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-12 Thread via GitHub
Chen-Yuan-Lai commented on code in PR #16324: URL: https://github.com/apache/datafusion/pull/16324#discussion_r2143240105 ## datafusion/core/tests/sql/explain_analyze.rs: ## @@ -52,43 +54,45 @@ async fn explain_analyze_baseline_metrics() { let formatted = arrow::util::prett

Re: [PR] Blog: Optimizing SQL and DataFrames [datafusion-site]

2025-06-12 Thread via GitHub
kevinjqliu commented on code in PR #74: URL: https://github.com/apache/datafusion-site/pull/74#discussion_r2143137504 ## content/blog/2025-06-15-optimizing-sql-dataframes-part-two.md: ## @@ -0,0 +1,533 @@ +--- +layout: post +title: Optimizing SQL (and DataFrames) in DataFusion,

Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-12 Thread via GitHub
Chen-Yuan-Lai commented on code in PR #16324: URL: https://github.com/apache/datafusion/pull/16324#discussion_r2143152053 ## datafusion/core/tests/sql/explain_analyze.rs: ## @@ -52,43 +54,45 @@ async fn explain_analyze_baseline_metrics() { let formatted = arrow::util::prett

Re: [I] Add support for SMJ with RightSemi join [datafusion-comet]

2025-06-12 Thread via GitHub
dharanad commented on issue #1725: URL: https://github.com/apache/datafusion-comet/issues/1725#issuecomment-2967896852 @andygrove Unlike DataFusion, Spark does not natively support the RightSemi join type. This presents a challenge, and I was hoping to get your thoughts on the best way to han

Re: [PR] chore: Stop Running Spark SQL tests for Spark 3.5.4 and 3.5.5 [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove closed pull request #1870: chore: Stop Running Spark SQL tests for Spark 3.5.4 and 3.5.5 URL: https://github.com/apache/datafusion-comet/pull/1870

Re: [PR] chore: Stop Running Spark SQL tests for Spark 3.5.4 and 3.5.5 [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove commented on PR #1870: URL: https://github.com/apache/datafusion-comet/pull/1870#issuecomment-2968083119 These changes already got merged in https://github.com/apache/datafusion-comet/pull/1869

Re: [I] Fix failed Spark SQL tests due to shuffle enabled [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove closed issue #231: Fix failed Spark SQL tests due to shuffle enabled URL: https://github.com/apache/datafusion-comet/issues/231

[PR] Add fast paths for try_process_unnest [datafusion]

2025-06-12 Thread via GitHub
simonvandel opened a new pull request, #16389: URL: https://github.com/apache/datafusion/pull/16389 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/16242 ## Rationale for this change Reduce planning work for unnest e

Re: [PR] chore: Enable more Spark SQL tests [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove merged PR #1869: URL: https://github.com/apache/datafusion-comet/pull/1869

Re: [PR] feat: Add experimental auto mode for `COMET_PARQUET_SCAN_IMPL` [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove commented on code in PR #1747: URL: https://github.com/apache/datafusion-comet/pull/1747#discussion_r2143610758 ## .github/workflows/spark_sql_test_native_auto.yml: ## @@ -0,0 +1,71 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor

Re: [PR] Chore: implement hour func as ScalarUDFImpl [datafusion-comet]

2025-06-12 Thread via GitHub
mbutrovich commented on PR #1874: URL: https://github.com/apache/datafusion-comet/pull/1874#issuecomment-2968155976 This is looking good so far! I know it's not your change, but it made me wonder: do you know if we have a test that exercises this code path? `"Hour(scalar) should be fold in

Re: [PR] Chore: implement predicate exprs as ScalarUDFImpl [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove commented on code in PR #1864: URL: https://github.com/apache/datafusion-comet/pull/1864#discussion_r2143694940 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -952,32 +947,23 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] Chore: implement predicate exprs as ScalarUDFImpl [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove commented on code in PR #1864: URL: https://github.com/apache/datafusion-comet/pull/1864#discussion_r2143696076 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -952,32 +947,23 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [I] `datafusion-cli`: Use correct S3 region if it is not specified [datafusion]

2025-06-12 Thread via GitHub
liamzwbao commented on issue #16306: URL: https://github.com/apache/datafusion/issues/16306#issuecomment-296858 take

Re: [I] `datafusion-cli`: Use correct S3 region if it is not specified [datafusion]

2025-06-12 Thread via GitHub
liamzwbao commented on issue #16306: URL: https://github.com/apache/datafusion/issues/16306#issuecomment-2968741593 Hi @alamb, from the upstream ticket, I think we can use `resolve_bucket_region` to get the region if it's not specified. However, I'm wondering what should be the expect

Re: [PR] Use pager and allow configuration via `\pset` [datafusion]

2025-06-12 Thread via GitHub
github-actions[bot] commented on PR #15597: URL: https://github.com/apache/datafusion/pull/15597#issuecomment-2968768204 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Simplify predicates in filter [datafusion]

2025-06-12 Thread via GitHub
xudong963 commented on code in PR #16362: URL: https://github.com/apache/datafusion/pull/16362#discussion_r2142084110 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -778,6 +779,16 @@ impl OptimizerRule for PushDownFilter { return Ok(Transformed::no(plan));

Re: [I] Add documentation on constraint enforcements [datafusion]

2025-06-12 Thread via GitHub
alamb closed issue #16309: Add documentation on constraint enforcements URL: https://github.com/apache/datafusion/issues/16309

Re: [PR] POC: Reduce `Arc` cloning on hashmap build side [datafusion]

2025-06-12 Thread via GitHub
Dandandan commented on code in PR #16380: URL: https://github.com/apache/datafusion/pull/16380#discussion_r2142058321 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -95,9 +96,11 @@ struct JoinLeftData { /// The hash table with indices into `batch` hash_map:

Re: [PR] chore: refactor Substrait consumer's "rename_field" and implement the rest of types [datafusion]

2025-06-12 Thread via GitHub
gabotechs commented on code in PR #16345: URL: https://github.com/apache/datafusion/pull/16345#discussion_r2141784191 ## datafusion/substrait/src/logical_plan/consumer/utils.rs: ## @@ -81,98 +81,167 @@ pub(super) fn next_struct_field_name( } } -pub(super) fn rename_field

Re: [PR] chore: Stop Running Spark SQL tests for Spark 3.5.4 and 3.5.5 [datafusion-comet]

2025-06-12 Thread via GitHub
parthchandra commented on code in PR #1870: URL: https://github.com/apache/datafusion-comet/pull/1870#discussion_r2141326158 ## .github/workflows/pr_build_linux.yml: ## @@ -74,14 +74,14 @@ jobs: maven_opts: "-Pspark-3.4 -Pscala-2.12" scan_impl: "native_

Re: [PR] chore: refactor Substrait consumer's "rename_field" and implement the rest of types [datafusion]

2025-06-12 Thread via GitHub
alamb merged PR #16345: URL: https://github.com/apache/datafusion/pull/16345

Re: [PR] bug: remove busy-wait while sort is ongoing [datafusion]

2025-06-12 Thread via GitHub
berkaysynnada commented on PR #16322: URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2965238217 Sorry for the late reply. @pepijnve Your diagnosis is spot on, and the proposed fix totally makes sense. I honestly can’t recall why I added the wake there but not in Congested
