Re: [PR] fix: Preserve equivalence properties of input plan in unnest [datafusion]

2025-07-31 Thread via GitHub
vegarsti commented on code in PR #16985: URL: https://github.com/apache/datafusion/pull/16985#discussion_r2245319362 ## datafusion/physical-plan/src/unnest.rs: ## @@ -101,8 +101,22 @@ impl UnnestExec { input: &Arc, schema: SchemaRef, ) -> PlanProperties {

Re: [PR] Rewrite Nested Loop Join executor for 3.5× speed and 1% memory usage [datafusion]

2025-07-31 Thread via GitHub
2010YOUY01 commented on code in PR #16996: URL: https://github.com/apache/datafusion/pull/16996#discussion_r2245710943 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -4127,9 +4127,9 @@ logical_plan 03)TableScan: left_table projection=[a, b, c] 04)TableScan: rig

Re: [PR] WIP: Rewrite NestedLoopJoin to limit intermediate size (up to 3.2X faster) [datafusion]

2025-07-31 Thread via GitHub
2010YOUY01 closed pull request #16889: WIP: Rewrite NestedLoopJoin to limit intermediate size (up to 3.2X faster) URL: https://github.com/apache/datafusion/pull/16889 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

Re: [PR] WIP: Rewrite NestedLoopJoin to limit intermediate size (up to 3.2X faster) [datafusion]

2025-07-31 Thread via GitHub
2010YOUY01 commented on PR #16889: URL: https://github.com/apache/datafusion/pull/16889#issuecomment-3140413077 Move to the final PR https://github.com/apache/datafusion/pull/16996, so closing this one.

Re: [PR] Chore: Improve array contains test coverage [datafusion-comet]

2025-07-31 Thread via GitHub
andygrove merged PR #2030: URL: https://github.com/apache/datafusion-comet/pull/2030

[PR] chore: Add scripts for running benchmarks with Blaze [datafusion-comet]

2025-07-31 Thread via GitHub
andygrove opened a new pull request, #2050: URL: https://github.com/apache/datafusion-comet/pull/2050 ## Which issue does this PR close? N/A ## Rationale for this change There is a proposal to move the Blaze project into the ASF as a top-level incubating

Re: [PR] fix : cast_operands_to_decimal_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-31 Thread via GitHub
coderfender commented on code in PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996#discussion_r2245849616 ## spark/src/main/scala/org/apache/comet/serde/arithmetic.scala: ## @@ -212,17 +211,26 @@ object CometIntegralDivide extends CometExpressionSerde with Mat

[I] FFI: data corruption on boolean Array with non-zero offset [datafusion-comet]

2025-07-31 Thread via GitHub
mbutrovich opened a new issue, #2051: URL: https://github.com/apache/datafusion-comet/issues/2051 ### Describe the bug ### Background We encountered a [new issue](https://github.com/apache/datafusion-comet/pull/2040#issuecomment-3111563959) while trying to upgrade to DF49 where a

Re: [PR] fix : cast_operands_to_decimal_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-31 Thread via GitHub
andygrove commented on code in PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996#discussion_r2245874649 ## spark/src/main/scala/org/apache/comet/serde/arithmetic.scala: ## @@ -212,17 +211,26 @@ object CometIntegralDivide extends CometExpressionSerde with MathB

Re: [PR] fix : cast_operands_to_decimal_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-31 Thread via GitHub
andygrove commented on code in PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996#discussion_r2245741631 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -113,6 +113,16 @@ class CometExpressionSuite extends CometTestBase with AdaptiveS

Re: [PR] fix : cast_operands_to_decimal_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-31 Thread via GitHub
andygrove commented on code in PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996#discussion_r2245743098 ## spark/src/main/scala/org/apache/comet/serde/arithmetic.scala: ## @@ -212,17 +211,26 @@ object CometIntegralDivide extends CometExpressionSerde with MathB

Re: [PR] fix: Preserve equivalence properties of input plan in unnest [datafusion]

2025-07-31 Thread via GitHub
asubiotto commented on code in PR #16985: URL: https://github.com/apache/datafusion/pull/16985#discussion_r2245279416 ## datafusion/physical-plan/src/unnest.rs: ## @@ -101,8 +101,22 @@ impl UnnestExec { input: &Arc, schema: SchemaRef, ) -> PlanProperties {

Re: [PR] fix: Preserve equivalence properties of input plan in unnest [datafusion]

2025-07-31 Thread via GitHub
vegarsti commented on code in PR #16985: URL: https://github.com/apache/datafusion/pull/16985#discussion_r2245268389 ## datafusion/physical-plan/src/unnest.rs: ## @@ -101,8 +101,22 @@ impl UnnestExec { input: &Arc, schema: SchemaRef, ) -> PlanProperties {

Re: [PR] feat: Cache Parquet metadata in built in parquet reader [datafusion]

2025-07-31 Thread via GitHub
alamb commented on PR #16971: URL: https://github.com/apache/datafusion/pull/16971#issuecomment-3139646399 > > > 🤖: Benchmark completed > > > Details > > > > > > I think this doesn't show anything as it's not enabled by default? Should we enable it? > > I made a PR to te

Re: [PR] POC (testing) -- test with Parquet Metadata Cache enabled [datafusion]

2025-07-31 Thread via GitHub
alamb commented on PR #16988: URL: https://github.com/apache/datafusion/pull/16988#issuecomment-3139647249 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun

Re: [PR] Add `sql_parser.default_null_ordering` config option to customize the default null ordering [datafusion]

2025-07-31 Thread via GitHub
goldmedal commented on PR #16963: URL: https://github.com/apache/datafusion/pull/16963#issuecomment-3139868201 Thanks @alamb for reviewing 👍

Re: [PR] feat(spark): implement Spark math function bit_get/bit_count [datafusion]

2025-07-31 Thread via GitHub
andygrove commented on code in PR #16942: URL: https://github.com/apache/datafusion/pull/16942#discussion_r2245810757 ## datafusion/spark/src/function/bitwise/bit_get.rs: ## @@ -0,0 +1,291 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] feat: Add ConfigOptions to ScalarFunctionArgs [datafusion]

2025-07-31 Thread via GitHub
comphead commented on code in PR #16970: URL: https://github.com/apache/datafusion/pull/16970#discussion_r2245812112 ## datafusion/optimizer/src/optimizer.rs: ## @@ -137,13 +137,15 @@ impl OptimizerContext { Self { query_execution_start_time: Utc::now(),

Re: [PR] feat: Add ConfigOptions to ScalarFunctionArgs [datafusion]

2025-07-31 Thread via GitHub
Omega359 commented on code in PR #16970: URL: https://github.com/apache/datafusion/pull/16970#discussion_r2245809712 ## datafusion/core/src/execution/session_state.rs: ## @@ -738,6 +738,17 @@ impl SessionState { self.config.options() } +/// return the configu

Re: [PR] feat: Add ConfigOptions to ScalarFunctionArgs [datafusion]

2025-07-31 Thread via GitHub
Omega359 commented on code in PR #16970: URL: https://github.com/apache/datafusion/pull/16970#discussion_r2245815418 ## datafusion/core/src/execution/session_state.rs: ## @@ -738,6 +738,17 @@ impl SessionState { self.config.options() } +/// return the configu

Re: [PR] fix: Fix `EquivalenceClass` calculation for Union queries [datafusion]

2025-07-31 Thread via GitHub
alamb commented on PR #16185: URL: https://github.com/apache/datafusion/pull/16185#issuecomment-3139484779 Performance tests show no obvious difference in performance -- I will plan to review this more carefully late today

Re: [PR] fix: Preserve equivalence properties of input plan in unnest [datafusion]

2025-07-31 Thread via GitHub
vegarsti commented on code in PR #16985: URL: https://github.com/apache/datafusion/pull/16985#discussion_r2245063342 ## datafusion/physical-plan/src/unnest.rs: ## @@ -101,8 +101,22 @@ impl UnnestExec { input: &Arc, schema: SchemaRef, ) -> PlanProperties {

Re: [PR] chore: Refactor GetArrayItem, ElementAt, GetArrayStructFields out of QueryPlanSerde [datafusion-comet]

2025-07-31 Thread via GitHub
andygrove commented on PR #2026: URL: https://github.com/apache/datafusion-comet/pull/2026#issuecomment-3140485342 @petern48 I don't know if you saw, but there are test failures that need investigating

[I] `AccumulatorArgs.schema` is empty when passing in scalar input [datafusion]

2025-07-31 Thread via GitHub
kylebarron opened a new issue, #16997: URL: https://github.com/apache/datafusion/issues/16997 ### Describe the bug I'm looking for a way to pass down the input field into my UDAF, as I need the field metadata to access the Arrow extension metadata. It looks like [`AccumulatorArgs.sch

Re: [PR] feat: Add ConfigOptions to ScalarFunctionArgs [datafusion]

2025-07-31 Thread via GitHub
comphead commented on code in PR #16970: URL: https://github.com/apache/datafusion/pull/16970#discussion_r2245799840 ## datafusion/core/src/execution/session_state.rs: ## @@ -738,6 +738,17 @@ impl SessionState { self.config.options() } +/// return the configu

Re: [PR] feat: Add ConfigOptions to ScalarFunctionArgs [datafusion]

2025-07-31 Thread via GitHub
comphead commented on code in PR #16970: URL: https://github.com/apache/datafusion/pull/16970#discussion_r2245801709 ## datafusion/core/src/execution/session_state.rs: ## @@ -738,6 +738,17 @@ impl SessionState { self.config.options() } +/// return the configu

Re: [PR] chore(deps): bump indicatif from 0.17.11 to 0.18.0 [datafusion]

2025-07-31 Thread via GitHub
xudong963 merged PR #16992: URL: https://github.com/apache/datafusion/pull/16992

Re: [PR] feat: Cache Parquet metadata in built in parquet reader [datafusion]

2025-07-31 Thread via GitHub
shehabgamin commented on PR #16971: URL: https://github.com/apache/datafusion/pull/16971#issuecomment-3139677123 This is super exciting! 🚀

Re: [PR] Implementing partition_statistics for EmptyExec (Issue #15873) [datafusion]

2025-07-31 Thread via GitHub
xudong963 commented on code in PR #16941: URL: https://github.com/apache/datafusion/pull/16941#discussion_r2245170771 ## datafusion/physical-plan/src/empty.rs: ## @@ -229,4 +243,62 @@ mod tests { assert!(empty.execute(20, task_ctx).is_err()); Ok(()) } + +

Re: [PR] Add `sql_parser.default_null_ordering` config option to customize the default null ordering [datafusion]

2025-07-31 Thread via GitHub
goldmedal merged PR #16963: URL: https://github.com/apache/datafusion/pull/16963

Re: [I] `CooperativeExec` incorrectly implements `maintains_input_order` [datafusion]

2025-07-31 Thread via GitHub
pepijnve commented on issue #16994: URL: https://github.com/apache/datafusion/issues/16994#issuecomment-3139984941 I came up with this as unit test, but that pretty much duplicates the implementation. Should I add this test case or is it not worth it? ``` #[test] fn maintains_in

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-07-31 Thread via GitHub
adriangb commented on code in PR #16445: URL: https://github.com/apache/datafusion/pull/16445#discussion_r2245591958 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -943,10 +978,71 @@ impl ExecutionPlan for HashJoinExec { try_embed_projection(projection, s

Re: [PR] Chore: Improve array contains test coverage [datafusion-comet]

2025-07-31 Thread via GitHub
kazantsev-maksim commented on PR #2030: URL: https://github.com/apache/datafusion-comet/pull/2030#issuecomment-3140349961 @comphead @andygrove thanks for the feedback!

[PR] Rewrite Nested Loop Join executor for 3.5× speed and 1% memory usage [datafusion]

2025-07-31 Thread via GitHub
2010YOUY01 opened a new pull request, #16996: URL: https://github.com/apache/datafusion/pull/16996 ## Which issue does this PR close? - Closes #. ## Rationale for this change # Summary This PR rewrites the NLJ operator from scratch with a different approach

Re: [PR] Rewrite Nested Loop Join executor for 3.5× speed and 1% memory usage [datafusion]

2025-07-31 Thread via GitHub
2010YOUY01 commented on PR #16996: URL: https://github.com/apache/datafusion/pull/16996#issuecomment-3140384776 Draft PR and early discussions: https://github.com/apache/datafusion/pull/16889 Thanks @UBarney @ding-young and @jonathanc-n for the help 🙏🏼

Re: [PR] Benchmark: Add micro-benchmark for Nested Loop Join operator [datafusion]

2025-07-31 Thread via GitHub
2010YOUY01 closed pull request #16819: Benchmark: Add micro-benchmark for Nested Loop Join operator URL: https://github.com/apache/datafusion/pull/16819

Re: [PR] Benchmark: Add micro-benchmark for Nested Loop Join operator [datafusion]

2025-07-31 Thread via GitHub
2010YOUY01 commented on PR #16819: URL: https://github.com/apache/datafusion/pull/16819#issuecomment-3140388598 Move to the final PR https://github.com/apache/datafusion/pull/16996, so closing this one.

Re: [PR] Feature: Improve hash Expr performance [datafusion]

2025-07-31 Thread via GitHub
alamb commented on PR #16977: URL: https://github.com/apache/datafusion/pull/16977#issuecomment-3139657549 > I am a bit confused about the regressions. Are these (and the improvements) within the margin of error on the benchmark runner? Otherwise, it could also be due to the `ahash` changes

[I] INSERT from SELECT cannot be parsed with an ON CONFLICT clause [datafusion-sqlparser-rs]

2025-07-31 Thread via GitHub
rtimush opened a new issue, #1987: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1987 Parsing the following SQL ```sql INSERT INTO foo SELECT 1, 'data' ON CONFLICT DO NOTHING ``` fails with ``` Expected: end of statement, found: CONFLICT at Line: 3,

Re: [PR] Feature: Improve hash Expr performance [datafusion]

2025-07-31 Thread via GitHub
alamb commented on PR #16977: URL: https://github.com/apache/datafusion/pull/16977#issuecomment-3139743553 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~

Re: [PR] POC (testing) -- test with Parquet Metadata Cache enabled [datafusion]

2025-07-31 Thread via GitHub
alamb commented on PR #16988: URL: https://github.com/apache/datafusion/pull/16988#issuecomment-3139743382 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_default_to_on Benchmark clickbench_extended.json

[PR] #16994 Ensure CooperativeExec#maintains_input_order returns a Vec of the correct size [datafusion]

2025-07-31 Thread via GitHub
pepijnve opened a new pull request, #16995: URL: https://github.com/apache/datafusion/pull/16995 ## Which issue does this PR close? - Closes #16994. ## Rationale for this change The initial implementation of `CooperativeExec#maintains_input_order` is not correct and is l

Re: [PR] #16994 Ensure CooperativeExec#maintains_input_order returns a Vec of the correct size [datafusion]

2025-07-31 Thread via GitHub
pepijnve commented on PR #16995: URL: https://github.com/apache/datafusion/pull/16995#issuecomment-3139954754 @alamb this one might require a 49.0.1 😬

Re: [PR] Add tests for yielding in `SpillManager::read_spill_as_stream` [datafusion]

2025-07-31 Thread via GitHub
pepijnve commented on PR #16616: URL: https://github.com/apache/datafusion/pull/16616#issuecomment-3139965024 > @pepijnve I wonder if you have some time to review this PR as well? @alamb IIRC, I provided feedback out of band which was integrated by @ding-young

Re: [D] Join problems with custom TableProviders [datafusion]

2025-07-31 Thread via GitHub
GitHub user robo-todd added a comment to the discussion: Join problems with custom TableProviders Adding some details: This is with DataFusion 48 and 49 versions. Also for a join of this size (64 rows approximately?) I would not expect to see more than one scan of each incoming table. The cu

[PR] test(datafusion-cli): migrate tests to `insta` in `print_format.rs` [datafusion]

2025-07-31 Thread via GitHub
Thearas opened a new pull request, #16993: URL: https://github.com/apache/datafusion/pull/16993 ## Which issue does this PR close? Closes #15795 . ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [PR] Feature: Improve hash Expr performance [datafusion]

2025-07-31 Thread via GitHub
alamb commented on PR #16977: URL: https://github.com/apache/datafusion/pull/16977#issuecomment-3139940928 🤖: Benchmark completed Details ``` group feature_improve-expr-hash-perf main -

[I] `CooperativeExec` incorrectly implements `maintains_input_order` [datafusion]

2025-07-31 Thread via GitHub
pepijnve opened a new issue, #16994: URL: https://github.com/apache/datafusion/issues/16994 ### Describe the bug `CooperativeExec` implements `maintains_input_order` by delegating to its input. As a consequence it may return a `Vec` with a length that is smaller or larger than the nu
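The length mismatch described in this issue can be illustrated with a toy model. The sketch below is a minimal illustration only: the `Plan` trait and the `Leaf`, `Join`, and `Wrapper` types are hypothetical stand-ins, not DataFusion's `ExecutionPlan` API or the actual fix in the linked PR. It assumes the contract that `maintains_input_order` yields one flag per child of the node it is called on.

```rust
/// Hypothetical stand-in for a physical plan node (NOT DataFusion's trait).
trait Plan {
    fn children(&self) -> Vec<&dyn Plan>;
    /// Assumed contract: one flag per child of *this* node.
    fn maintains_input_order(&self) -> Vec<bool>;
}

/// A node with no children.
struct Leaf;
impl Plan for Leaf {
    fn children(&self) -> Vec<&dyn Plan> {
        vec![]
    }
    fn maintains_input_order(&self) -> Vec<bool> {
        vec![]
    }
}

/// A node with two children.
struct Join {
    left: Box<dyn Plan>,
    right: Box<dyn Plan>,
}
impl Plan for Join {
    fn children(&self) -> Vec<&dyn Plan> {
        vec![self.left.as_ref(), self.right.as_ref()]
    }
    fn maintains_input_order(&self) -> Vec<bool> {
        vec![false, false]
    }
}

/// A single-child pass-through node. `delegate` selects the buggy variant.
struct Wrapper {
    input: Box<dyn Plan>,
    delegate: bool,
}
impl Plan for Wrapper {
    fn children(&self) -> Vec<&dyn Plan> {
        vec![self.input.as_ref()]
    }
    fn maintains_input_order(&self) -> Vec<bool> {
        if self.delegate {
            // Buggy variant: the answer is sized by the input's children, not ours.
            self.input.maintains_input_order()
        } else {
            // Sized by this node's own children (always one entry here).
            vec![true; self.children().len()]
        }
    }
}

/// Length of the returned Vec for a wrapper placed over a two-child node.
fn flags_over_join(delegate: bool) -> usize {
    let join = Join { left: Box::new(Leaf), right: Box::new(Leaf) };
    Wrapper { input: Box::new(join), delegate }.maintains_input_order().len()
}

/// Length of the returned Vec for a wrapper placed over a childless node.
fn flags_over_leaf(delegate: bool) -> usize {
    Wrapper { input: Box::new(Leaf), delegate }.maintains_input_order().len()
}
```

In this model the delegating wrapper reports two flags over a join and zero over a leaf, although it has exactly one child in both cases; the non-delegating variant always reports one.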

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-07-31 Thread via GitHub
adriangb commented on code in PR #16445: URL: https://github.com/apache/datafusion/pull/16445#discussion_r2245195106 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -1039,12 +1196,50 @@ async fn collect_left_input( let data = JoinLeftData::new( hashmap,

Re: [PR] Added Example for `Statistical Functions` in Docs [datafusion]

2025-07-31 Thread via GitHub
Adez017 commented on PR #16927: URL: https://github.com/apache/datafusion/pull/16927#issuecomment-3140073922 Hi @alamb, @xudong963. As of now it looks good to me; if any further updates are needed, let me know. If not, we can move forward to merge it.

Re: [PR] Implementing partition_statistics for EmptyExec (Issue #15873) [datafusion]

2025-07-31 Thread via GitHub
vim89 commented on code in PR #16941: URL: https://github.com/apache/datafusion/pull/16941#discussion_r2245505745 ## datafusion/physical-plan/src/empty.rs: ## @@ -229,4 +243,62 @@ mod tests { assert!(empty.execute(20, task_ctx).is_err()); Ok(()) } + +#

Re: [PR] fix: `ComposedPhysicalExtensionCodec` does not use the same codec as encoding when decoding [datafusion]

2025-07-31 Thread via GitHub
milenkovicm commented on code in PR #16986: URL: https://github.com/apache/datafusion/pull/16986#discussion_r2245510375 ## datafusion-examples/examples/composed_extension_codec.rs: ## @@ -233,6 +234,17 @@ impl PhysicalExtensionCodec for ChildPhysicalExtensionCodec { } }

Re: [PR] fix: `TrivialValueAccumulators` to ignore null states for `ignore nulls` [datafusion]

2025-07-31 Thread via GitHub
mbutrovich commented on PR #16918: URL: https://github.com/apache/datafusion/pull/16918#issuecomment-3140641647 I've documented my investigation with a new issue on the Comet repo: https://github.com/apache/datafusion-comet/issues/2051

Re: [PR] fix : cast_operands_to_decimal_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-31 Thread via GitHub
coderfender commented on code in PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996#discussion_r2245892858 ## spark/src/main/scala/org/apache/comet/serde/arithmetic.scala: ## @@ -212,17 +211,26 @@ object CometIntegralDivide extends CometExpressionSerde with Mat

Re: [PR] fix : cast_operands_to_decimal_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-31 Thread via GitHub
andygrove commented on code in PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996#discussion_r2245896914 ## spark/src/main/scala/org/apache/comet/serde/arithmetic.scala: ## @@ -212,17 +211,29 @@ object CometIntegralDivide extends CometExpressionSerde with MathB

Re: [PR] fix : cast_operands_to_decimal_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-31 Thread via GitHub
coderfender commented on code in PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996#discussion_r2245901228 ## spark/src/main/scala/org/apache/comet/serde/arithmetic.scala: ## @@ -212,17 +211,29 @@ object CometIntegralDivide extends CometExpressionSerde with Mat

Re: [I] Overflow happened on: `Long.MinValue div -1` [datafusion-comet]

2025-07-31 Thread via GitHub
andygrove closed issue #1477: Overflow happened on: `Long.MinValue div -1` URL: https://github.com/apache/datafusion-comet/issues/1477
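For context on the overflow named in this issue title: in 64-bit two's-complement arithmetic, the minimum value divided by -1 has no representable result. The sketch below shows how plain Rust exposes that case. It is a minimal illustration, not Comet's code or its fix, and the helper names `checked_long_div` and `wrapping_long_div` are hypothetical.

```rust
/// Hypothetical helper: returns `None` when the quotient overflows
/// (or when the divisor is zero) instead of panicking.
fn checked_long_div(a: i64, b: i64) -> Option<i64> {
    a.checked_div(b)
}

/// Hypothetical helper: two's-complement wrapping division, which is
/// how JVM `long` division treats this case. Still panics if `b == 0`.
fn wrapping_long_div(a: i64, b: i64) -> i64 {
    a.wrapping_div(b)
}

fn main() {
    // The overflowing case is reported as `None` by the checked variant.
    println!("{:?}", checked_long_div(i64::MIN, -1));
    // The wrapping variant hands back the dividend unchanged.
    println!("{}", wrapping_long_div(i64::MIN, -1) == i64::MIN);
    // Ordinary inputs truncate toward zero as usual.
    println!("{:?}", checked_long_div(7, 2));
}
```

A plain `i64::MIN / -1` with runtime operands panics in Rust, so an engine that has to match wrapping semantics needs one of the explicit variants above.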

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-07-31 Thread via GitHub
adriangb commented on PR #16445: URL: https://github.com/apache/datafusion/pull/16445#issuecomment-3140961969 I think I've addressed all of the feedback and rebased on main / changes broken out into other PRs. @Dandandan @xudong963 I've tagged you both for review. @alamb would you m

Re: [PR] fix: zero Arrow Array offset before sending across FFI [datafusion-comet]

2025-07-31 Thread via GitHub
comphead commented on code in PR #2052: URL: https://github.com/apache/datafusion-comet/pull/2052#discussion_r2246090245 ## native/core/src/execution/jni_api.rs: ## @@ -341,10 +341,29 @@ fn prepare_output( let mut i = 0; while i < results.len() { l

Re: [PR] fix : cast_operands_to_decimal_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-31 Thread via GitHub
andygrove merged PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996

Re: [I] Standardize on import order in Rust code [datafusion-comet]

2025-07-31 Thread via GitHub
mbutrovich commented on issue #2053: URL: https://github.com/apache/datafusion-comet/issues/2053#issuecomment-3140953687 I don't have a strong opinion on the order, but happy if we have a way to enforce it to something consistent.

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-07-31 Thread via GitHub
adriangb commented on PR #16445: URL: https://github.com/apache/datafusion/pull/16445#issuecomment-3140948881 @alamb could I ask you to kick off some benchmarks?

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-07-31 Thread via GitHub
alamb commented on PR #16445: URL: https://github.com/apache/datafusion/pull/16445#issuecomment-3141175283 🤖: Benchmark completed Details ``` Comparing HEAD and hash-join-pushdown Benchmark clickbench_extended.json

Re: [I] Support old syntax for `approx_percentile_cont` and `approx_percentile_cont_with_weight` [datafusion]

2025-07-31 Thread via GitHub
alamb commented on issue #16955: URL: https://github.com/apache/datafusion/issues/16955#issuecomment-3140910299 There was a fairly involved discussion with @vbarua and @Garamda and @jayzhan211 about the backwards compatibility issue here - https://github.com/apache/datafusion/pull/13511

Re: [PR] fix: zero Arrow Array offset before sending across FFI [datafusion-comet]

2025-07-31 Thread via GitHub
andygrove commented on code in PR #2052: URL: https://github.com/apache/datafusion-comet/pull/2052#discussion_r2246069765 ## native/core/src/execution/jni_api.rs: ## @@ -341,10 +341,29 @@ fn prepare_output( let mut i = 0; while i < results.len() {

Re: [PR] fix: zero Arrow Array offset before sending across FFI [datafusion-comet]

2025-07-31 Thread via GitHub
andygrove commented on code in PR #2052: URL: https://github.com/apache/datafusion-comet/pull/2052#discussion_r2246070759 ## native/core/src/execution/jni_api.rs: ## @@ -341,10 +341,29 @@ fn prepare_output( let mut i = 0; while i < results.len() {

Re: [PR] fix: Preserve equivalence properties of input plan in unnest [datafusion]

2025-07-31 Thread via GitHub
vegarsti commented on code in PR #16985: URL: https://github.com/apache/datafusion/pull/16985#discussion_r2246073369 ## datafusion/physical-plan/src/unnest.rs: ## @@ -101,8 +101,22 @@ impl UnnestExec { input: &Arc, schema: SchemaRef, ) -> PlanProperties {

Re: [PR] Feature: Improve hash Expr performance [datafusion]

2025-07-31 Thread via GitHub
alamb commented on PR #16977: URL: https://github.com/apache/datafusion/pull/16977#issuecomment-3141100594 The planner benchmarks look good across the board except for ``` logical_select_all_from_1000 1.04 11.5±0.09ms ?/sec 1.00 11.0±0.04ms

Re: [PR] feat: Cache Parquet metadata in built in parquet reader [datafusion]

2025-07-31 Thread via GitHub
alamb commented on code in PR #16971: URL: https://github.com/apache/datafusion/pull/16971#discussion_r2246479502 ## docs/source/user-guide/configs.md: ## @@ -60,6 +60,7 @@ Environment variables are read during `SessionConfig` initialisation so they mus | datafusion.execution.

[I] [Parquet Metadata Cache]: Limit memory used [datafusion]

2025-07-31 Thread via GitHub
alamb opened a new issue, #17001: URL: https://github.com/apache/datafusion/issues/17001 ### Is your feature request related to a problem or challenge? @nuno-faria implemented the core Parquet Metadata caching logic in the following PR: - https://github.com/apache/datafusion/pull/1

Re: [PR] feat: Cache Parquet metadata in built in parquet reader [datafusion]

2025-07-31 Thread via GitHub
BlakeOrth commented on code in PR #16971: URL: https://github.com/apache/datafusion/pull/16971#discussion_r2246474646 ## datafusion/execution/src/cache/cache_unit.rs: ## @@ -157,9 +158,79 @@ impl CacheAccessor>> for DefaultListFilesCache { } } +/// Collected file embedd

[I] [Parquet Metadata Cache] Use the cached metadata for ListingTable statistics [datafusion]

2025-07-31 Thread via GitHub
alamb opened a new issue, #17002: URL: https://github.com/apache/datafusion/issues/17002 ### Is your feature request related to a problem or challenge? @nuno-faria implemented the core Parquet Metadata caching logic in the following PR: - https://github.com/apache/datafusion/pull/1

Re: [PR] feat: Cache Parquet metadata in built in parquet reader [datafusion]

2025-07-31 Thread via GitHub
alamb commented on PR #16971: URL: https://github.com/apache/datafusion/pull/16971#issuecomment-3141496258 I also filed a ticket to track making `count(*)` queries faster: https://github.com/apache/datafusion/issues/17001

Re: [PR] feat: Cache Parquet metadata in built in parquet reader [datafusion]

2025-07-31 Thread via GitHub
alamb commented on code in PR #16971: URL: https://github.com/apache/datafusion/pull/16971#discussion_r2246489431 ## datafusion/common/src/config.rs: ## @@ -549,6 +549,12 @@ config_namespace! { /// (reading) Use any available bloom filters when reading parquet files

Re: [PR] feat: Cache Parquet metadata in built in parquet reader [datafusion]

2025-07-31 Thread via GitHub
alamb commented on PR #16971: URL: https://github.com/apache/datafusion/pull/16971#issuecomment-3141498147 I have gathered follow on tasks in an epic: - https://github.com/apache/datafusion/issues/17000

Re: [PR] Feature: Improve hash Expr performance [datafusion]

2025-07-31 Thread via GitHub
tobixdev commented on PR #16977: URL: https://github.com/apache/datafusion/pull/16977#issuecomment-3142929279 Thanks for reproducing! I've now removed the changes to `ahash`. Could you re-run the benchmarks?

[I] Shared `DynamicFilterPhysicalExpr` causes recursive queries to fail [datafusion]

2025-07-31 Thread via GitHub
nuno-faria opened a new issue, #16998: URL: https://github.com/apache/datafusion/issues/16998 ### Describe the bug The `SortExec` operator will share the same `DynamicFilterPhysicalExpr` across multiple invocations of `with_new_children` (e.g., from `reset_plan_states`), causing quer

Re: [PR] fix: split `expr.proto` file [datafusion-comet]

2025-07-31 Thread via GitHub
comphead commented on PR #2046: URL: https://github.com/apache/datafusion-comet/pull/2046#issuecomment-3141237400 related to https://github.com/apache/datafusion-comet/pull/1978#discussion_r2180394231

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-07-31 Thread via GitHub
adriangb commented on PR #16445: URL: https://github.com/apache/datafusion/pull/16445#issuecomment-3141237953 Guess it just doesn't make a difference then. Thanks!

Re: [PR] fix: zero Arrow Array offset before sending across FFI [datafusion-comet]

2025-07-31 Thread via GitHub
mbutrovich merged PR #2052: URL: https://github.com/apache/datafusion-comet/pull/2052

Re: [I] Tracking fs-hdfs issues [datafusion-comet]

2025-07-31 Thread via GitHub
parthchandra commented on issue #2034: URL: https://github.com/apache/datafusion-comet/issues/2034#issuecomment-3141766740 No I do not. Thinking of alternative strategies (like cloning the code in Comet).

Re: [PR] feat: add macros for DataFusionError variants [datafusion]

2025-07-31 Thread via GitHub
github-actions[bot] closed pull request #15946: feat: add macros for DataFusionError variants URL: https://github.com/apache/datafusion/pull/15946

Re: [PR] feat: add `register_metadata` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-07-31 Thread via GitHub
github-actions[bot] closed pull request #15022: feat: add `register_metadata` function for `GroupsAccumulator` to help create specialized impl URL: https://github.com/apache/datafusion/pull/15022

Re: [PR] [POC] feat: Add datafusion-storage [datafusion]

2025-07-31 Thread via GitHub
github-actions[bot] closed pull request #15018: [POC] feat: Add datafusion-storage URL: https://github.com/apache/datafusion/pull/15018

Re: [PR] Fix Correlated Subquery With Depth Larger Than One [datafusion]

2025-07-31 Thread via GitHub
github-actions[bot] closed pull request #16060: Fix Correlated Subquery With Depth Larger Than One URL: https://github.com/apache/datafusion/pull/16060

Re: [PR] feat: Dynamic Parquet encryption and decryption properties [datafusion]

2025-07-31 Thread via GitHub
adamreeve commented on PR #16779: URL: https://github.com/apache/datafusion/pull/16779#issuecomment-3141937961 Hi @alamb, could you please take a look at this when you have time, or suggest someone else who could review this?

Re: [PR] Rewrite Nested Loop Join executor for 3.5× speed and 1% memory usage [datafusion]

2025-07-31 Thread via GitHub
ding-young commented on PR #16996: URL: https://github.com/apache/datafusion/pull/16996#issuecomment-3141976153 ### Memory Usage Benchmark current ``` Query Time (ms) Peak RSS Peak Commit Major Page Faults ---

Re: [PR] feat: change Expr OuterReferenceColumn and Alias to Box type for reducing expr struct size [datafusion]

2025-07-31 Thread via GitHub
zhuqi-lucas commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3142010548 > 🤖: Benchmark completed > > Details > > ``` > group main reduce_expr_size > -

Re: [PR] chore: Add scripts for running benchmarks with Blaze [datafusion-comet]

2025-07-31 Thread via GitHub
codecov-commenter commented on PR #2050: URL: https://github.com/apache/datafusion-comet/pull/2050#issuecomment-3140671595 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2050?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: zero Arrow Array offset before sending across FFI [datafusion-comet]

2025-07-31 Thread via GitHub
codecov-commenter commented on PR #2052: URL: https://github.com/apache/datafusion-comet/pull/2052#issuecomment-3140810198 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2052?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: zero Arrow Array offset before sending across FFI [datafusion-comet]

2025-07-31 Thread via GitHub
mbutrovich commented on code in PR #2052: URL: https://github.com/apache/datafusion-comet/pull/2052#discussion_r2246080558 ## native/core/src/execution/jni_api.rs: ## @@ -341,10 +341,29 @@ fn prepare_output( let mut i = 0; while i < results.len() {

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-07-31 Thread via GitHub
adriangb commented on code in PR #16445: URL: https://github.com/apache/datafusion/pull/16445#discussion_r2246082774 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -943,10 +978,71 @@ impl ExecutionPlan for HashJoinExec { try_embed_projection(projection, s

Re: [I] Implement Spark-compatible cast from integral types to decimal [datafusion-comet]

2025-07-31 Thread via GitHub
coderfender commented on issue #2049: URL: https://github.com/apache/datafusion-comet/issues/2049#issuecomment-3141268402 Thank you @andygrove . This issue seems to be largely related to precision and scale. In the PR: #1996 we were able to get cast operation working by setting precision

Re: [I] `CooperativeExec` incorrectly implements `maintains_input_order` [datafusion]

2025-07-31 Thread via GitHub
alamb commented on issue #16994: URL: https://github.com/apache/datafusion/issues/16994#issuecomment-3141279920 > What might be useful is to have more general test logic that takes in an ExecutionPlan and verifies (to the extent possible) that the post conditions required by the trait are m

Re: [I] `CooperativeExec` incorrectly implements `maintains_input_order` [datafusion]

2025-07-31 Thread via GitHub
alamb commented on issue #16994: URL: https://github.com/apache/datafusion/issues/16994#issuecomment-3141284814 Or we can extend here https://github.com/apache/datafusion/blob/c6d55207161e400e53645d5ee7d7bf16cd024c2f/datafusion/core/src/physical_planner.rs#L2356-L2355 with some basic checks

Re: [PR] feat: change Expr OuterReferenceColumn and Alias to Box type for reducing expr struct size [datafusion]

2025-07-31 Thread via GitHub
alamb commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3141288216 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~

Re: [PR] Added Example for `Statistical Functions` in Docs [datafusion]

2025-07-31 Thread via GitHub
alamb merged PR #16927: URL: https://github.com/apache/datafusion/pull/16927

Re: [I] [Doc Fix] : Missing Example for `Statistical Functions` in the Docs [datafusion]

2025-07-31 Thread via GitHub
alamb closed issue #16923: [Doc Fix] : Missing Example for `Statistical Functions` in the Docs URL: https://github.com/apache/datafusion/issues/16923

Re: [PR] Added Example for `Statistical Functions` in Docs [datafusion]

2025-07-31 Thread via GitHub
alamb commented on PR #16927: URL: https://github.com/apache/datafusion/pull/16927#issuecomment-3141286583 I think this one is good to go -- thanks @Adez017
