Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-18 Thread via GitHub
mbutrovich commented on PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2985161552 @Kontinuation thank you for bringing this up! Let me investigate. In the meantime I suspect we'll revert this. -- This is an automated message from the Apache Git Service.

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-18 Thread via GitHub
Kontinuation commented on PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2985113515 This implementation of RangePartitioning may be incorrect. RangePartitioning should partition the input DataFrame into partitions with consecutive and non-overlapping range
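
As a rough illustration of the property described in this comment (consecutive, non-overlapping ranges), here is a minimal Rust sketch. It is not the code from PR #1862 and does not reproduce the reported problem. The names `partition_for` and `partition_rows`, the `i64` key type, and the hard-coded ascending bounds are all assumptions made for the example. Given sorted upper bounds, each key goes to exactly one partition, and every key in partition `i` is less than or equal to every key in partition `i + 1`.

```rust
/// Hypothetical helper (not Comet's API): returns the partition index for `key`.
/// `bounds` must be sorted ascending. Partition 0 holds keys <= bounds[0],
/// partition i holds keys in (bounds[i - 1], bounds[i]], and the last
/// partition (index bounds.len()) holds keys greater than every bound.
fn partition_for(bounds: &[i64], key: i64) -> usize {
    // partition_point does a binary search and returns how many leading
    // bounds are strictly less than `key`, which is the partition index.
    bounds.partition_point(|b| *b < key)
}

/// Hypothetical helper: distributes `rows` into bounds.len() + 1 partitions.
/// Rows keep their input order within a partition; only the partition
/// ranges are ordered relative to each other.
fn partition_rows(bounds: &[i64], rows: &[i64]) -> Vec<Vec<i64>> {
    let mut parts = vec![Vec::new(); bounds.len() + 1];
    for &row in rows {
        parts[partition_for(bounds, row)].push(row);
    }
    parts
}
```

With bounds `[10, 20]`, calling `partition_rows(&[10, 20], &[25, 3, 10, 15, 20, 21])` places 3 and 10 in partition 0, 15 and 20 in partition 1, and 25 and 21 in partition 2. One consequence worth noting, under the assumption that bounds are chosen per task: the ranges only line up across the whole DataFrame if every task partitions against the same set of bounds.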

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-17 Thread via GitHub
mbutrovich merged PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-14 Thread via GitHub
mbutrovich commented on code in PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2146947020 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -307,6 +307,18 @@ object CometConf extends ShimCometConf { .booleanConf .crea

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-13 Thread via GitHub
parthchandra commented on code in PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2146243533 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -307,6 +307,18 @@ object CometConf extends ShimCometConf { .booleanConf .cr

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-13 Thread via GitHub
mbutrovich commented on code in PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2146241932 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -307,6 +307,18 @@ object CometConf extends ShimCometConf { .booleanConf .crea

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-13 Thread via GitHub
parthchandra commented on code in PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2146236444 ## spark/src/test/scala/org/apache/comet/exec/CometNativeShuffleSuite.scala: ## @@ -120,29 +120,51 @@ class CometNativeShuffleSuite extends CometTestBase

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-13 Thread via GitHub
andygrove commented on code in PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2146006687 ## dev/diffs/3.4.3.diff: ## @@ -2404,7 +2411,31 @@ index 266bb343526..c3e3d155813 100644 checkAnswer(aggDF, df1.groupBy("j").agg(max("k"))) }

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-13 Thread via GitHub
mbutrovich commented on PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2970529921 Looking at the 3 Spark SQL test failures (all related to bucket scan) now that there are fewer 3.5.x diffs to update.

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-11 Thread via GitHub
mbutrovich commented on PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2964065141 Last thing I am waiting on is to do a new set of Spark diffs to turn off native RangePartitioning in the 3 bucketing-related tests. Because of the different random number gen

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-11 Thread via GitHub
mbutrovich commented on code in PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2139997704 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2904,6 +2903,8 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-10 Thread via GitHub
andygrove commented on code in PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2138821947 ## native/core/src/execution/shuffle/range_partitioner.rs: ## @@ -0,0 +1,432 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more co

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-10 Thread via GitHub
andygrove commented on code in PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2138822721 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2904,6 +2903,8 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-10 Thread via GitHub
andygrove commented on code in PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2138821450 ## native/core/src/execution/shuffle/range_partitioner.rs: ## @@ -0,0 +1,432 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more co

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-10 Thread via GitHub
mbutrovich commented on code in PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2138343547 ## native/core/benches/shuffle_writer.rs: ## @@ -66,10 +67,40 @@ fn criterion_benchmark(c: &mut Criterion) { CompressionCodec::Zstd(6), ] {

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-09 Thread via GitHub
andygrove commented on PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2957023096 I ran fresh benchmarks, but I do not see any change in performance. Perhaps the range partitioning shuffles are not a significant cost in these benchmarks.

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-09 Thread via GitHub
mbutrovich commented on code in PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2135992681 ## native/core/benches/shuffle_writer.rs: ## @@ -42,20 +45,18 @@ fn criterion_benchmark(c: &mut Criterion) { CompressionCodec::Zstd(1), Co

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-08 Thread via GitHub
andygrove commented on PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2954116663 It would be interesting to use our new tracing feature to compare on-heap vs off-heap memory usage with range partitioning supported natively versus falling back to Spark.

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-07 Thread via GitHub
mbutrovich commented on PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2952887778 > I ran TPC-H benchmarks and saw shuffles with range partitioning run natively. I did not see any difference in performance compared to the last set of benchmarks I ran some

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-07 Thread via GitHub
andygrove commented on PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2952814148 I ran TPC-H benchmarks and saw shuffles with range partitioning run natively. I did not see any difference in performance compared to the last set of benchmarks I ran some tim

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-07 Thread via GitHub
codecov-commenter commented on PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2952754584 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1862?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca