mbutrovich commented on PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2985161552
@Kontinuation thank you for bringing this up! Let me investigate. In the
meantime I suspect we'll revert this.
--
This is an automated message from the Apache Git Service.
Kontinuation commented on PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2985113515
This implementation of RangePartitioning may be incorrect. RangePartitioning
should partition the input DataFrame into partitions with consecutive and
non-overlapping range
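
To illustrate the property described in the comment above, here is a minimal sketch of range partitioning. It is not the code from this PR: `partition_for`, `range_partition`, and `ranges_are_ordered` are hypothetical helpers, keys are plain `i64` values, and the bounds are assumed to be sorted and shared by every row being partitioned.

```rust
/// Hypothetical helper: map a key to a partition index using sorted upper bounds.
/// Keys <= bounds[0] go to partition 0; keys above the last bound go to the final partition.
fn partition_for(key: i64, bounds: &[i64]) -> usize {
    // partition_point returns how many bounds are strictly less than the key.
    bounds.partition_point(|b| *b < key)
}

/// Hypothetical helper: route every key to its partition with one shared set of bounds.
fn range_partition(keys: &[i64], bounds: &[i64]) -> Vec<Vec<i64>> {
    let mut parts = vec![Vec::new(); bounds.len() + 1];
    for &k in keys {
        parts[partition_for(k, bounds)].push(k);
    }
    parts
}

/// Check that the non-empty partitions cover consecutive, non-overlapping ranges:
/// each partition's minimum must be strictly greater than the previous maximum.
fn ranges_are_ordered(parts: &[Vec<i64>]) -> bool {
    let mut prev_max: Option<i64> = None;
    for p in parts {
        if let (Some(min), Some(max)) = (p.iter().min(), p.iter().max()) {
            if let Some(pm) = prev_max {
                if *min <= pm {
                    return false;
                }
            }
            prev_max = Some(*max);
        }
    }
    true
}

fn main() {
    let keys = [7, 1, 9, 3, 5, 8, 2];
    let bounds = [3, 6]; // assumed bounds, giving three partitions
    let parts = range_partition(&keys, &bounds);
    println!("{:?}", parts);
    assert!(ranges_are_ordered(&parts));
}
```

The ordering check only holds when every row is routed with the same bounds. If different subsets of the input were partitioned with different bounds, the resulting ranges could overlap.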
mbutrovich merged PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862
mbutrovich commented on code in PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2146947020
##
common/src/main/scala/org/apache/comet/CometConf.scala:
##
@@ -307,6 +307,18 @@ object CometConf extends ShimCometConf {
.booleanConf
.crea
parthchandra commented on code in PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2146243533
##
common/src/main/scala/org/apache/comet/CometConf.scala:
##
@@ -307,6 +307,18 @@ object CometConf extends ShimCometConf {
.booleanConf
.cr
mbutrovich commented on code in PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2146241932
##
common/src/main/scala/org/apache/comet/CometConf.scala:
##
@@ -307,6 +307,18 @@ object CometConf extends ShimCometConf {
.booleanConf
.crea
parthchandra commented on code in PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2146236444
##
spark/src/test/scala/org/apache/comet/exec/CometNativeShuffleSuite.scala:
##
@@ -120,29 +120,51 @@ class CometNativeShuffleSuite extends CometTestBase
andygrove commented on code in PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2146006687
##
dev/diffs/3.4.3.diff:
##
@@ -2404,7 +2411,31 @@ index 266bb343526..c3e3d155813 100644
checkAnswer(aggDF, df1.groupBy("j").agg(max("k")))
}
mbutrovich commented on PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2970529921
Looking at the 3 Spark SQL test failures (all related to bucket scan) now
that there are fewer 3.5.x diffs to update.
mbutrovich commented on PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2964065141
Last thing I am waiting on is to do a new set of Spark diffs to turn off
native RangePartitioning in the 3 bucketing-related tests. Because of the
different random number gen
mbutrovich commented on code in PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2139997704
##
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala:
##
@@ -2904,6 +2903,8 @@ object QueryPlanSerde extends Logging with CometExprShim {
andygrove commented on code in PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2138821947
##
native/core/src/execution/shuffle/range_partitioner.rs:
##
@@ -0,0 +1,432 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more co
andygrove commented on code in PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2138822721
##
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala:
##
@@ -2904,6 +2903,8 @@ object QueryPlanSerde extends Logging with CometExprShim {
andygrove commented on code in PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2138821450
##
native/core/src/execution/shuffle/range_partitioner.rs:
##
@@ -0,0 +1,432 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more co
mbutrovich commented on code in PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2138343547
##
native/core/benches/shuffle_writer.rs:
##
@@ -66,10 +67,40 @@ fn criterion_benchmark(c: &mut Criterion) {
CompressionCodec::Zstd(6),
] {
andygrove commented on PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2957023096
I ran fresh benchmarks, but I do not see any change in performance. Perhaps
the range partitioning shuffles are not a significant cost in these benchmarks.
mbutrovich commented on code in PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2135992681
##
native/core/benches/shuffle_writer.rs:
##
@@ -42,20 +45,18 @@ fn criterion_benchmark(c: &mut Criterion) {
CompressionCodec::Zstd(1),
Co
andygrove commented on PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2954116663
It would be interesting to use our new tracing feature to compare on-heap vs
off-heap memory usage with range partitioning supported natively versus falling
back to Spark.
mbutrovich commented on PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2952887778
> I ran TPC-H benchmarks and saw shuffles with range partitioning run
> natively. I did not see any difference in performance compared to the last set
> of benchmarks I ran some
andygrove commented on PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2952814148
I ran TPC-H benchmarks and saw shuffles with range partitioning run
natively. I did not see any difference in performance compared to the last set
of benchmarks I ran some tim
codecov-commenter commented on PR #1862:
URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2952754584
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1862?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca