github-actions[bot] closed pull request #15423: Introduce selection vector
repartitioning
URL: https://github.com/apache/datafusion/pull/15423
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the spe
github-actions[bot] commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-3055027587
Thank you for your contribution. Unfortunately, this pull request is stale
because it has been open 60 days with no activity. Please remove the stale
label or comment or
alamb commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2864278552
I think this is a draft so marking it as such to try and make it clearer
what PRs are waiting on review
--
This is an automated message from the Apache Git Service.
To respond to the
alamb commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2852012982
BTW I have not had a chance to look at this PR yet, but the high level idea
of pushdown / late materialization is also being explored under the name
"dynamic filtering" -- it isn't qui
goldmedal commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2848568844
Thanks @2010YOUY01 for the suggestions.
> 1. Use the term `(selection) bitmap` instead of `selection vector` to
avoid confusion. I believe `selection vector` commonly refers
2010YOUY01 commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2848428399
Thank you. This is a foundation for many late materialization optimizations
👏🏼 I have some high-level suggestions, looking forward to hearing your thoughts.
1. Use the term
goldmedal commented on code in PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#discussion_r2052394398
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -320,6 +336,68 @@ impl BatchPartitioner {
Ok((partition, batch))
Dandandan commented on code in PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#discussion_r2051728746
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -320,6 +336,68 @@ impl BatchPartitioner {
Ok((partition, batch))
goldmedal commented on code in PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#discussion_r2051721176
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -320,6 +336,68 @@ impl BatchPartitioner {
Ok((partition, batch))
goldmedal commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2766646972
I did the benchmarks for HashAggregate
https://github.com/apache/datafusion/issues/15383#issuecomment-2766551662
It seems that HashAggregate is slower in the selection vector mod
alamb commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2764511786
FYI @ctsk this seems potentially related to some of the work you are doing
as well
--
This is an automated message from the Apache Git Service.
To respond to the message, please log
goldmedal commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2764434126
The CI failure isn't related to this PR. It could be fixed by
https://github.com/apache/datafusion/pull/15494
--
This is an automated message from the Apache Git Service.
To resp
goldmedal commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2764422078
> I think to support this selection vector, the executors need to be updated
to interpret an additional metadata column. However, since executors are part
of the public interface t
2010YOUY01 commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2764361563
> Isn't it always better partitioning on this selection vectors in case of
hash-rep 🤔 What is the reason of keeping the old strategy ?
I think to support this selection vect
Dandandan commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2763850029
> I'm working on HashAggregate
[goldmedal#3](https://github.com/goldmedal/datafusion/pull/3) based on this PR.
I found we shouldn't use only one config,
`prefer_hash_selection_vec
berkaysynnada commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2764263842
Isn't it always better partitioning on this selection vectors in case of
hash-rep 🤔 What is the reason of keeping the old strategy ?
--
This is an automated message from the
goldmedal commented on code in PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#discussion_r2019935247
##
datafusion/sqllogictest/test_files/join.slt.part:
##
@@ -1389,6 +1389,112 @@ physical_plan
14)--FilterExec: y@1 = x@0
15)---
zebsme commented on code in PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#discussion_r2019843158
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -316,6 +334,70 @@ impl BatchPartitioner {
Ok((partition, batch))
zebsme commented on code in PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#discussion_r2019843158
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -316,6 +334,70 @@ impl BatchPartitioner {
Ok((partition, batch))
zebsme commented on code in PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#discussion_r2019843158
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -316,6 +334,70 @@ impl BatchPartitioner {
Ok((partition, batch))
goldmedal commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2763356410
I'm working on HashAggregate https://github.com/goldmedal/datafusion/pull/3
based on this PR.
I found we shouldn't use only one config,
`prefer_hash_selection_vector_partitionin
goldmedal commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2761416899
> You should be able to get the test back by also setting
`datafusion.optimizer.hash_join_single_partition_threshold` to `0` / a low
value.
Thanks. It works. I also added th
Dandandan commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2760774554
> #15339 It looks like the join plan is being changed.
You should be able to get the test back by also setting
`hash_join_single_partition_threshold` to `0` / a low value.
goldmedal commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2758966240
also c.c. @zebsme
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comm
goldmedal commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2758775798
https://github.com/apache/datafusion/pull/15339 It looks like the join plan
is being changed.
--
This is an automated message from the Apache Git Service.
To respond to the messa
goldmedal commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2758703787
After rebasing to the main, the hash join no longer uses `RepartitionExec`.
Remove the test by
[3a151dd](https://github.com/apache/datafusion/pull/15423/commits/3a151dd042da1c114f0
goldmedal commented on code in PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#discussion_r2014723481
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -316,6 +326,71 @@ impl BatchPartitioner {
Ok((partition, batch))
Dandandan commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2753725939
This starts to look nice @goldmedal
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to
Dandandan commented on code in PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#discussion_r2013706099
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -316,6 +326,71 @@ impl BatchPartitioner {
Ok((partition, batch))
Dandandan commented on code in PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#discussion_r2013701354
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -316,6 +326,71 @@ impl BatchPartitioner {
Ok((partition, batch))
Dandandan commented on code in PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#discussion_r2013703175
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -316,6 +326,71 @@ impl BatchPartitioner {
Ok((partition, batch))
Dandandan commented on code in PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#discussion_r2013701354
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -316,6 +326,71 @@ impl BatchPartitioner {
Ok((partition, batch))
32 matches
Mail list logo