Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-07-16 Thread via GitHub
github-actions[bot] closed pull request #15423: Introduce selection vector repartitioning URL: https://github.com/apache/datafusion/pull/15423 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-07-09 Thread via GitHub
github-actions[bot] commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-3055027587 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-05-08 Thread via GitHub
alamb commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2864278552 I think this is a draft so marking it as such to try and make it clearer what PRs are waiting on review -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2852012982 BTW I have not had a chance to look at this PR yet, but the high level idea of pushdown / late materialization is also being explored under the name "dynamic filtering" -- it isn't qui

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-05-03 Thread via GitHub
goldmedal commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2848568844 Thanks @2010YOUY01 for the suggestions. > 1. Use the term `(selection) bitmap` instead of `selection vector` to avoid confusion. I believe `selection vector` commonly refers

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-05-02 Thread via GitHub
2010YOUY01 commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2848428399 Thank you. This is a foundation for many late materialization optimizations 👏🏼 I have some high-level suggestions, looking forward to hearing your thoughts. 1. Use the term

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-04-21 Thread via GitHub
goldmedal commented on code in PR #15423: URL: https://github.com/apache/datafusion/pull/15423#discussion_r2052394398 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -320,6 +336,68 @@ impl BatchPartitioner { Ok((partition, batch))

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-04-20 Thread via GitHub
Dandandan commented on code in PR #15423: URL: https://github.com/apache/datafusion/pull/15423#discussion_r2051728746 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -320,6 +336,68 @@ impl BatchPartitioner { Ok((partition, batch))

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-04-20 Thread via GitHub
goldmedal commented on code in PR #15423: URL: https://github.com/apache/datafusion/pull/15423#discussion_r2051721176 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -320,6 +336,68 @@ impl BatchPartitioner { Ok((partition, batch))

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-31 Thread via GitHub
goldmedal commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2766646972 I did the benchmarks for HashAggregate https://github.com/apache/datafusion/issues/15383#issuecomment-2766551662 It seems that HashAggregate is slower in the selection vector mod

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-30 Thread via GitHub
alamb commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2764511786 FYI @ctsk this seems potentially related to some of the work you are doing as well -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-30 Thread via GitHub
goldmedal commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2764434126 The CI failure isn't related to this PR. It could be fixed by https://github.com/apache/datafusion/pull/15494 -- This is an automated message from the Apache Git Service. To resp

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-30 Thread via GitHub
goldmedal commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2764422078 > I think to support this selection vector, the executors need to be updated to interpret an additional metadata column. However, since executors are part of the public interface t

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-29 Thread via GitHub
2010YOUY01 commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2764361563 > Isn't it always better partitioning on this selection vectors in case of hash-rep 🤔 What is the reason of keeping the old strategy ? I think to support this selection vect

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-29 Thread via GitHub
Dandandan commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2763850029 > I'm working on HashAggregate [goldmedal#3](https://github.com/goldmedal/datafusion/pull/3) based on this PR. I found we shouldn't use only one config, `prefer_hash_selection_vec

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-29 Thread via GitHub
berkaysynnada commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2764263842 Isn't it always better partitioning on this selection vectors in case of hash-rep 🤔 What is the reason of keeping the old strategy ? -- This is an automated message from the

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-29 Thread via GitHub
goldmedal commented on code in PR #15423: URL: https://github.com/apache/datafusion/pull/15423#discussion_r2019935247 ## datafusion/sqllogictest/test_files/join.slt.part: ## @@ -1389,6 +1389,112 @@ physical_plan 14)--FilterExec: y@1 = x@0 15)---

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-29 Thread via GitHub
zebsme commented on code in PR #15423: URL: https://github.com/apache/datafusion/pull/15423#discussion_r2019843158 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -316,6 +334,70 @@ impl BatchPartitioner { Ok((partition, batch))

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-29 Thread via GitHub
zebsme commented on code in PR #15423: URL: https://github.com/apache/datafusion/pull/15423#discussion_r2019843158 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -316,6 +334,70 @@ impl BatchPartitioner { Ok((partition, batch))

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-29 Thread via GitHub
zebsme commented on code in PR #15423: URL: https://github.com/apache/datafusion/pull/15423#discussion_r2019843158 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -316,6 +334,70 @@ impl BatchPartitioner { Ok((partition, batch))

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-29 Thread via GitHub
goldmedal commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2763356410 I'm working on HashAggregate https://github.com/goldmedal/datafusion/pull/3 based on this PR. I found we shouldn't use only one config, `prefer_hash_selection_vector_partitionin

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-28 Thread via GitHub
goldmedal commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2761416899 > You should be able to get the test back by also setting `datafusion.optimizer.hash_join_single_partition_threshold` to `0` / a low value. Thanks. It works. I also added th

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-28 Thread via GitHub
Dandandan commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2760774554 > #15339 It looks like the join plan is being changed. You should be able to get the test back by also setting `hash_join_single_partition_threshold` to `0` / a low value.

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-27 Thread via GitHub
goldmedal commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2758966240 also c.c. @zebsme -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-27 Thread via GitHub
goldmedal commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2758775798 https://github.com/apache/datafusion/pull/15339 It looks like the join plan is being changed. -- This is an automated message from the Apache Git Service. To respond to the messa

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-27 Thread via GitHub
goldmedal commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2758703787 After rebasing to the main, the hash join no longer uses `RepartitionExec`. Remove the test by [3a151dd](https://github.com/apache/datafusion/pull/15423/commits/3a151dd042da1c114f0

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-26 Thread via GitHub
goldmedal commented on code in PR #15423: URL: https://github.com/apache/datafusion/pull/15423#discussion_r2014723481 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -316,6 +326,71 @@ impl BatchPartitioner { Ok((partition, batch))

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-26 Thread via GitHub
Dandandan commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2753725939 This starts to look nice @goldmedal -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-26 Thread via GitHub
Dandandan commented on code in PR #15423: URL: https://github.com/apache/datafusion/pull/15423#discussion_r2013706099 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -316,6 +326,71 @@ impl BatchPartitioner { Ok((partition, batch))

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-26 Thread via GitHub
Dandandan commented on code in PR #15423: URL: https://github.com/apache/datafusion/pull/15423#discussion_r2013701354 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -316,6 +326,71 @@ impl BatchPartitioner { Ok((partition, batch))

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-26 Thread via GitHub
Dandandan commented on code in PR #15423: URL: https://github.com/apache/datafusion/pull/15423#discussion_r2013703175 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -316,6 +326,71 @@ impl BatchPartitioner { Ok((partition, batch))

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-26 Thread via GitHub
Dandandan commented on code in PR #15423: URL: https://github.com/apache/datafusion/pull/15423#discussion_r2013701354 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -316,6 +326,71 @@ impl BatchPartitioner { Ok((partition, batch))