+1 Nicely done!

On Tue, Oct 26, 2021 at 8:08 AM Chao Sun <sunc...@apache.org> wrote:
> Oops, sorry. I just fixed the permission setting.
>
> Thanks everyone for the positive support!
>
> On Tue, Oct 26, 2021 at 7:30 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> +1 to this SPIP and nice writeup of the design doc!
>>
>> Can we open comment permission in the doc so that we can discuss details
>> there?
>>
>> On Tue, Oct 26, 2021 at 8:29 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>
>>> Seems to make sense to me.
>>>
>>> Would be great to have some feedback from people such as @Wenchen Fan
>>> <wenc...@databricks.com> @Cheng Su <chen...@fb.com> @angers zhu
>>> <angers....@gmail.com>.
>>>
>>>
>>> On Tue, 26 Oct 2021 at 17:25, Dongjoon Hyun <dongjoon.h...@gmail.com>
>>> wrote:
>>>
>>>> +1 for this SPIP.
>>>>
>>>> On Sun, Oct 24, 2021 at 9:59 AM huaxin gao <huaxin.ga...@gmail.com>
>>>> wrote:
>>>>
>>>>> +1. Thanks for lifting the current restrictions on bucket join and
>>>>> making this more generalized.
>>>>>
>>>>> On Sun, Oct 24, 2021 at 9:33 AM Ryan Blue <b...@apache.org> wrote:
>>>>>
>>>>>> +1 from me as well. Thanks Chao for doing so much to get it to this
>>>>>> point!
>>>>>>
>>>>>> On Sat, Oct 23, 2021 at 11:29 PM DB Tsai <dbt...@dbtsai.com> wrote:
>>>>>>
>>>>>>> +1 on this SPIP.
>>>>>>>
>>>>>>> This is a more generalized version of bucketed tables and bucketed
>>>>>>> joins, which can eliminate very expensive data shuffles in joins, and
>>>>>>> many users in the Apache Spark community have wanted this feature for
>>>>>>> a long time!
>>>>>>>
>>>>>>> Thank you, Ryan and Chao, for working on this, and I look forward to
>>>>>>> it as a new feature in Spark 3.3.
>>>>>>>
>>>>>>> DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1
>>>>>>>
>>>>>>> On Fri, Oct 22, 2021 at 12:18 PM Chao Sun <sunc...@apache.org>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > Hi,
>>>>>>> >
>>>>>>> > Ryan and I drafted a design doc to support a new type of join:
>>>>>>> > storage partitioned join, which covers bucket join support for
>>>>>>> > DataSourceV2 but is more general. The goal is to let Spark leverage
>>>>>>> > distribution properties reported by data sources and eliminate
>>>>>>> > shuffle whenever possible.
>>>>>>> >
>>>>>>> > Design doc:
>>>>>>> > https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE
>>>>>>> > (includes a POC link at the end)
>>>>>>> >
>>>>>>> > We'd like to start a discussion on the doc and any feedback is
>>>>>>> > welcome!
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Chao
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>>
>>>>>
--
John Zhuge