Re: Hive Hash in Spark

2019-05-07 Thread Bruce Robbins
On Thu, Mar 7, 2019 at 12:58 PM wrote:
> Thanks Ryan and Reynold for the information!
>
> Cheers,
> Tyson
>
> *From:* Ryan Blue
> *Sent:* Wednesday, March 6, 2019 3:47 PM
> *To:* Reynold Xin
> *Cc:* tcon...@gmail.com; Spark Dev List
> *Subject:

RE: Hive Hash in Spark

2019-03-07 Thread tcondie
Thanks Ryan and Reynold for the information!

Cheers,
Tyson

From: Ryan Blue
Sent: Wednesday, March 6, 2019 3:47 PM
To: Reynold Xin
Cc: tcon...@gmail.com; Spark Dev List
Subject: Re: Hive Hash in Spark

I think this was needed to add support for bucketed Hive tables. Like Tyson

Re: Hive Hash in Spark

2019-03-06 Thread Ryan Blue
I think this was needed to add support for bucketed Hive tables. Like Tyson noted, if the other side of a join can be bucketed the same way, then Spark can use a bucketed join. I have long-term plans to support this in the DataSourceV2 API, but I don't think we are very close to implementing it yet
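
To illustrate the point about bucketed joins, here is a rough sketch (plain Python, not Spark code) of why both sides need to be bucketed the same way. The hash below is a simplified stand-in for a Hive-style hash, assuming ints hash to themselves and strings use a 31-multiplier over their UTF-8 bytes; the helper names (`hive_style_hash`, `bucket_id`, `bucketed_join`) are made up for this example and are not Spark or Hive APIs.

```python
from collections import defaultdict

INT_MAX = 0x7FFFFFFF


def to_int32(x):
    """Wrap a Python int to a signed 32-bit value, like Java int arithmetic."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def hive_style_hash(key):
    """Simplified Hive-style hash (an assumption for illustration only)."""
    if isinstance(key, int):
        return to_int32(key)  # ints hash to their own value
    h = 0
    for b in key.encode("utf-8"):
        signed = b - 256 if b >= 128 else b  # Java bytes are signed
        h = to_int32(h * 31 + signed)
    return h


def bucket_id(key, num_buckets):
    """Mask off the sign bit, then take the remainder by the bucket count."""
    return (hive_style_hash(key) & INT_MAX) % num_buckets


def bucketize(rows, num_buckets):
    """Group (key, value) rows by bucket id."""
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[bucket_id(key, num_buckets)].append((key, value))
    return buckets


def bucketed_join(left, right, num_buckets):
    """Join bucket i of left only against bucket i of right.

    This is only correct because both sides use the same hash function and
    the same bucket count, so equal keys always share a bucket index.
    """
    lb = bucketize(left, num_buckets)
    rb = bucketize(right, num_buckets)
    out = []
    for b in range(num_buckets):
        index = defaultdict(list)
        for k, v in lb.get(b, []):
            index[k].append(v)
        for k, v in rb.get(b, []):
            for lv in index.get(k, []):
                out.append((k, lv, v))
    return out


left = [(1, "a"), (2, "b"), (5, "c")]
right = [(1, "x"), (5, "y"), (7, "z")]
joined = bucketed_join(left, right, num_buckets=4)
```

If one side were bucketed with a different hash function (or a different number of buckets), matching keys could land in different bucket indexes and the per-bucket join above would silently miss rows, which is why the two sides have to agree.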

Re: Hive Hash in Spark

2019-03-06 Thread Reynold Xin
I think they might be used in bucketing? Not 100% sure.

On Wed, Mar 06, 2019 at 1:40 PM, <tcon...@gmail.com> wrote:
> Hi,
>
> I noticed the existence of a Hive Hash partitioning implementation in Spark, but also noticed that it’s not being used, and that the Spark