Optimize sort merge join

2018-01-27 Thread Antoine Bonnin
Hi all, I'm relatively new to spark and something is bothering me for optimizing sort merge join from parquet. My work consists to get stats on purchases for a retail company. For example, i have to calculate the mean purchase over a period, for a segment of prodcuts and a segment of c

RE: Sort Merge Join

2015-11-02 Thread Cheng, Hao
No as far as I can tell, @Michael @YinHuai @Reynold , any comments on this optimization? From: Jonathan Coveney [mailto:jcove...@gmail.com] Sent: Tuesday, November 3, 2015 4:17 AM To: Alex Nastetsky Cc: Cheng, Hao; user Subject: Re: Sort Merge Join Additionally, I'm curious if there ar

Re: Sort Merge Join

2015-11-02 Thread Jonathan Coveney
gt;> Taking the file system based data source as “UnknownPartitioning”, will >> be a simple and SAFE way for JOIN, as it’s hard to guarantee the records >> from different data sets with the identical join keys will be loaded by the >> same node/task , since lots of factors need

Re: Sort Merge Join

2015-11-02 Thread Alex Nastetsky
join keys will be loaded by the > same node/task , since lots of factors need to be considered, like task > pool size, cluster size, source format, storage, data locality etc.,. > > I’ll agree it’s worth to optimize it for performance concerns, and > actually in Hive, it is calle

RE: Sort Merge Join

2015-11-01 Thread Cheng, Hao
Hao From: Alex Nastetsky [mailto:alex.nastet...@vervemobile.com] Sent: Monday, November 2, 2015 11:29 AM To: user Subject: Sort Merge Join Hi, I'm trying to understand SortMergeJoin (SPARK-2213). 1) Once SortMergeJoin is enabled, will it ever use ShuffledHashJoin? For example, in the code below, t

Sort Merge Join

2015-11-01 Thread Alex Nastetsky
Hi, I'm trying to understand SortMergeJoin (SPARK-2213). 1) Once SortMergeJoin is enabled, will it ever use ShuffledHashJoin? For example, in the code below, the two datasets have different number of partitions, but it still does a SortMerge join after a "hashpartitioning". CODE: val sparkCo