Re: Join operator in PySpark

2014-11-13 Thread Josh Rosen

Join operator in PySpark

2014-11-13 Thread 夏俊鸾
…)
rdd3 = rdd1.join(rdd2).collect()

Implemented in Scala, the code above generates 2 shuffles, but in Python it generates 3. What was the original design motivation for the join operator in PySpark? Are there any ideas for improving join performance in PySpark?

Andrew
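For readers unfamiliar with how a join can be expressed on top of a group-by, here is a minimal local sketch of the "tagged union" approach. Each value is tagged with the side it came from, the two inputs are unioned, the records are grouped by key, and the left and right values for each key are paired up. My recollection is that PySpark's pure-Python join (in `pyspark/join.py`) is built along these lines, but treat that as an assumption rather than a description of the actual implementation. The function name `tagged_union_join` is hypothetical, and plain Python lists stand in for RDDs so the sketch runs without Spark.

```python
from collections import defaultdict
from itertools import product


def tagged_union_join(left, right):
    """Inner join of two lists of (key, value) pairs (hypothetical helper).

    Sketch of a join built on a grouping step: tag each value with its
    side (1 = left, 2 = right), union the inputs, group by key, then emit
    the cross product of left and right values for every key.
    """
    # Tag values so the two sides can be told apart after the union.
    tagged = [(k, (1, v)) for k, v in left] + [(k, (2, v)) for k, v in right]

    # Group by key. On a cluster, this is the step that moves data
    # between partitions (a shuffle) unless the inputs are co-partitioned.
    groups = defaultdict(list)
    for k, tagged_value in tagged:
        groups[k].append(tagged_value)

    # Split each group back into left/right buffers and pair them up.
    # Keys present on only one side produce no output (inner join).
    result = []
    for k, values in groups.items():
        left_buf = [v for side, v in values if side == 1]
        right_buf = [v for side, v in values if side == 2]
        result.extend((k, (lv, rv)) for lv, rv in product(left_buf, right_buf))
    return result


if __name__ == "__main__":
    pairs1 = [("a", 1), ("b", 2), ("a", 3)]
    pairs2 = [("a", "x"), ("c", "y")]
    print(sorted(tagged_union_join(pairs1, pairs2)))
    # [('a', (1, 'x')), ('a', (3, 'x'))]
```

In this shape, the cost is dominated by the grouping step, which is why co-partitioning both inputs by key before the join is the usual lever to try when join performance matters.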