Join operator in PySpark
) rdd3 = rdd1.join(rdd2).collect()

The code above generates 2 shuffles when implemented in Scala, but 3 shuffles when implemented in Python. What was the original design motivation for the join operator in PySpark? Are there any ideas for improving join performance in PySpark?

Andrew