Fwd: [Spark Dataset]: How to conduct co-partition join in the new Dataset API in Spark 2.0

2016-12-01 Thread w.zhaokang
Hi all, In the old Spark RDD API, key-value PairRDDs can be co-partitioned to avoid a shuffle, which gives high join performance. In the new Dataset API in Spark 2.0, is a high-performance, shuffle-free join via co-partitioning still feasible? I have looked through the API doc but failed

[Spark Dataset]: How to conduct co-partition join in the new Dataset API in Spark 2.0

2016-12-01 Thread Dale Wang
Hi all, In the old Spark RDD API, key-value PairRDDs can be co-partitioned to avoid a shuffle, which gives high join performance. In the new Dataset API in Spark 2.0, is a high-performance, shuffle-free join via co-partitioning still feasible? I have looked through the API doc but failed