Hi all,
In the old Spark RDD API, key-value PairRDDs can be co-partitioned to avoid a
shuffle, which gives us much better join performance.
Is this high-performance, shuffle-free join via co-partitioning still feasible
with the new Dataset API in Spark 2.0? I have looked through the API docs but
failed to find anything equivalent.
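
For context, this is the kind of RDD-level co-partitioning I mean (a minimal
sketch; the partition count and sample data are just placeholders):

import org.apache.spark.{HashPartitioner, SparkContext}

def coPartitionedJoin(sc: SparkContext): Unit = {
  // Both RDDs get the SAME partitioner and are cached, so the
  // subsequent join is a narrow dependency and needs no shuffle.
  val partitioner = new HashPartitioner(8) // partition count is arbitrary here

  val left = sc.parallelize(Seq((1, "a"), (2, "b")))
    .partitionBy(partitioner)
    .cache()
  val right = sc.parallelize(Seq((1, 1.0), (2, 2.0)))
    .partitionBy(partitioner)
    .cache()

  // Same partitioner on both sides => shuffle-free join.
  val joined = left.join(right)
  joined.collect().foreach(println)
}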