SparkDataFrame.repartition() uses hash partitioning: it guarantees that all
rows with the same column value go to the same partition, but it does not
guarantee that each partition contains only a single column value.
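To make the distinction concrete, here is a minimal SparkR sketch (with a hypothetical toy data frame) of hash-based repartitioning by a column:

```r
library(SparkR)
sparkR.session()

# Hypothetical example data: one row per (key, value) pair.
df <- createDataFrame(data.frame(key = c("a", "a", "b", "c"),
                                 value = c(1, 2, 3, 4)))

# Hash-partition by "key": rows with equal keys always land in the
# same partition, but a single partition may still hold several
# distinct keys, e.g. when two keys hash to the same partition index.
df2 <- repartition(df, numPartitions = 2, col = df$key)
```

With only 2 partitions and 3 distinct keys, at least one partition must contain rows for more than one key, which is exactly the limitation described above.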
Fortunately, Spark 2.0 comes with gapply() in SparkR. You can apply an R
function to each group of rows that share the same key.
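A minimal sketch of gapply() follows, assuming the same toy data frame as above; the schema and aggregation function are hypothetical. gapply() groups the SparkDataFrame by the given columns and passes each group's rows to the R function as a local data.frame, so each invocation sees exactly one key value:

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(data.frame(key = c("a", "a", "b"),
                                 value = c(1, 2, 3)))

# Output schema of the function applied to each group.
schema <- structType(structField("key", "string"),
                     structField("total", "double"))

# The function receives the grouping key and the group's rows as a
# local R data.frame, and must return a data.frame matching `schema`.
result <- gapply(df, "key",
                 function(key, x) {
                   data.frame(key = key[[1]], total = sum(x$value))
                 },
                 schema)

head(collect(result))
```

Because the grouping is done by key value rather than by hash bucket, this gives you the "one value, one group" semantics that repartition() alone cannot.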
Hi,
This is a question regarding SparkR in Spark 2.0.
I have a SparkDataFrame and I want to partition it by one column's
values: each value corresponds to one partition, and all rows with the
same column value should go to that partition, no more, no less.
It seems the function repartition() does not guarantee this.