subject:"How to partition a SparkDataFrame using all distinct column values in sparkR"

Re: How to partition a SparkDataFrame using all distinct column values in sparkR

2016-08-03 Thread Sun Rui

SparkDataFrame.repartition() uses hash partitioning. It guarantees that all rows with the same column value go to the same partition, but it does not guarantee that each partition contains only a single column value. Fortunately, Spark 2.0 comes with gapply() in SparkR. You can apply an R functio
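
To illustrate the distinction drawn in the reply above, here is a minimal sketch in plain Python (not Spark or SparkR code). It assumes a toy hash partitioner built on zlib.crc32 as a stand-in for whatever hash function Spark actually uses; the helper names hash_partition and group_by_key and the sample rows are hypothetical. It shows that hashing keeps all rows of one key together while still letting several keys share a partition, whereas grouping by key, the kind of per-group view gapply() works on, yields exactly one group per distinct value.

```python
import zlib
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition by hashing its key value (toy model)."""
    partitions = defaultdict(list)
    for row in rows:
        # crc32 is an assumed stand-in for Spark's hash; it is stable across runs.
        pid = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[pid].append(row)
    return dict(partitions)


def group_by_key(rows, key):
    """Collect rows into one group per distinct key value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


# Hypothetical sample data: three distinct keys, but only two partitions.
rows = [
    {"k": "a", "v": 1},
    {"k": "b", "v": 2},
    {"k": "a", "v": 3},
    {"k": "c", "v": 4},
]

parts = hash_partition(rows, "k", num_partitions=2)
groups = group_by_key(rows, "k")

for pid, part in sorted(parts.items()):
    print("partition", pid, "holds keys", sorted({r["k"] for r in part}))
for k, group in sorted(groups.items()):
    print("group", k, "holds", len(group), "row(s)")
```

With three distinct keys and only two partitions, at least one partition must hold more than one key, no matter which hash function is used. The grouped view always has one entry per key.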

How to partition a SparkDataFrame using all distinct column values in sparkR

2016-07-25 Thread Neil Chang

Hi, This is a question regarding SparkR in Spark 2.0. I have a SparkDataFrame and I want to partition it using one column's values. Each value corresponds to a partition: all rows having the same column value shall go to the same partition, no more, no less. Seems the function