Re: CombinePerKey and GroupByKey

2019-03-01 Thread Etienne Chauchot
That's good to know. Thanks, Etienne. On Thursday, 28 February 2019 at 10:05 -0800, Reynold Xin wrote: > This should be fine. Dataset.groupByKey is a logical operation, not a > physical one (as in Spark wouldn’t always > materialize all the groups in memory). > On Thu, Feb 28, 2019 at 1:46 AM Etienne C

Re: CombinePerKey and GroupByKey

2019-02-28 Thread Reynold Xin
This should be fine. Dataset.groupByKey is a logical operation, not a physical one (as in Spark wouldn’t always materialize all the groups in memory). On Thu, Feb 28, 2019 at 1:46 AM Etienne Chauchot wrote: > Hi all, > > I'm migrating RDD pipelines to Dataset and I saw that Combine.PerKey is no

CombinePerKey and GroupByKey

2019-02-28 Thread Etienne Chauchot
Hi all, I'm migrating RDD pipelines to Dataset and I saw that Combine.PerKey is no longer available in the Dataset API. So, I translated it to: KeyValueGroupedDataset> groupedDataset = keyedDataset.groupByKey(KVHelpers.extractKey(), EncoderHelpers.genericEncoder()); Dataset> combinedDataset =
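
For readers following the thread, the concern behind the question can be illustrated with a small plain-Java sketch. It does not use Spark or Beam at all; the class name CombinePerKeySketch and both method names are hypothetical and exist only for this illustration. It contrasts a group-then-combine approach, which holds every value of a key in memory before reducing, with a per-key combine that keeps a single accumulator per key. How Spark actually plans Dataset.groupByKey is up to its optimizer and is not modeled here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical illustration class; not part of Spark or Beam.
public class CombinePerKeySketch {

    // Group first, then combine: every value of a key is kept in a list
    // until the whole input has been read.
    public static <K, V> Map<K, V> groupThenCombine(
            List<Map.Entry<K, V>> input, BinaryOperator<V> combiner) {
        Map<K, List<V>> groups = new LinkedHashMap<>();
        for (Map.Entry<K, V> kv : input) {
            groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<K, V> result = new LinkedHashMap<>();
        for (Map.Entry<K, List<V>> group : groups.entrySet()) {
            V acc = null;
            for (V v : group.getValue()) {
                acc = (acc == null) ? v : combiner.apply(acc, v);
            }
            result.put(group.getKey(), acc);
        }
        return result;
    }

    // Combine per key: only one accumulator per key is kept, and each
    // incoming value is folded into it immediately.
    public static <K, V> Map<K, V> combinePerKey(
            List<Map.Entry<K, V>> input, BinaryOperator<V> combiner) {
        Map<K, V> accumulators = new LinkedHashMap<>();
        for (Map.Entry<K, V> kv : input) {
            accumulators.merge(kv.getKey(), kv.getValue(), combiner);
        }
        return accumulators;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input =
                List.of(Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
        System.out.println(groupThenCombine(input, Integer::sum));
        System.out.println(combinePerKey(input, Integer::sum));
    }
}
```

Both methods return the same result for an associative combiner; they differ only in how much per-key state is held while processing, which is the distinction between a logical grouping and a physically materialized one raised in the reply.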