
Re: Difference between partition and groupBy

2017-02-24 Thread Patrick Brunmayr

Thank you for that answer. Helped me a lot.

2017-02-23 22:10 GMT+01:00 Fabian Hueske :
> Hi Patrick,
>
> as Robert said, partitionBy() shuffles the data such that all records with
> the same key end up in the same partition. That's all it does.
> groupBy() also prepares the data in each partition

Re: Difference between partition and groupBy

2017-02-23 Thread Fabian Hueske

Hi Patrick, as Robert said, partitionBy() shuffles the data such that all records with the same key end up in the same partition. That's all it does. groupBy() also prepares the data in each partition to be processed per key. For example, if you run a groupReduce after a groupBy(), the data is fir
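To make the distinction above concrete, here is a minimal sketch in plain Python. It is not Flink code: the helper names `partition_by_key` and `group_within_partition` are hypothetical, and the dictionary-based grouping is only an assumption used for illustration, not a description of what the engine does internally. It shows that partitioning only decides where each record goes, while grouping additionally arranges the records of a partition per key so that a per-key reduce can run.

```python
from collections import defaultdict


def partition_by_key(records, key_fn, num_partitions):
    """Shuffle only: route each record to a partition chosen from its key."""
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        index = hash(key_fn(record)) % num_partitions
        partitions[index].append(record)
    return partitions


def group_within_partition(partition, key_fn):
    """Arrange one partition's records per key (illustrative assumption)."""
    groups = defaultdict(list)
    for record in partition:
        groups[key_fn(record)].append(record)
    return dict(groups)


def key_of(record):
    return record[0]


records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]

# After partitioning, each key lives in exactly one partition,
# but a single partition may still hold several different keys.
partitions = partition_by_key(records, key_of, num_partitions=2)

# Grouping is what makes a per-key reduce possible inside each partition.
sums = {}
for partition in partitions:
    for key, group in group_within_partition(partition, key_of).items():
        sums[key] = sum(value for _, value in group)
```

Which partition a given key lands in depends on the hash function, but the per-key sums do not, because all records of one key always end up together.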

Re: Difference between partition and groupBy

2017-02-23 Thread Robert Metzger

Hi Patrick, I think (but I'm not 100% sure) it's not a difference in what the engine does in the end, it's more of an API thing. When you are grouping, you can perform operations such as reducing afterwards. On a partitioned dataset, you can do stuff like processing each partition in parallel, or so
