Re: How to separate a subset of an RDD by day?

Soumya Simanta Fri, 11 Jul 2014 14:25:30 -0700

>
> Solution 2 is to map the objects into a pair RDD where the
> key is the number of the day in the interval, then group by
> key, collect, and parallelize the resulting grouped data.
> However, I worry collecting large data sets is going to be
> a serious performance bottleneck.
>
Why do you have to do a "collect"? You can do a groupBy and then write
the grouped data to disk again.
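A minimal local sketch of that suggestion, written in plain Python rather than Spark so it runs on its own: each record is keyed by the number of its day within the interval (the key from Solution 2), the records are grouped by that key, and each day's group is written straight to its own file, so nothing is gathered into one place first. The record shape `(timestamp, payload)`, the helper names, and the `day=N.txt` file naming are hypothetical. In Spark the roughly analogous RDD steps would be a `map` to a pair RDD, `groupByKey`, and `saveAsTextFile`, which writes from the executors instead of collecting to the driver (though it produces part files rather than one file per day).

```python
import os
import tempfile
from collections import defaultdict
from datetime import datetime, timedelta


def day_index(ts, start):
    """Number of the day within the interval, counting `start`'s day as 0 (hypothetical helper)."""
    return (ts - start).days


def group_by_day(records, start):
    """Key each (timestamp, payload) record by its day index and group them,
    mimicking map-to-pair + groupByKey on local data."""
    groups = defaultdict(list)
    for ts, payload in records:
        groups[day_index(ts, start)].append((ts, payload))
    return dict(groups)


def write_groups(groups, out_dir):
    """Write each day's records to its own file instead of collecting them together.
    Returns a mapping of day index -> file path."""
    paths = {}
    for day, rows in sorted(groups.items()):
        path = os.path.join(out_dir, f"day={day}.txt")
        with open(path, "w") as f:
            for ts, payload in rows:
                f.write(f"{ts.isoformat()}\t{payload}\n")
        paths[day] = path
    return paths


if __name__ == "__main__":
    start = datetime(2014, 7, 1)
    # Hypothetical events at 1, 5, 26, 50 and 51 hours after the interval start.
    records = [(start + timedelta(hours=h), f"event{h}") for h in (1, 5, 26, 50, 51)]
    groups = group_by_day(records, start)
    with tempfile.TemporaryDirectory() as out_dir:
        for day, path in write_groups(groups, out_dir).items():
            print(day, os.path.basename(path), len(groups[day]))
```

Writing per group keeps the work proportional to each day's data; the cost that remains is the shuffle done by the grouping step itself.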
