It worked, thank you.
On 30.03.2015 11:58, Sean Owen wrote:
The behavior is the same. I am not sure it's a problem so much as a
design decision. It does not require everything to stay in memory, only
the values for one key at a time. Have a look at how the preceding
shuffle works.
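A minimal local illustration of the memory point, in plain Python rather than Spark (the function `group_sorted` is hypothetical, and this is not Spark's actual implementation): once the input is ordered by key, a grouping pass only has to buffer the values of the key it is currently emitting.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple


def group_sorted(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, List[int]]]:
    """Group (key, value) pairs that are already ordered by key.

    Only the values of the key currently being emitted are held in a list;
    the rest of the input stays an unconsumed iterator.
    """
    for key, run in groupby(pairs, key=itemgetter(0)):
        yield key, [value for _, value in run]


if __name__ == "__main__":
    records = sorted([("b", 2), ("a", 1), ("b", 3), ("a", 4)])
    for key, values in group_sorted(records):
        print(key, values)
```

The flip side is the one described above: a single key with very many values (say, one busy hour) still has to fit in memory as one list.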
Consider repartitionAndSortWithinPartitions to *partition* by hour and
then sort by timestamp within each partition.
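A sketch of that pattern, simulated locally in Python under some assumptions: records are `(epoch_seconds, payload)` tuples, a simple modulo-on-hour partitioner stands in for Spark's partitioner, and the names `repartition_and_sort`, `write_hourly_files`, and the `hour-N.txt` file naming are hypothetical. In Spark itself you would key the records by `(hour, timestamp)`, pass a custom partitioner that looks only at the hour, call `repartitionAndSortWithinPartitions`, and run the write loop inside something like `foreachPartition`; the shuffle would then do the sort (and can spill to disk) instead of the in-memory `sort` used here.

```python
import os
import tempfile
from itertools import groupby
from typing import Iterable, List, Tuple

# Hypothetical record shape: (timestamp in epoch seconds, payload).
Record = Tuple[int, str]


def hour_of(record: Record) -> int:
    """Hour bucket of a record (whole hours since the epoch)."""
    return record[0] // 3600


def repartition_and_sort(records: Iterable[Record], num_partitions: int) -> List[List[Record]]:
    """Local stand-in for repartitionAndSortWithinPartitions with a secondary sort.

    Partitioning looks only at the hour, so every record of an hour lands in
    the same partition; sorting within a partition uses (hour, timestamp).
    """
    partitions: List[List[Record]] = [[] for _ in range(num_partitions)]
    for rec in records:
        partitions[hour_of(rec) % num_partitions].append(rec)
    for part in partitions:
        part.sort(key=lambda r: (hour_of(r), r[0]))
    return partitions


def write_hourly_files(records: Iterable[Record], num_partitions: int, out_dir: str) -> List[str]:
    """Write one tab-separated file per hour and return the paths written."""
    written: List[str] = []
    for part in repartition_and_sort(records, num_partitions):
        # Within a partition each hour is one contiguous, time-ordered run,
        # so groupby sees each hour exactly once and each file is written once.
        for hour, run in groupby(part, key=hour_of):
            path = os.path.join(out_dir, f"hour-{hour}.txt")
            with open(path, "w") as f:
                for ts, payload in run:
                    f.write(f"{ts}\t{payload}\n")
            written.append(path)
    return written


if __name__ == "__main__":
    sample = [(7205, "c"), (10, "a"), (3601, "b"), (20, "a2")]
    with tempfile.TemporaryDirectory() as out:
        for p in write_hourly_files(sample, num_partitions=2, out_dir=out):
            print(os.path.basename(p))
```

In the sketch the partitions are plain lists, but the write loop only consumes each partition as an iterator, which is what lets the Spark version stream one hour's file at a time without grouping the hour into memory.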
We are experiencing some problems with the groupBy operations when they
are used to group together data that will be written to the same file.
The operation that we want to do is the following: given some data with
a timestamp, we want to sort it by timestamp, group it by hour and write
one file per hour