You can read the same partition from every hour's output, union those RDDs, and then repartition them into a single partition. Do this for each partition, one by one. It won't necessarily improve performance; that depends on how much the job spilled when all the data was processed together.
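Here's a minimal sketch of that idea, assuming a hypothetical layout where each hour's job wrote its output to hdfs:///output/&lt;hour&gt;/part-NNNNN with a consistent number of part files per hour (the paths, partition count, and output location below are all placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MergeHourlyPartitions {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("merge-hourly-partitions"))

    // Hypothetical hourly output directories and per-hour partition count.
    val hours = (0 to 23).map(h => f"hdfs:///output/$h%02d")
    val numPartitions = 10

    for (p <- 0 until numPartitions) {
      // Read the p-th part file from every hour's output and union them.
      val samePartition = hours
        .map(dir => sc.textFile(f"$dir/part-$p%05d"))
        .reduce(_ union _)

      // Collapse the unioned data back into a single partition and write it out.
      samePartition
        .coalesce(1)
        .saveAsTextFile(f"hdfs:///merged/partition-$p%05d")
    }

    sc.stop()
  }
}
```

Whether this pays off depends on the data volume per partition; coalesce(1) forces everything for that partition through a single task, so very large partitions may spill just as they did in the original job.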
This isn't currently a capability that Spark has, though it has definitely
been discussed: https://issues.apache.org/jira/browse/SPARK-1061. The
primary obstacle at this point is that Hadoop's FileInputFormat doesn't
guarantee that each file corresponds to a single split, so there is no guarantee
that the records in a given partition all come from a single file.