Maybe implement a very simple function that uses the Hadoop API to read the data back in based on file names (i.e. one part file per partition)?
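
A rough sketch of what I mean, using only the Hadoop API. The names PartFiles, listParts and readPart are made up for illustration. It assumes the output directory holds the usual part-NNNNN files and that saveAsObjectFile wrote them as a SequenceFile of NullWritable keys and BytesWritable values. Turning the raw bytes back into objects (as far as I know they are Java-serialized batches of elements) is left out here.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{BytesWritable, NullWritable, SequenceFile}

// Hypothetical helper, not part of Spark or Hadoop.
object PartFiles {
  // Matches names like "part-00000"; the captured digits are the partition index.
  private val PartName = """part-(\d+)""".r

  // Lists the part files directly under `dir`, ordered by partition index.
  // Other entries (_SUCCESS, .crc checksum files, ...) are skipped.
  def listParts(conf: Configuration, dir: String): Seq[(Int, Path)] = {
    val root = new Path(dir)
    val fs = root.getFileSystem(conf)
    fs.listStatus(root).toSeq
      .map(_.getPath)
      .flatMap { p =>
        p.getName match {
          case PartName(n) => Some(n.toInt -> p)
          case _           => None
        }
      }
      .sortBy(_._1)
  }

  // Reads one whole part file, never split, and returns each record's value
  // as raw bytes. Assumes NullWritable keys and BytesWritable values.
  def readPart(conf: Configuration, part: Path): Vector[Array[Byte]] = {
    val reader = new SequenceFile.Reader(conf, SequenceFile.Reader.file(part))
    try {
      val key = NullWritable.get()
      val value = new BytesWritable()
      val out = Vector.newBuilder[Array[Byte]]
      while (reader.next(key, value)) {
        // getBytes returns the backing buffer, which can be longer than the
        // record, so copy only the first getLength bytes.
        out += java.util.Arrays.copyOf(value.getBytes, value.getLength)
      }
      out.result()
    } finally {
      reader.close()
    }
  }
}
```

Since listParts gives back (index, path) pairs, each path could be read by exactly one task, so partition i would hold exactly what was written to part i. Parsing the index rather than sorting the names as strings keeps the order right even if the index outgrows the zero padding.
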

On Mon, Mar 23, 2015 at 10:55 AM, Koert Kuipers <[email protected]> wrote:

> there is a way to reinstate the partitioner, but that requires
> sc.objectFile to read exactly what i wrote, which means sc.objectFile
> should never split files on reading (a feature of hadoop file inputformat
> that gets in the way here).
>
> On Mon, Mar 23, 2015 at 1:39 PM, Koert Kuipers <[email protected]> wrote:
>
>> i just realized the major limitation is that i lose partitioning info...
>>
>> On Mon, Mar 23, 2015 at 1:34 AM, Reynold Xin <[email protected]> wrote:
>>
>>> On Sun, Mar 22, 2015 at 6:03 PM, Koert Kuipers <[email protected]>
>>> wrote:
>>>
>>>> so finally i can resort to:
>>>> rdd.saveAsObjectFile(...)
>>>> sc.objectFile(...)
>>>> but that seems like a rather broken abstraction.
>>>
>>> This seems like a fine solution to me.
