Re: spark disk-to-disk

Reynold Xin Mon, 23 Mar 2015 13:32:26 -0700

Maybe implement a very simple function that uses the Hadoop API to read the
data back in based on file names (i.e. the part files)?
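
A rough sketch of that idea in Scala. It assumes Spark 1.x-era public APIs and Hadoop 2.x. It lists the part files that saveAsObjectFile wrote, sorts them by name, and turns each file into exactly one partition, so no file is ever split. It also relies on saveAsObjectFile's on-disk layout: a SequenceFile with NullWritable keys and BytesWritable values, where each value is a Java-serialized array of elements. That layout is an implementation detail, not a documented contract. ObjectFileParts and readByPart are hypothetical names, not Spark API.

```scala
import java.io.{ByteArrayInputStream, InputStream, ObjectInputStream, ObjectStreamClass}

import scala.collection.mutable.ArrayBuffer
import scala.reflect.ClassTag

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{BytesWritable, NullWritable, SequenceFile}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object ObjectFileParts {

  // Resolves classes through the thread's context class loader (where Spark
  // puts the application jar), falling back to the default lookup.
  private class ContextObjectInputStream(in: InputStream) extends ObjectInputStream(in) {
    override def resolveClass(desc: ObjectStreamClass): Class[_] =
      try Class.forName(desc.getName, false, Thread.currentThread().getContextClassLoader)
      catch { case _: ClassNotFoundException => super.resolveClass(desc) }
  }

  // Reads one SequenceFile written by saveAsObjectFile. Assumption: each value
  // is a Java-serialized Array[T] batch. Buffers the whole part in memory
  // for simplicity.
  private def readPart[T](path: String): Iterator[T] = {
    // Assumption: a default Hadoop Configuration can reach the files on executors.
    val reader = new SequenceFile.Reader(new Configuration(), SequenceFile.Reader.file(new Path(path)))
    val key = NullWritable.get()
    val value = new BytesWritable()
    val out = ArrayBuffer.empty[T]
    try {
      while (reader.next(key, value)) {
        val in = new ContextObjectInputStream(
          new ByteArrayInputStream(value.getBytes, 0, value.getLength))
        try out ++= in.readObject().asInstanceOf[Array[T]]
        finally in.close()
      }
    } finally reader.close()
    out.iterator
  }

  // One partition per part-NNNNN file, in file-name order; files are never split.
  // Lexicographic order assumes Spark's zero-padded "part-%05d" names.
  def readByPart[T: ClassTag](sc: SparkContext, dir: String): RDD[T] = {
    val dirPath = new Path(dir)
    val fs = dirPath.getFileSystem(sc.hadoopConfiguration)
    val parts = fs.listStatus(dirPath)
      .map(_.getPath)
      .filter(_.getName.startsWith("part-"))
      .map(_.toString)
      .sorted
    require(parts.nonEmpty, s"no part files found under $dir")
    // With numSlices == parts.length, parallelize puts parts(i) alone in
    // partition i, and flatMap keeps that partition layout.
    sc.parallelize(parts.toSeq, parts.length).flatMap(p => readPart[T](p))
  }
}
```

Usage: `ObjectFileParts.readByPart[Int](sc, dir)`, where `dir` is a directory previously written by `saveAsObjectFile`. Partition i of the result holds exactly the contents of part file i. That is the property you need before reattaching a partitioner.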


On Mon, Mar 23, 2015 at 10:55 AM, Koert Kuipers <[email protected]> wrote:

> There is a way to reinstate the partitioner, but that requires
> sc.objectFile to read exactly what I wrote, which means sc.objectFile
> should never split files on reading (a feature of Hadoop's FileInputFormat
> that gets in the way here).
>
> On Mon, Mar 23, 2015 at 1:39 PM, Koert Kuipers <[email protected]> wrote:
>
>> I just realized the major limitation is that I lose partitioning info...
>>
>> On Mon, Mar 23, 2015 at 1:34 AM, Reynold Xin <[email protected]> wrote:
>>
>>>
>>> On Sun, Mar 22, 2015 at 6:03 PM, Koert Kuipers <[email protected]>
>>> wrote:
>>>
>>>> So finally I can resort to:
>>>> rdd.saveAsObjectFile(...)
>>>> sc.objectFile(...)
>>>> but that seems like a rather broken abstraction.
>>>>
>>>>
>>> This seems like a fine solution to me.
>>>
>>>
>>
>
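
On reinstating the partitioner: one way to do it is a thin wrapper RDD that reuses the parent's partitions and simply declares a known Partitioner. This sketch assumes that extending RDD (a developer API) is acceptable. The wrapper performs no check, so it is only correct when partition i already holds exactly the keys the partitioner sends to i, for example data read back one partition per part file. AssumePartitionedRDD is a hypothetical name, not a Spark class.

```scala
import org.apache.spark.{Partition, Partitioner, TaskContext}
import org.apache.spark.rdd.RDD

/**
 * Declares `part` as the partitioner of `prev` without a shuffle.
 * Nothing is verified: this is only correct if partition i of `prev`
 * already holds exactly the keys for which part.getPartition(key) == i.
 */
class AssumePartitionedRDD[K, V](prev: RDD[(K, V)], part: Partitioner)
  extends RDD[(K, V)](prev) {

  require(part.numPartitions == prev.partitions.length,
    s"partitioner expects ${part.numPartitions} partitions, RDD has ${prev.partitions.length}")

  // Downstream operations (join, cogroup, reduceByKey with the same
  // partitioner) see this and can skip the shuffle.
  override val partitioner: Option[Partitioner] = Some(part)

  // Same partitions as the parent (one-to-one dependency).
  override protected def getPartitions: Array[Partition] = firstParent[(K, V)].partitions

  // Pass the parent's data through unchanged.
  override def compute(split: Partition, context: TaskContext): Iterator[(K, V)] =
    firstParent[(K, V)].iterator(split, context)
}
```

If part file i was read back as partition i, wrapping the result with `new AssumePartitionedRDD(rdd, new HashPartitioner(n))` restores the partitioner. This only holds if the partitioner class and `n` match what the data was partitioned with before saving.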
