Yes, it actually serializes the elements after chunking them into arrays of
10 elements each, so there is not one key-value pair in the SequenceFile
per element. Each value is a serialized array, which objectFile() reads back
and flatMaps into individual elements. The docs say that the intent is that
this is an opaque, not-guaranteed 'format'. That is, the contract seems to
be just that objectFile() reads what saveAsObjectFile() writes, which I
think is why it's not exactly advertised what it is under the hood. If you
want explicit control over how elements are serialized, serialize them
yourself however you like and write the result with saveAsSequenceFile.
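
To make the layout concrete, here is a minimal sketch of the chunk-then-serialize
round trip using only java.io, with no Spark or Hadoop involved. The object name
ObjectFileSketch and its encode/decode helpers are made up for illustration, and
serialize is only my approximation of what Utils.serialize does. The real code
additionally wraps each blob in a BytesWritable value with a NullWritable key
inside a SequenceFile, which this sketch leaves out.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical helper object, not part of Spark.
object ObjectFileSketch {
  // Java-serialize one object into a byte array (assumed to approximate Utils.serialize).
  def serialize(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.toByteArray
  }

  // Inverse of serialize: read a single object back from a byte array.
  def deserialize[T](data: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(data))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Write side: group elements into arrays of up to 10 and emit one blob per array.
  def encode(elems: Seq[String]): Seq[Array[Byte]] =
    elems.grouped(10).map(chunk => serialize(chunk.toArray)).toList

  // Read side: deserialize each blob back into an array and flatten the arrays.
  def decode(records: Seq[Array[Byte]]): Seq[String] =
    records.flatMap(r => deserialize[Array[String]](r).toSeq)
}
```

With 25 input elements, encode produces 3 records (10, 10 and 5 elements), and
decode returns the original 25 in order. It also suggests why pointing a plain
ObjectInputStream at the saved file fails: the file starts with a SequenceFile
header rather than a Java serialization stream, and each record value holds a
whole array, not a single element.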

On Tue, Jan 13, 2015 at 10:43 PM, Kevin Burton <bur...@spinn3r.com> wrote:

> Yes, but this isn’t what the main documentation says.
>
> The file format isn’t very discoverable.
>
> Also, the documentation doesn’t say anything about the grouping by 10.
> What’s that about?
>
> Kevin
>
> On Tue, Jan 13, 2015 at 2:28 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> Yes, that's even what the objectFile javadoc says. It is expecting a
>> SequenceFile with NullWritable keys and BytesWritable values containing the
>> serialized values. This looks correct to me.
>>
>> On Tue, Jan 13, 2015 at 8:39 AM, Kevin Burton <bur...@spinn3r.com> wrote:
>>
>>> This is interesting.
>>>
>>> I’m using ObjectInputStream to try to read a file written as
>>> saveAsObjectFile… but it’s not working.
>>>
>>> The documentation says:
>>>
>>> "Write the elements of the dataset in a simple format using Java
>>> serialization, which can then be loaded using SparkContext.objectFile()."
>>>
>>> … but that’s not right.
>>>
>>>   def saveAsObjectFile(path: String) {
>>>     this.mapPartitions(iter => iter.grouped(10).map(_.toArray))
>>>       .map(x => (NullWritable.get(), new BytesWritable(Utils.serialize(x))))
>>>       .saveAsSequenceFile(path)
>>>   }
>>>
>>> Am I correct to assume that each entry is a serialized object, BUT
>>> that the entire thing is wrapped as a sequence file?
>>>
>>> --
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> <https://plus.google.com/102718274791889610666/posts>
>>> <http://spinn3r.com>
>>>
>>>
>>
>
>
