I am using an input format to load data from Accumulo [1] into a Spark
RDD (roughly as sketched after the references below). It looks like
something is happening during serialization of my output writable between
the time it is emitted from the InputFormat and the time it reaches its
destination on the driver.

What's happening is that the resulting Event object [2] inside the
EventWritable [3] appears to have lost its Tuples [4].


[1] https://github.com/calrissian/accumulo-recipes/blob/master/store/event-store/src/main/java/org/calrissian/accumulorecipes/eventstore/hadoop/EventInputFormat.java
[2] https://github.com/calrissian/mango/blob/master/mango-core/src/main/java/org/calrissian/mango/domain/event/Event.java
[3] https://github.com/calrissian/accumulo-recipes/blob/master/commons/src/main/java/org/calrissian/accumulorecipes/commons/hadoop/EventWritable.java
[4] https://github.com/calrissian/mango/blob/master/mango-core/src/main/java/org/calrissian/mango/domain/Tuple.java
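For context, the load looks roughly like the following. This is a sketch
from memory rather than my exact job setup: the key class (Text) and the
configuration step are assumptions on my part, not verified against
EventInputFormat's actual signature.

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.calrissian.accumulorecipes.commons.hadoop.EventWritable;
    import org.calrissian.accumulorecipes.eventstore.hadoop.EventInputFormat;

    JavaSparkContext sc = /* existing context */;
    Job job = Job.getInstance(new Configuration());
    // EventInputFormat-specific setup (connector info, query, etc.) omitted.

    JavaPairRDD<Text, EventWritable> rdd = sc.newAPIHadoopRDD(
            job.getConfiguration(),
            EventInputFormat.class,
            Text.class,          // assumed key class -- not verified
            EventWritable.class);

    // By the time these reach the driver, the Events inside have no Tuples.
    List<EventWritable> events = rdd.values().collect();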

I'm at a loss. In a unit test, I wrapped an EventWritable in a
SerializableWritable, serialized it to an ObjectOutputStream, and it
round-tripped fine without any loss of data. I also verified that the
Event object itself serializes and deserializes fine through an
ObjectOutputStream. I've been following breakpoints through the code to
figure out where exactly the loss may be happening, but the objects all
seem to be bytes already by the time they are passed into the
JavaSerializerInstance (if I'm properly following what's going on, that
is).
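The round-trip test looks roughly like this (again a sketch: it assumes
'writable' has already been populated with an Event carrying a few Tuples,
and the variable names are illustrative):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import org.apache.spark.SerializableWritable;
    import org.calrissian.accumulorecipes.commons.hadoop.EventWritable;

    // Serialize the wrapped writable the way Spark's Java serializer would.
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ObjectOutputStream oos = new ObjectOutputStream(baos);
    oos.writeObject(new SerializableWritable<EventWritable>(writable));
    oos.close();

    // Deserialize it back and inspect the Event inside.
    ObjectInputStream ois = new ObjectInputStream(
            new ByteArrayInputStream(baos.toByteArray()));
    @SuppressWarnings("unchecked")
    SerializableWritable<EventWritable> copy =
            (SerializableWritable<EventWritable>) ois.readObject();

    // copy.value() still has all of its Tuples at this point, which is
    // what makes the loss inside Spark so confusing.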

Any ideas on what this may be? I'm using Spark 1.2.0 and Scala 2.10, and
the business objects I'm using are compiled with Java 1.7.
