I am using an input format to load data from Accumulo [1] into a Spark RDD. It looks like something is happening during the serialization of my output writable between the time it is emitted from the InputFormat and the time it reaches its destination on the driver.
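For context, the load looks roughly like the sketch below (spark-shell style). This is a minimal reconstruction rather than my exact code: the Key class for the InputFormat, the EventInputFormat configuration calls, and the EventWritable.get() accessor are assumptions on my part.

    import org.apache.accumulo.core.data.Key
    import org.apache.hadoop.conf.Configuration
    import org.apache.spark.{SparkConf, SparkContext}
    import org.calrissian.accumulorecipes.commons.hadoop.EventWritable
    import org.calrissian.accumulorecipes.eventstore.hadoop.EventInputFormat

    val sc = new SparkContext(new SparkConf().setAppName("event-load"))

    // Hadoop Configuration carrying the Accumulo connection/query settings;
    // the EventInputFormat-specific setup calls are omitted here.
    val conf = new Configuration()

    // Read events through the InputFormat. I'm assuming Key/EventWritable are
    // the key/value classes declared by EventInputFormat.
    val raw = sc.newAPIHadoopRDD(conf, classOf[EventInputFormat],
      classOf[Key], classOf[EventWritable])

    // Pull the wrapped Event objects back to the driver (assuming EventWritable
    // exposes the Event via get()). This is where the Tuples appear to be gone.
    val events = raw.map(_._2.get()).collect()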
What's happening is that the resulting Event object [2] inside the EventWritable [3] appears to have lost its Tuples [4].

[1] https://github.com/calrissian/accumulo-recipes/blob/master/store/event-store/src/main/java/org/calrissian/accumulorecipes/eventstore/hadoop/EventInputFormat.java
[2] https://github.com/calrissian/mango/blob/master/mango-core/src/main/java/org/calrissian/mango/domain/event/Event.java
[3] https://github.com/calrissian/accumulo-recipes/blob/master/commons/src/main/java/org/calrissian/accumulorecipes/commons/hadoop/EventWritable.java
[4] https://github.com/calrissian/mango/blob/master/mango-core/src/main/java/org/calrissian/mango/domain/Tuple.java

I'm at a loss. In a unit test (sketched at the end of this message) I wrapped an EventWritable in a SerializableWritable, serialized it to an ObjectOutputStream, and it round-tripped with no loss of data. I also verified that the Event object itself serializes and deserializes fine with an ObjectOutputStream. I've been following breakpoints through the code to figure out where exactly the loss is happening, but the objects all seem to be bytes already by the time they are passed into the JavaSerializerInstance (if I'm following what's going on correctly, that is).

Any ideas on what this may be? I'm using Spark 1.2.0 and Scala 2.10, but the business objects involved are plain Java 1.7 classes.
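For reference, the unit test mentioned above looks more or less like this sketch. The Event/Tuple construction and the EventWritable set()/get() accessors are my assumptions about the mango and accumulo-recipes APIs, and the key/value strings are made up:

    import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

    import org.apache.spark.SerializableWritable
    import org.calrissian.accumulorecipes.commons.hadoop.EventWritable
    import org.calrissian.mango.domain.Tuple
    import org.calrissian.mango.domain.event.BaseEvent

    // Build a small Event with a couple of Tuples (constructor and put()
    // signatures are my assumption of the mango API).
    val event = new BaseEvent("eventType", "eventId")
    event.put(new Tuple("key1", "value1"))
    event.put(new Tuple("key2", "value2"))

    val writable = new EventWritable()
    writable.set(event)

    // Round-trip through Java serialization via Spark's SerializableWritable,
    // which delegates to the Writable's write()/readFields().
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(new SerializableWritable(writable))
    out.close()

    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    val roundTripped = in.readObject().asInstanceOf[SerializableWritable[EventWritable]]
    in.close()

    // The Tuples survive this round trip, which is what makes the loss inside
    // the Spark job so puzzling.
    assert(roundTripped.value.get().getTuples.size == 2)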