I can work around the problem by mapping over the RDD and pulling the Event out of each EventWritable before I call collect. I think I'll do that for now.
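For anyone who finds this thread later, here is a rough sketch of the shape of that workaround. It is not the actual job code: `Event` and `EventWritable` below are hypothetical stand-ins for the real classes, the `get()` accessor name is an assumption, and a plain Java list stands in for the RDD so the snippet runs without Spark or Accumulo. The only point is the ordering: unwrap each record first, gather the results second.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class UnwrapBeforeCollect {

    // Hypothetical stand-in for the real Event: an id plus its tuples.
    static final class Event {
        final String id;
        final List<String> tuples;

        Event(String id, List<String> tuples) {
            this.id = id;
            this.tuples = new ArrayList<>(tuples);
        }
    }

    // Hypothetical stand-in for EventWritable: a thin wrapper around an Event.
    // The accessor name get() is an assumption made for this sketch.
    static final class EventWritable {
        private final Event event;

        EventWritable(Event event) {
            this.event = event;
        }

        Event get() {
            return event;
        }
    }

    // Pull the Event out of each wrapper first, then gather the results.
    // With an RDD the same two steps would be a map followed by collect.
    static List<Event> unwrap(List<EventWritable> records) {
        return records.stream()
                .map(EventWritable::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<EventWritable> records = Arrays.asList(
                new EventWritable(new Event("e1", Arrays.asList("k1=v1", "k2=v2"))),
                new EventWritable(new Event("e2", Arrays.asList("k3=v3"))));

        for (Event e : unwrap(records)) {
            System.out.println(e.id + " has " + e.tuples.size() + " tuple(s)");
        }
    }
}
```

With this ordering, what gets collected is the plain Event rather than the Writable wrapper.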
On Tue, Feb 10, 2015 at 6:04 PM, Corey Nolet <[email protected]> wrote:

> I am using an input format to load data from Accumulo [1] in to a Spark
> RDD. It looks like something is happening in the serialization of my output
> writable between the time it is emitted from the InputFormat and the time
> it reaches its destination on the driver.
>
> What's happening is that the resulting Event object [2] inside the
> EventWritable [3] appears to have lost its Tuples [4]
>
> [1] https://github.com/calrissian/accumulo-recipes/blob/master/store/event-store/src/main/java/org/calrissian/accumulorecipes/eventstore/hadoop/EventInputFormat.java
> [2] https://github.com/calrissian/mango/blob/master/mango-core/src/main/java/org/calrissian/mango/domain/event/Event.java
> [3] https://github.com/calrissian/accumulo-recipes/blob/master/commons/src/main/java/org/calrissian/accumulorecipes/commons/hadoop/EventWritable.java
> [4] https://github.com/calrissian/mango/blob/master/mango-core/src/main/java/org/calrissian/mango/domain/Tuple.java
>
> I'm at a loss. I've tested using the SerializableWritable and serializing
> an EventWritable to an ObjectOutputStream in a unit test and it serialized
> fine without loss of data. I also verified that the Event object itself
> serializes and deserializes fine with an ObjectOutputStream. I'm trying to
> follow breakpoints through the code to figure out where exactly this may be
> happening but the objects all seem to be bytes already when passed into the
> JavaSerializerInstance (if I'm properly following what's going on, that
> is).
>
> Any ideas on what this may be? I'm using Spark 1.2.0 and Scala 2.10 but
> the business objects I'm using are from Java 1.7.
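
For reference, the kind of round-trip check described in the quoted message can be sketched roughly as follows. This is not the original unit test: the `Event` class here is a hypothetical, simplified stand-in with a list of strings in place of real Tuples, and `roundTrip` is a helper name invented for this sketch. It only uses the standard `ObjectOutputStream` and `ObjectInputStream` classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RoundTripCheck {

    // Hypothetical, simplified stand-in for Event: an id plus its tuples.
    static final class Event implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        final List<String> tuples;

        Event(String id, List<String> tuples) {
            this.id = id;
            this.tuples = new ArrayList<>(tuples);
        }
    }

    // Serialize a value to bytes with Java serialization, then read it back.
    static <T extends Serializable> T roundTrip(T value)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        Event original = new Event("e1", Arrays.asList("k1=v1", "k2=v2"));
        Event copy = roundTrip(original);

        // If the tuples survive, both counts match.
        System.out.println("tuples before: " + original.tuples.size()
                + ", after: " + copy.tuples.size());
    }
}
```

A check like this only exercises plain Java serialization of the object in isolation, so passing it does not rule out a loss somewhere else on the path from the InputFormat to the driver.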
