I use Spark's SerializableWritable to wrap CombineFileSplit so that I can
pass splits around, but I ran into serialization issues. While researching
why my code fails, I found what might be a bug in CombineFileSplit:

CombineFileSplit doesn't serialize locations in write(DataOutput out), and
doesn't deserialize them in readFields(DataInput in).

When I create a split in CombineFileInputFormat, locations is an empty
array (String[0]), but after deserialization (default constructor, then
readFields), locations is null.

This leads to an NPE.
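
To make the pattern concrete, here is a minimal sketch that reproduces it
without Hadoop on the classpath. SplitSketch is a hypothetical stand-in I
wrote for illustration, not the real CombineFileSplit; it only assumes the
behaviour described above (write/readFields handle the paths but skip
locations) and uses nothing beyond java.io.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for CombineFileSplit (NOT the Hadoop class).
class SplitSketch {
    String[] paths;      // written and read back
    String[] locations;  // assumed to be skipped by write/readFields

    // Default constructor, as used right before readFields.
    SplitSketch() {}

    SplitSketch(String[] paths, String[] locations) {
        this.paths = paths;
        this.locations = locations;
    }

    void write(DataOutput out) throws IOException {
        out.writeInt(paths.length);
        for (String p : paths) {
            out.writeUTF(p);
        }
        // locations are never written
    }

    void readFields(DataInput in) throws IOException {
        int n = in.readInt();
        paths = new String[n];
        for (int i = 0; i < n; i++) {
            paths[i] = in.readUTF();
        }
        // locations are never restored, so the field keeps its default (null)
    }

    // Serialize to a byte buffer, then rebuild via default constructor + readFields.
    static SplitSketch roundTrip(SplitSketch original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            SplitSketch copy = new SplitSketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SplitSketch original =
                new SplitSketch(new String[] {"/data/part-0"}, new String[0]);
        SplitSketch copy = roundTrip(original);
        System.out.println("original locations is null: " + (original.locations == null));
        System.out.println("copy locations is null: " + (copy.locations == null));
    }
}
```

In this sketch the paths survive the round trip, but the empty locations
array comes back as null, so any caller that touches copy.locations.length
would hit the NPE.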
