Hi all,

I'm writing a Spark Streaming app that consumes a Kafka stream. The consumed objects are Avro objects (they do not implement Java serialization), so I decided to use Kryo and have set "spark.serializer" and "spark.closure.serializer" accordingly. But now I am getting an exception in CheckpointWriter:

14/05/19 15:08:18 ERROR actor.OneForOneStrategy: Vsw.AvroDto.AppLogs.LoggingEvent
java.io.NotSerializableException: Vsw.AvroDto.AppLogs.LoggingEvent
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1541)
        ...
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
        at org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:182)
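For reference, this is roughly how I set things up (trimmed; the registrator class and app name are illustrative, not my exact code):

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Registers the Avro-generated classes with Kryo.
class LoggingEventRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[Vsw.AvroDto.AppLogs.LoggingEvent])
  }
}

val conf = new SparkConf()
  .setAppName("kafka-avro-stream")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.closure.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "LoggingEventRegistrator")

With this configuration everything serializes fine until checkpointing kicks in.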
I took a look at the code and I see that CheckpointWriter.write uses java.io.ObjectOutputStream directly. That seems strange to me: what is the point of being able to set "spark.serializer" to something other than Java serialization if the checkpoint code still requires it? Is this an omission, or are there reasons for CheckpointWriter to use Java serialization? Would it not be more logical for CheckpointWriter to do the same as Spark itself: request the serializer class from the Spark configuration, with a fallback to Java? A rough sketch of what I mean follows below.

Vadim.
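P.S. A rough sketch of what I had in mind (the helper names are mine, and the reflection/constructor details may differ between Spark versions; this is not the actual Spark internals):

import java.nio.ByteBuffer
import org.apache.spark.SparkConf
import org.apache.spark.serializer.Serializer

// Illustrative helper: instantiate whatever serializer the user
// configured via "spark.serializer", falling back to Java
// serialization, the same way the rest of Spark does.
def checkpointSerializer(conf: SparkConf): Serializer = {
  val clsName = conf.get("spark.serializer",
    "org.apache.spark.serializer.JavaSerializer")
  Class.forName(clsName)
    .getConstructor(classOf[SparkConf])
    .newInstance(conf)
    .asInstanceOf[Serializer]
}

// CheckpointWriter.write could then push the checkpoint through the
// configured serializer instead of a raw ObjectOutputStream:
def writeCheckpoint(conf: SparkConf, checkpoint: AnyRef): ByteBuffer =
  checkpointSerializer(conf).newInstance().serialize(checkpoint)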