Hi all,

I'm writing a Spark Streaming app which consumes a Kafka stream. The
consumed objects are Avro objects (they do not implement Java
serialization), so I decided to use Kryo and have set "spark.serializer"
and "spark.closure.serializer", roughly as in the sketch below.
But now I am getting an exception in CheckpointWriter:
14/05/19 15:08:18 ERROR actor.OneForOneStrategy: Vsw.AvroDto.AppLogs.LoggingEvent
java.io.NotSerializableException: Vsw.AvroDto.AppLogs.LoggingEvent
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1541)
...
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
    at org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:182)

I took a look at the code, and I see that CheckpointWriter.write uses
java.io.ObjectOutputStream. That seems strange to me, because what is the
point of being able to set "spark.serializer" to something other than Java
if the checkpoint code still requires Java serialization? Is it an
omission, or are there reasons for CheckpointWriter to use Java
serialization? Would it not be more logical for CheckpointWriter to do the
same as Spark itself: request the serializer class from the Spark
configuration, with a fallback to Java? Something along the lines of the
sketch below.
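
To illustrate what I mean, a rough, untested sketch (the method name is
mine; Checkpoint is package-private, so this would have to live inside the
streaming package, and I have not checked whether SparkEnv is reachable at
that point in CheckpointWriter):

    import java.io.ByteArrayOutputStream
    import org.apache.spark.SparkEnv
    import org.apache.spark.streaming.Checkpoint

    // Hypothetical replacement for the hard-coded ObjectOutputStream in
    // CheckpointWriter.write: serialize with whatever "spark.serializer"
    // is configured to, which already defaults to Java serialization.
    def serializeCheckpoint(checkpoint: Checkpoint): Array[Byte] = {
      val serializer = SparkEnv.get.serializer.newInstance()
      val bos = new ByteArrayOutputStream()
      val stream = serializer.serializeStream(bos)
      stream.writeObject(checkpoint)
      stream.close()
      bos.toByteArray // written to the checkpoint file as before
    }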

Vadim.
-- 
From RFC 2631: In ASN.1, EXPLICIT tagging is implicit unless IMPLICIT is
explicitly specified
