I have a rather simple avro schema to serialize Tweets (message, username,
timestamp).
Kryo and twitter chill are used to do so.
For my dev environment the Spark context is configured as below
val conf: SparkConf = new SparkConf()
conf.setAppName("kryo_test")
conf.setMaster(“local[4]")
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryo.registrator", "co.feeb.TweetRegistrator”)
Serialization is setup with
override def registerClasses(kryo: Kryo): Unit = {
kryo.register(classOf[Tweet],
AvroSerializer.SpecificRecordBinarySerializer[Tweet])
}
(This method gets called)
Using this configuration to persist some object fails with
java.io.NotSerializableException: co.feeb.avro.Tweet
(which seems to be ok as this class is not Serializable)
I used the following code:
val ctx: SparkContext = new SparkContext(conf)
val tweets: RDD[Tweet] = ctx.parallelize(List(
new Tweet("a", "b", 1L),
new Tweet("c", "d", 2L),
new Tweet("e", "f", 3L)
)
)
tweets.saveAsObjectFile("file:///tmp/spark”)
Using saveAsTextFile works, but persisted files are not binary but JSON
cat /tmp/spark/part-00000
{"username": "a", "text": "b", "timestamp": 1}
{"username": "c", "text": "d", "timestamp": 2}
{"username": "e", "text": "f", "timestamp": 3}
Is this intended behaviour, a configuration issue, avro serialisation not
working in local mode or something else?
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]