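Kryo serialization is used internally by Spark for spilling or shuffling intermediate results, not for writing out an RDD in an action such as saveAsObjectFile. saveAsObjectFile uses plain Java serialization, so it fails on any class that does not implement java.io.Serializable, whatever the Kryo settings are. Look at Sandy Ryza's examples for some hints on how to write the records out as Avro instead: https://github.com/sryza/simplesparkavroapp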
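To see why the registrator has no effect on saveAsObjectFile, here is a minimal sketch that reproduces the failure with plain Java serialization only, no Spark involved. It assumes, as Spark's saveAsObjectFile does, that elements are grouped into arrays and each array is Java-serialized. PlainTweet is a hypothetical stand-in for the Avro-generated Tweet class, and javaSerialize and failureFor are illustrative helpers, not Spark APIs.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for the Avro-generated co.feeb.avro.Tweet:
// like that class, it does NOT implement java.io.Serializable.
class PlainTweet(val username: String, val text: String, val timestamp: Long)

object JavaSerializationDemo {
  // Plain Java serialization of an object graph, roughly what
  // saveAsObjectFile applies to each array of RDD elements.
  def javaSerialize(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj)
    finally out.close()
    bytes.toByteArray
  }

  // Returns the NotSerializableException message (the offending class name), if any.
  def failureFor(obj: AnyRef): Option[String] =
    try { javaSerialize(obj); None }
    catch { case e: NotSerializableException => Some(e.getMessage) }

  def main(args: Array[String]): Unit = {
    val batch: Array[PlainTweet] =
      Array(new PlainTweet("a", "b", 1L), new PlainTweet("c", "d", 2L))
    // The array itself is serializable, but its elements are not,
    // and no Kryo registrator is ever consulted on this path.
    println(failureFor(batch))
  }
}
```

The exception names the element class, just like the co.feeb.avro.Tweet error in the question; an array of Strings, by contrast, serializes without complaint.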
Regards,
Will

On July 3, 2015, at 2:45 AM, Dominik Hübner <cont...@dhuebner.com> wrote:

I have a rather simple Avro schema for serializing tweets (message, username, timestamp), using Kryo and Twitter Chill. For my dev environment, the Spark context is configured as below:

    val conf: SparkConf = new SparkConf()
    conf.setAppName("kryo_test")
    conf.setMaster("local[4]")
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    conf.set("spark.kryo.registrator", "co.feeb.TweetRegistrator")

Serialization is set up with the following (this method does get called):

    override def registerClasses(kryo: Kryo): Unit = {
      kryo.register(classOf[Tweet], AvroSerializer.SpecificRecordBinarySerializer[Tweet])
    }

Using this configuration to persist some objects fails with

    java.io.NotSerializableException: co.feeb.avro.Tweet

(which seems to be OK, as this class is not Serializable). I used the following code:

    val ctx: SparkContext = new SparkContext(conf)
    val tweets: RDD[Tweet] = ctx.parallelize(List(
      new Tweet("a", "b", 1L),
      new Tweet("c", "d", 2L),
      new Tweet("e", "f", 3L)
    ))
    tweets.saveAsObjectFile("file:///tmp/spark")

Using saveAsTextFile works, but the persisted files are JSON rather than binary:

    cat /tmp/spark/part-00000
    {"username": "a", "text": "b", "timestamp": 1}
    {"username": "c", "text": "d", "timestamp": 2}
    {"username": "e", "text": "f", "timestamp": 3}

Is this intended behaviour, a configuration issue, Avro serialisation not working in local mode, or something else?
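As for the JSON output from saveAsTextFile: that action writes each element's toString on its own line, and Avro-generated record classes render themselves as JSON in toString, so the text files above are expected rather than a sign of broken serialization. The sketch below imitates this without Spark or Avro; JsonTweet is a hypothetical class that copies the Avro toString format, and asTextLines is an illustrative helper, not a Spark API.

```scala
// Hypothetical stand-in for the Avro-generated Tweet. Avro's generated
// records print themselves as JSON in toString; this class imitates that.
class JsonTweet(val username: String, val text: String, val timestamp: Long) {
  override def toString: String =
    s"""{"username": "$username", "text": "$text", "timestamp": $timestamp}"""
}

object TextFileDemo {
  // Mimics how saveAsTextFile builds its output: one element.toString per line.
  // (No JSON escaping is done here; fine for these simple values.)
  def asTextLines(records: Seq[AnyRef]): Seq[String] = records.map(_.toString)

  def main(args: Array[String]): Unit = {
    val tweets = Seq(
      new JsonTweet("a", "b", 1L),
      new JsonTweet("c", "d", 2L),
      new JsonTweet("e", "f", 3L)
    )
    asTextLines(tweets).foreach(println)
  }
}
```

Getting real Avro binary files means going through an Avro output format rather than toString or Java serialization, which is what the examples linked above demonstrate.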