Hi Mohan,

It’s been a while since I’ve looked at this specifically, but I don’t think the default Kryo serializer will properly serialize Avro. IIRC, there are complications around the way Avro handles nullable fields (it models them as unions with null in the generated classes), which would be consistent with the NPE you’re encountering here. That’s why we’ve been using a custom serializer for Avro objects. That said, we started doing this back on Spark 0.6.1 and have stuck with the same strategy the whole time; it’s possible that things have changed in the interim.
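Roughly, the idea is to sidestep Kryo's field-by-field reflection entirely and let Avro's own binary encoding handle the record, since Avro knows how to encode its unions and arrays. A minimal sketch (not our exact code; it assumes Avro's SpecificDatumWriter/SpecificDatumReader API and the Kryo 2.x Serializer interface, and the class name is illustrative):

```java
import java.io.IOException;

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;
import org.apache.avro.specific.SpecificRecordBase;

/**
 * Sketch of a Kryo serializer that delegates to Avro's binary encoding
 * for an Avro-generated class, instead of letting Kryo reflect over the
 * generated fields (which trips over Avro's union representation).
 */
public class AvroSerializer<T extends SpecificRecordBase> extends Serializer<T> {

    private final Class<T> type;

    public AvroSerializer(Class<T> type) {
        this.type = type;
    }

    @Override
    public void write(Kryo kryo, Output output, T record) {
        try {
            // Kryo's Output extends java.io.OutputStream, so Avro can
            // encode straight into it; flush so all bytes land in Output.
            SpecificDatumWriter<T> writer = new SpecificDatumWriter<T>(type);
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(output, null);
            writer.write(record, encoder);
            encoder.flush();
        } catch (IOException e) {
            throw new RuntimeException("Avro serialization failed for " + type, e);
        }
    }

    @Override
    public T read(Kryo kryo, Input input, Class<T> type) {
        try {
            // Use a direct (unbuffered) decoder so Avro does not read
            // ahead past this record in the stream Kryo is managing.
            SpecificDatumReader<T> reader = new SpecificDatumReader<T>(type);
            BinaryDecoder decoder = DecoderFactory.get().directBinaryDecoder(input, null);
            return reader.read(null, decoder);
        } catch (IOException e) {
            throw new RuntimeException("Avro deserialization failed for " + type, e);
        }
    }
}
```

You'd then register it per generated class in your registrator, e.g. `kryo.register(ResourceMessage.class, new AvroSerializer<ResourceMessage>(ResourceMessage.class));`, rather than registering the class with Kryo's default serializer.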
Regards,

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

On Sep 18, 2014, at 8:20 AM, mohan.gadm <mohan.g...@gmail.com> wrote:

> Hi Frank, thanks for the info, that's great. But I'm not saying the Avro
> serializer is failing; Kryo is failing. I'm using the Kryo serializer and
> registering the Avro-generated classes with Kryo:
>
>     sparkConf.set("spark.serializer",
>         "org.apache.spark.serializer.KryoSerializer");
>     sparkConf.set("spark.kryo.registrator",
>         "com.globallogic.goliath.platform.PlatformKryoRegistrator");
>
> But how was it able to perform the output operation when the message is
> simple, yet not when the message is complex? (Note that there were no Avro
> schema changes; just the data changed.) Providing more info below.
>
> Avro schema:
> =============
>     record KeyValueObject {
>         union { boolean, int, long, float, double, bytes, string } name;
>         union { boolean, int, long, float, double, bytes, string,
>                 array<union { boolean, int, long, float, double, bytes,
>                               string, KeyValueObject }>,
>                 KeyValueObject } value;
>     }
>     record Datum {
>         union { boolean, int, long, float, double, bytes, string,
>                 array<union { boolean, int, long, float, double, bytes,
>                               string, KeyValueObject }>,
>                 KeyValueObject } value;
>     }
>     record ResourceMessage {
>         string version;
>         string sequence;
>         string resourceGUID;
>         string GWID;
>         string GWTimestamp;
>         union { Datum, array<Datum> } data;
>     }
>
> Simple message:
> ===================
>     {"version": "01", "sequence": "00001", "resourceGUID": "001",
>      "GWID": "002", "GWTimestamp": "1409823150737",
>      "data": {"value": "30"}}
>
> Complex message:
> ===================
>     {"version": "01", "sequence": "00001", "resource": "sensor-001",
>      "controller": "002", "controllerTimestamp": "1411038710358",
>      "data": {"value": [{"name": "Temperature", "value": "30"},
>                         {"name": "Speed", "value": "60"},
>                         {"name": "Location", "value": ["+401213.1", "-0750015.1"]},
>                         {"name": "Timestamp", "value": "2014-09-09T08:15:25-05:00"}]}}
>
> Both messages can fit into the schema.
>
> The message actually comes from Kafka as Avro binary. In Spark the message
> is converted to Avro objects (ResourceMessage) using decoders (this also
> works). I'm able to perform some mappings and to convert the
> stream<ResourceMessage> to a stream of Flume events.
>
> Now the events need to be pushed to the Flume source. For this I need to
> collect the RDD and then send it to the Flume client.
>
> End to end worked fine with the simple message; the problem is with the
> complex message.
>
> -----
> Thanks & Regards,
> Mohan
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-fails-with-avro-having-Arrays-and-unions-but-succeeds-with-simple-avro-tp14549p14565.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> ---------------------------------------------------------------------