Thanks for looking into this. Does this mean string fields should always be nullable?
You are right that the result is not yet correct and further digging is needed :(

On Wed, Feb 27, 2019 at 1:19 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

> Hi,
>
> I was dealing with avro stuff lately and most of the time it has something
> to do with the schema.
> One thing I've pinpointed quickly (where I was struggling also) is the
> name field should be nullable but the result is not yet correct so further
> digging needed...
>
> scala> val expectedSchema = StructType(Seq(StructField("name", StringType,true),StructField("age", IntegerType, false)))
> expectedSchema: org.apache.spark.sql.types.StructType = StructType(StructField(name,StringType,true), StructField(age,IntegerType,false))
>
> scala> val avroTypeStruct = SchemaConverters.toAvroType(expectedSchema).toString
> avroTypeStruct: String = {"type":"record","name":"topLevelRecord","fields":[{"name":"name","type":["string","null"]},{"name":"age","type":"int"}]}
>
> scala> dfKV.select(from_avro('value, avroTypeStruct)).show
> +---------------------------------------------+
> |from_avro(value, struct<name:string,age:int>)|
> +---------------------------------------------+
> |                              [Mary Jane, 25]|
> |                              [Mary Jane, 25]|
> +---------------------------------------------+
>
> BR,
> G
>
>
> On Wed, Feb 27, 2019 at 7:43 AM Hien Luu <hien...@gmail.com> wrote:
>
>> Hi,
>>
>> I ran into a pretty weird issue with to_avro and from_avro where it was not
>> able to parse the data in a struct correctly. Please see the simple and
>> self contained example below. I am using Spark 2.4. I am not sure if I
>> missed something.
>>
>> This is how I start the spark-shell on my Mac:
>>
>> ./bin/spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.0
>>
>> import org.apache.spark.sql.types._
>> import org.apache.spark.sql.avro._
>> import org.apache.spark.sql.functions._
>>
>>
>> spark.version
>>
>> val df = Seq((1, "John Doe", 30), (2, "Mary Jane", 25)).toDF("id", "name", "age")
>>
>> val dfStruct = df.withColumn("value", struct("name","age"))
>>
>> dfStruct.show
>> dfStruct.printSchema
>>
>> val dfKV = dfStruct.select(to_avro('id).as("key"), to_avro('value).as("value"))
>>
>> val expectedSchema = StructType(Seq(StructField("name", StringType, false),StructField("age", IntegerType, false)))
>>
>> val avroTypeStruct = SchemaConverters.toAvroType(expectedSchema).toString
>>
>> val avroTypeStr = s"""
>> |{
>> | "type": "int",
>> | "name": "key"
>> |}
>> """.stripMargin
>>
>>
>> dfKV.select(from_avro('key, avroTypeStr)).show
>>
>> // output
>> +-------------------+
>> |from_avro(key, int)|
>> +-------------------+
>> |                  1|
>> |                  2|
>> +-------------------+
>>
>> dfKV.select(from_avro('value, avroTypeStruct)).show
>>
>> // output
>> +---------------------------------------------+
>> |from_avro(value, struct<name:string,age:int>)|
>> +---------------------------------------------+
>> |                                        [, 9]|
>> |                                        [, 9]|
>> +---------------------------------------------+
>>
>> Please help and thanks in advance.
>>

--
Regards,
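On the nullability question, here is a minimal sketch for comparing the schemas instead of guessing. It assumes the same spark-shell session as in the quoted example (Spark 2.4 with the spark-avro package, where `spark.implicits._` is already in scope). The names `strict`, `lenient`, and `valueType` are made up for illustration. It only prints schemas and does not claim to fix the duplicated-row output reported above.

```scala
import org.apache.spark.sql.avro.SchemaConverters
import org.apache.spark.sql.functions.struct
import org.apache.spark.sql.types._

// Hand-written schema from the original post: "name" is NOT nullable.
val strict = StructType(Seq(
  StructField("name", StringType, nullable = false),
  StructField("age", IntegerType, nullable = false)))

// Schema from the reply: "name" IS nullable.
val lenient = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = false)))

// Convert both to Avro JSON. Per the reply's output, a nullable field
// becomes a union with "null", so the two schemas differ.
val strictAvro = SchemaConverters.toAvroType(strict).toString
val lenientAvro = SchemaConverters.toAvroType(lenient).toString
println(strictAvro)
println(lenientAvro)

// Rather than hand-writing the schema, inspect what Spark inferred for
// the struct column that is passed to to_avro.
val df = Seq((1, "John Doe", 30), (2, "Mary Jane", 25)).toDF("id", "name", "age")
val valueType = df.select(struct("name", "age").as("value")).schema("value").dataType
println(valueType)
println(SchemaConverters.toAvroType(valueType).toString)
```

If the last schema printed matches `lenientAvro` rather than `strictAvro`, then the schema passed to from_avro has to use the nullable form for this particular DataFrame. That would not mean string fields must always be nullable, only that the reader schema should match whatever schema to_avro actually wrote with.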