Hi,

I've been dealing with Avro lately, and most of the problems I run into come down to the schema. One thing I spotted quickly (and had struggled with myself) is that the name field should be nullable. With that change the struct is parsed, but the result is still not correct (both rows show the same record), so further digging is needed...
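For what it's worth, the `[, 9]` in the original output looks consistent with how I understand the Avro binary encoding: a union such as ["string","null"] is written as a branch index followed by the value, so a reader that expects a plain "string" takes the index 0 as a string length and then reads the real length of "Mary Jane" (9) as the age. Below is a rough sketch of that reasoning in plain Scala, with no Spark or Avro dependency. It is a hand-rolled imitation of the encoding, not the actual library code, and all names in it (`AvroNullableMismatch`, `writeRecord`, `readAsNonNullable`, `readAsNullable`) are made up for illustration.

```scala
// Hand-rolled imitation of Avro's binary encoding for the (name, age) record.
// Assumption: ints/longs are zigzag varints, strings are length + UTF-8 bytes,
// and a union is a branch index followed by the value of that branch.
object AvroNullableMismatch {
  // Encode a number as a zigzag varint.
  def zigZagVarint(n: Long): Array[Byte] = {
    var v = (n << 1) ^ (n >> 63)
    val out = scala.collection.mutable.ArrayBuffer.empty[Byte]
    while ((v & ~0x7FL) != 0) {
      out += ((v & 0x7F) | 0x80).toByte
      v >>>= 7
    }
    out += v.toByte
    out.toArray
  }

  // Decode one zigzag varint at pos; returns (value, position after it).
  def readVarint(buf: Array[Byte], pos: Int): (Long, Int) = {
    var shift = 0
    var raw = 0L
    var i = pos
    var more = true
    while (more) {
      val b = buf(i) & 0xFF
      raw |= (b & 0x7FL) << shift
      shift += 7
      i += 1
      more = (b & 0x80) != 0
    }
    ((raw >>> 1) ^ -(raw & 1), i)
  }

  def encodeString(s: String): Array[Byte] = {
    val bytes = s.getBytes("UTF-8")
    zigZagVarint(bytes.length) ++ bytes
  }

  // Writer side: name is the union ["string","null"], so branch index 0
  // (the "string" branch) is written before the string itself.
  def writeRecord(name: String, age: Int): Array[Byte] =
    zigZagVarint(0) ++ encodeString(name) ++ zigZagVarint(age)

  // Reader with a non-nullable name: expects string then int, no branch index.
  def readAsNonNullable(buf: Array[Byte]): (String, Int) = {
    val (len, p1) = readVarint(buf, 0)
    val name = new String(buf, p1, len.toInt, "UTF-8")
    val (age, _) = readVarint(buf, p1 + len.toInt)
    (name, age.toInt)
  }

  // Reader with a nullable name: consumes the branch index first.
  def readAsNullable(buf: Array[Byte]): (String, Int) = {
    val (branch, p0) = readVarint(buf, 0)
    require(branch == 0, "this sketch only handles the string branch")
    val (len, p1) = readVarint(buf, p0)
    val name = new String(buf, p1, len.toInt, "UTF-8")
    val (age, _) = readVarint(buf, p1 + len.toInt)
    (name, age.toInt)
  }

  def main(args: Array[String]): Unit = {
    val bytes = writeRecord("Mary Jane", 25)
    println(readAsNonNullable(bytes)) // prints (,9)
    println(readAsNullable(bytes))    // prints (Mary Jane,25)
  }
}
```

This only explains the empty name and the 9; it does not explain why both rows come back identical once the schema is nullable.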
scala> val expectedSchema = StructType(Seq(StructField("name", StringType,true),StructField("age", IntegerType, false)))
expectedSchema: org.apache.spark.sql.types.StructType = StructType(StructField(name,StringType,true), StructField(age,IntegerType,false))

scala> val avroTypeStruct = SchemaConverters.toAvroType(expectedSchema).toString
avroTypeStruct: String = {"type":"record","name":"topLevelRecord","fields":[{"name":"name","type":["string","null"]},{"name":"age","type":"int"}]}

scala> dfKV.select(from_avro('value, avroTypeStruct)).show
+---------------------------------------------+
|from_avro(value, struct<name:string,age:int>)|
+---------------------------------------------+
|                              [Mary Jane, 25]|
|                              [Mary Jane, 25]|
+---------------------------------------------+

BR,
G

On Wed, Feb 27, 2019 at 7:43 AM Hien Luu <hien...@gmail.com> wrote:

> Hi,
>
> I ran into a pretty weird issue with to_avro and from_avro where it was not
> able to parse the data in a struct correctly. Please see the simple and
> self contained example below. I am using Spark 2.4. I am not sure if I
> missed something.
>
> This is how I start the spark-shell on my Mac:
>
> ./bin/spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.0
>
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.avro._
> import org.apache.spark.sql.functions._
>
> spark.version
>
> val df = Seq((1, "John Doe", 30), (2, "Mary Jane", 25)).toDF("id", "name", "age")
>
> val dfStruct = df.withColumn("value", struct("name","age"))
>
> dfStruct.show
> dfStruct.printSchema
>
> val dfKV = dfStruct.select(to_avro('id).as("key"), to_avro('value).as("value"))
>
> val expectedSchema = StructType(Seq(StructField("name", StringType, false),StructField("age", IntegerType, false)))
>
> val avroTypeStruct = SchemaConverters.toAvroType(expectedSchema).toString
>
> val avroTypeStr = s"""
> |{
> | "type": "int",
> | "name": "key"
> |}
> """.stripMargin
>
> dfKV.select(from_avro('key, avroTypeStr)).show
>
> // output
> +-------------------+
> |from_avro(key, int)|
> +-------------------+
> |                  1|
> |                  2|
> +-------------------+
>
> dfKV.select(from_avro('value, avroTypeStruct)).show
>
> // output
> +---------------------------------------------+
> |from_avro(value, struct<name:string,age:int>)|
> +---------------------------------------------+
> |                                        [, 9]|
> |                                        [, 9]|
> +---------------------------------------------+
>
> Please help and thanks in advance.
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>