Hi,

I ran into an odd issue with to_avro and from_avro: a struct column encoded
with to_avro is not decoded correctly by from_avro. Please see the simple,
self-contained example below. I am using Spark 2.4, and I am not sure whether
I missed something.

This is how I start the spark-shell on my Mac:

./bin/spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.0

import org.apache.spark.sql.types._
import org.apache.spark.sql.avro._
import org.apache.spark.sql.functions._


spark.version

val df = Seq((1, "John Doe", 30), (2, "Mary Jane", 25))
  .toDF("id", "name", "age")

val dfStruct = df.withColumn("value", struct("name","age"))

dfStruct.show
dfStruct.printSchema

val dfKV = dfStruct.select(
  to_avro('id).as("key"),
  to_avro('value).as("value"))

// Schema I expect for the struct column; both fields are declared non-nullable
val expectedSchema = StructType(Seq(
  StructField("name", StringType, false),
  StructField("age", IntegerType, false)))

val avroTypeStruct = SchemaConverters.toAvroType(expectedSchema).toString

val avroTypeStr = s"""
      |{
      |  "type": "int",
      |  "name": "key"
      |}
    """.stripMargin


dfKV.select(from_avro('key, avroTypeStr)).show

// output
+-------------------+
|from_avro(key, int)|
+-------------------+
|                  1|
|                  2|
+-------------------+

dfKV.select(from_avro('value, avroTypeStruct)).show

// output
+---------------------------------------------+
|from_avro(value, struct<name:string,age:int>)|
+---------------------------------------------+
|                                        [, 9]|
|                                        [, 9]|
+---------------------------------------------+
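For comparison, here is a minimal sketch that derives the reader schema from
the struct column's own type instead of the hand-built expectedSchema. It
rests on an unverified assumption: that to_avro encodes using a schema derived
from the column's Catalyst type, so any difference in field nullability
between that type and expectedSchema could make from_avro misread the bytes.
The names valueField and derivedAvroSchema are introduced here for
illustration only, and the snippet relies on dfStruct, dfKV and avroTypeStruct
from the same spark-shell session above.

```scala
import org.apache.spark.sql.avro._
import org.apache.spark.sql.functions._

// Assumption: to_avro encodes with a schema derived from the column's own
// Catalyst type, so the reader schema is derived the same way here.
val valueField = dfStruct.schema("value") // StructField of the struct column

// Convert the actual column type (including nullability flags) to Avro JSON.
val derivedAvroSchema =
  SchemaConverters.toAvroType(valueField.dataType, valueField.nullable).toString

// Print both schemas to see whether they differ, e.g. in field nullability.
println(derivedAvroSchema)
println(avroTypeStruct)

// Decode with the derived schema instead of the hand-built one.
dfKV.select(from_avro('value, derivedAvroSchema).as("value")).show(false)
```

If the two printed schemas differ, that difference would be the first thing to
rule out before suspecting from_avro itself.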

Please help and thanks in advance.



