Thanks for looking into this. Does this mean string fields should always be
nullable?
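
One way to sidestep the question might be to derive the reader schema from
the struct column itself instead of writing the StructType by hand, so the
nullability flags cannot drift from what to_avro saw. This is only a sketch,
assuming Spark 2.4 with spark-avro on the classpath; valueField and
readerSchema are names made up for illustration, and I have not confirmed
that it fixes the output.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.avro._
import org.apache.spark.sql.functions.{col, struct}

// Local session so the snippet also runs outside spark-shell.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("avro-nullability-sketch")
  .getOrCreate()
import spark.implicits._

val df = Seq((1, "John Doe", 30), (2, "Mary Jane", 25)).toDF("id", "name", "age")
val dfStruct = df.withColumn("value", struct("name", "age"))

// Hypothetical helper names: take the type and nullability Spark inferred
// for the column rather than restating them by hand.
val valueField = dfStruct.schema("value")
val readerSchema =
  SchemaConverters.toAvroType(valueField.dataType, valueField.nullable).toString

// Inspect which fields came out as a union with "null".
println(readerSchema)

val dfKV = dfStruct.select(to_avro(col("value")).as("value"))
dfKV.select(from_avro(col("value"), readerSchema).as("value")).show(false)
```

Printing readerSchema should at least show whether "name" was inferred as
nullable, which is what the hand-written schema has to agree with.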

You are right that the result is not yet correct and further digging is
needed :(
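
For what it's worth, the "[, 9]" would be consistent with a union prefix
being misread. The sketch below is plain Scala with no Spark or Avro
dependency; it hand-encodes a record the way the Avro binary encoding
describes it (zigzag varints, a branch index in front of a union value) and
then decodes it with the wrong assumption. The object and method names are
made up, and this is my reading of the symptom, not a confirmed diagnosis.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets.UTF_8

object AvroUnionPrefixSketch {
  // Avro encodes int and long as zigzag varints.
  def writeLong(out: ByteArrayOutputStream, value: Long): Unit = {
    var n = (value << 1) ^ (value >> 63)
    while ((n & ~0x7FL) != 0) {
      out.write(((n & 0x7F) | 0x80).toInt)
      n = n >>> 7
    }
    out.write(n.toInt)
  }

  // A string is its UTF-8 length (as a long) followed by the bytes.
  def writeString(out: ByteArrayOutputStream, s: String): Unit = {
    val bytes = s.getBytes(UTF_8)
    writeLong(out, bytes.length.toLong)
    out.write(bytes, 0, bytes.length)
  }

  // Writer side: name is ["string","null"], age is a plain int.
  def encode(name: String, age: Int): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    writeLong(out, 0L) // union branch index 0 selects "string"
    writeString(out, name)
    writeLong(out, age.toLong)
    out.toByteArray
  }

  final class Reader(bytes: Array[Byte]) {
    private var pos = 0

    def readLong(): Long = {
      var acc = 0L
      var shift = 0
      var more = true
      while (more) {
        val b = bytes(pos) & 0xFF
        pos += 1
        acc |= (b & 0x7FL) << shift
        shift += 7
        more = (b & 0x80) != 0
      }
      (acc >>> 1) ^ -(acc & 1) // undo zigzag
    }

    def readString(): String = {
      val len = readLong().toInt
      val s = new String(bytes, pos, len, UTF_8)
      pos += len
      s
    }
  }

  // Reader that expects the union prefix (name nullable).
  def decodeAsNullable(bytes: Array[Byte]): (String, Int) = {
    val r = new Reader(bytes)
    r.readLong() // branch index; this sketch only handles branch 0
    (r.readString(), r.readLong().toInt)
  }

  // Reader that expects a bare string (name not nullable).
  def decodeAsNonNullable(bytes: Array[Byte]): (String, Int) = {
    val r = new Reader(bytes)
    (r.readString(), r.readLong().toInt)
  }
}

val bytes = AvroUnionPrefixSketch.encode("Mary Jane", 25)
println(AvroUnionPrefixSketch.decodeAsNullable(bytes))    // (Mary Jane,25)
println(AvroUnionPrefixSketch.decodeAsNonNullable(bytes)) // (,9)
```

With the non-nullable reader the branch index 0 is taken as a string length
of 0, and the real length of "Mary Jane" (9) is then read as the age. That
would explain the empty name and the 9, though not why both rows come back
identical.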

On Wed, Feb 27, 2019 at 1:19 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
wrote:

> Hi,
>
> I was dealing with Avro stuff lately, and most of the time the problem has
> something to do with the schema.
> One thing I pinpointed quickly (and where I was also struggling) is that the
> name field should be nullable, but the result is still not correct, so
> further digging is needed...
>
> scala> val expectedSchema = StructType(Seq(StructField("name",
> StringType,true),StructField("age", IntegerType, false)))
> expectedSchema: org.apache.spark.sql.types.StructType =
> StructType(StructField(name,StringType,true),
> StructField(age,IntegerType,false))
>
> scala> val avroTypeStruct =
> SchemaConverters.toAvroType(expectedSchema).toString
> avroTypeStruct: String =
> {"type":"record","name":"topLevelRecord","fields":[{"name":"name","type":["string","null"]},{"name":"age","type":"int"}]}
>
> scala> dfKV.select(from_avro('value, avroTypeStruct)).show
> +---------------------------------------------+
> |from_avro(value, struct<name:string,age:int>)|
> +---------------------------------------------+
> |                              [Mary Jane, 25]|
> |                              [Mary Jane, 25]|
> +---------------------------------------------+
>
> BR,
> G
>
>
> On Wed, Feb 27, 2019 at 7:43 AM Hien Luu <hien...@gmail.com> wrote:
>
>> Hi,
>>
>> I ran into a pretty weird issue with to_avro and from_avro where it was
>> not able to parse the data in a struct correctly. Please see the simple
>> and self-contained example below. I am using Spark 2.4. I am not sure if
>> I missed something.
>>
>> This is how I start the spark-shell on my Mac:
>>
>> ./bin/spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.0
>>
>> import org.apache.spark.sql.types._
>> import org.apache.spark.sql.avro._
>> import org.apache.spark.sql.functions._
>>
>>
>> spark.version
>>
>> val df = Seq((1, "John Doe", 30), (2, "Mary Jane", 25)).toDF("id", "name", "age")
>>
>> val dfStruct = df.withColumn("value", struct("name","age"))
>>
>> dfStruct.show
>> dfStruct.printSchema
>>
>> val dfKV = dfStruct.select(to_avro('id).as("key"),
>> to_avro('value).as("value"))
>>
>> val expectedSchema = StructType(Seq(StructField("name", StringType,
>> false),StructField("age", IntegerType, false)))
>>
>> val avroTypeStruct = SchemaConverters.toAvroType(expectedSchema).toString
>>
>> val avroTypeStr = s"""
>>       |{
>>       |  "type": "int",
>>       |  "name": "key"
>>       |}
>>     """.stripMargin
>>
>>
>> dfKV.select(from_avro('key, avroTypeStr)).show
>>
>> // output
>> +-------------------+
>> |from_avro(key, int)|
>> +-------------------+
>> |                  1|
>> |                  2|
>> +-------------------+
>>
>> dfKV.select(from_avro('value, avroTypeStruct)).show
>>
>> // output
>> +---------------------------------------------+
>> |from_avro(value, struct<name:string,age:int>)|
>> +---------------------------------------------+
>> |                                        [, 9]|
>> |                                        [, 9]|
>> +---------------------------------------------+
>>
>> Please help and thanks in advance.

-- 
Regards,
