I think this will be fixed in 1.6.1. Can you test it when we post the
first RC? (Hopefully later today.)
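
In the meantime, a possible workaround (just a sketch, based on your
observation below that the DataFrame path works) is to round-trip
through a DataFrame instead of calling toDS directly:

import sqlContext.implicits._

case class SeqThing(id: String, stuff: Seq[Int])
val seqData = sc.parallelize(Seq(SeqThing("A", Seq(1, 2, 3))))

// seqData.toDS throws the serialization exception in 1.6.0, but
// converting to a DataFrame first and then casting to a Dataset works:
val workaround = seqData.toDF.as[SeqThing]
workaround.collect()

That avoids having to change the case class to use Array.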

On Mon, Feb 22, 2016 at 1:51 PM, Daniel Siegmann <
daniel.siegm...@teamaol.com> wrote:

> Experimenting with datasets in Spark 1.6.0, I ran into a serialization
> error when using case classes containing a Seq member. There is no
> problem when using Array instead, nor when using an RDD or a DataFrame
> (even when converting the DF to a DS later).
>
> Here's an example you can test in the Spark shell:
>
> import sqlContext.implicits._
>
> case class SeqThing(id: String, stuff: Seq[Int])
> val seqThings = Seq(SeqThing("A", Seq()))
> val seqData = sc.parallelize(seqThings)
>
> case class ArrayThing(id: String, stuff: Array[Int])
> val arrayThings = Seq(ArrayThing("A", Array()))
> val arrayData = sc.parallelize(arrayThings)
>
>
> // Array works fine
> arrayData.collect()
> arrayData.toDF.as[ArrayThing]
> arrayData.toDS
>
> // Seq can't convert directly to DS
> seqData.collect()
> seqData.toDF.as[SeqThing]
> seqData.toDS // Serialization exception
>
> Is this working as intended? Are there plans to support serializing
> arbitrary Seq values in datasets, or must everything be converted to
> Array?
>
> ~Daniel Siegmann
>
