I think this will be fixed in 1.6.1. Can you test when we post the first RC? (hopefully later today)
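In the meantime, mapping the Seq field to an Array before calling toDS should work around it. A rough, untested sketch, reusing the ArrayThing class and seqData RDD from your example below:

import sqlContext.implicits._

// Workaround sketch (unverified): convert the Seq field to an Array
// before creating the Dataset, since Array fields serialize fine.
val workaround = seqData
  .map(t => ArrayThing(t.id, t.stuff.toArray))
  .toDS

workaround.collect()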
On Mon, Feb 22, 2016 at 1:51 PM, Daniel Siegmann <daniel.siegm...@teamaol.com> wrote:

> Experimenting with datasets in Spark 1.6.0 I ran into a serialization
> error when using case classes containing a Seq member. There is no
> problem when using Array instead. Nor is there a problem using RDD or
> DataFrame (even if converting the DF to a DS later).
>
> Here's an example you can test in the Spark shell:
>
> import sqlContext.implicits._
>
> case class SeqThing(id: String, stuff: Seq[Int])
> val seqThings = Seq(SeqThing("A", Seq()))
> val seqData = sc.parallelize(seqThings)
>
> case class ArrayThing(id: String, stuff: Array[Int])
> val arrayThings = Seq(ArrayThing("A", Array()))
> val arrayData = sc.parallelize(arrayThings)
>
> // Array works fine
> arrayData.collect()
> arrayData.toDF.as[ArrayThing]
> arrayData.toDS
>
> // Seq can't convert directly to DS
> seqData.collect()
> seqData.toDF.as[SeqThing]
> seqData.toDS // Serialization exception
>
> Is this working as intended? Are there plans to support serializing
> arbitrary Seq values in datasets, or must everything be converted to
> Array?
>
> ~Daniel Siegmann