Hi Efe,

If my understanding is correct, writing and reading complex types is not supported because the CSV format can't represent nested types.
I'd guess that the external CSV library supporting them on write was rather a bug. It would be great if we could write CSV and read it back in its own format, but I guess we can't.

Thanks!

On 19 Aug 2016 6:33 a.m., "Efe Selcuk" <efema...@gmail.com> wrote:

> We have an application working in Spark 1.6. It uses the databricks csv
> library for the output format when writing out.
>
> I'm attempting an upgrade to Spark 2. When writing with both the native
> DataFrameWriter#csv() method and with first specifying the
> "com.databricks.spark.csv" format (I suspect the underlying format is the
> same, but I don't know how to verify), I get the following error:
>
> java.lang.UnsupportedOperationException: CSV data source does not support
> struct<[bunch of field names and types]> data type
>
> There are 20 fields, mostly plain strings with a couple of dates. The
> source object is a Dataset[T] where T is a case class with various fields.
> The line just looks like: someDataset.write.csv(outputPath)
>
> Googling returned this fairly recent pull request:
> https://mail-archives.apache.org/mod_mbox/spark-commits/201605.mbox/%3C65d35a72bd05483392857098a2635...@git.apache.org%3E
>
> If I'm reading that correctly, the schema shows that each record has one
> field of this complex struct type? And the validation thinks it's something
> that it can't serialize. I would expect the schema to have a bunch of
> fields in it matching the case class, so maybe there's something I'm
> misunderstanding.
>
> Efe
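
For what it's worth, here is a rough sketch of the usual workaround. The case class and column names are made up for illustration (they are not from Efe's schema): either flatten the nested struct into top-level columns before writing, or, on Spark 2.1+, serialize it to a single JSON string column with to_json.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_json

object CsvStructWorkaround {
  // Illustrative stand-ins for the real case classes in the thread.
  case class Meta(source: String, loadDate: String)
  case class Record(id: String, name: String, meta: Meta)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("csv-struct-workaround").getOrCreate()
    import spark.implicits._

    val ds = Seq(Record("1", "a", Meta("feed", "2016-08-19"))).toDS()

    // Option 1: pull the nested fields up to top-level columns, so every
    // column is an atomic type the CSV writer accepts.
    ds.select($"id", $"name",
        $"meta.source".as("meta_source"),
        $"meta.loadDate".as("meta_loadDate"))
      .write.csv("/tmp/flattened-output")

    // Option 2 (needs Spark 2.1 or later): keep a single column but
    // serialize the struct to a JSON string before writing.
    ds.withColumn("meta", to_json($"meta"))
      .write.csv("/tmp/json-column-output")

    spark.stop()
  }
}

Reading such output back won't reconstruct the original nested schema automatically, of course; that's the limitation of CSV being a flat format.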