Hi Efe,

If my understanding is correct, reading and writing complex types is not
supported because the CSV format can't represent nested types.

I'd guess that being able to write them at all with the external CSV
library was rather a bug.

I think it'd be great if we could write nested types to CSV and read them
back, but I guess we can't.
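
In the meantime, a possible workaround (just a rough sketch on my side; the
"details" struct column and its field names below are made-up placeholders,
assuming the nested fields themselves are simple types) is to flatten the
struct into top-level columns before writing:

    import org.apache.spark.sql.functions.col

    // Pull the nested fields of the hypothetical "details" struct column up
    // to top-level columns, then drop the struct itself so every remaining
    // column is a simple type that CSV can represent.
    val flattened = someDataset
      .withColumn("detailName", col("details.name"))
      .withColumn("detailDate", col("details.date"))
      .drop("details")

    flattened.write.csv(outputPath)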

Thanks!

On 19 Aug 2016 6:33 a.m., "Efe Selcuk" <efema...@gmail.com> wrote:

> We have an application working in Spark 1.6. It uses the databricks csv
> library for the output format when writing out.
>
> I'm attempting an upgrade to Spark 2. When writing with both the native
> DataFrameWriter#csv() method and with first specifying the
> "com.databricks.spark.csv" format (I suspect underlying format is the same
> but I don't know how to verify), I get the following error:
>
> java.lang.UnsupportedOperationException: CSV data source does not support
> struct<[bunch of field names and types]> data type
>
> There are 20 fields, mostly plain strings with a couple of dates. The
> source object is a Dataset[T] where T is a case class with various fields.
> The line just looks like: someDataset.write.csv(outputPath)
>
> Googling returned this fairly recent pull request:
> https://mail-archives.apache.org/mod_mbox/spark-commits/201605.mbox/%3C65d35a72bd05483392857098a2635...@git.apache.org%3E
>
> If I'm reading that correctly, the schema shows that each record has one
> field of this complex struct type? And the validation thinks it's something
> that it can't serialize. I would expect the schema to have a bunch of
> fields in it matching the case class, so maybe there's something I'm
> misunderstanding.
>
> Efe
>
