Hi, have you checked the related JIRA? e.g., https://issues.apache.org/jira/browse/SPARK-19950
If you have any questions or requests, it would be better to raise them there.
Thanks!
// maropu

On Tue, Mar 21, 2017 at 6:30 AM, Jason White <jason.wh...@shopify.com> wrote:
> If I create a dataframe in Spark with non-nullable columns, and then save
> that to disk as a Parquet file, the columns are properly marked as
> non-nullable. I confirmed this using parquet-tools. Then, when loading it
> back, Spark forces the nullable back to True.
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L378
>
> If I remove the `.asNullable` part, Spark performs exactly as I'd like by
> default, picking up the data using the schema either in the Parquet file or
> provided by me.
>
> This particular LoC goes back a year now, and I've seen a variety of
> discussions about this issue. In particular with Michael here:
> https://www.mail-archive.com/user@spark.apache.org/msg39230.html. Those
> seemed to be discussing writing, not reading, though, and writing is
> already supported now.
>
> Is this functionality still desirable? Is it potentially not applicable for
> all file formats and situations (e.g. HDFS/Parquet)? Would it be suitable
> to pass an option to the DataFrameReader to disable this functionality?

--
---
Takeshi Yamamuro
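For reference, a minimal sketch reproducing the behavior described in the quoted message, assuming a local Spark 2.x session and a hypothetical output path /tmp/nullable-read-demo. The schema is declared non-nullable and written to Parquet, but every column reports nullable = true after the file is read back:

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object NullableReadDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nullable-read-demo")
      .getOrCreate()

    // Schema with explicitly non-nullable columns.
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = false),
      StructField("name", StringType, nullable = false)))

    val rows = spark.sparkContext.parallelize(Seq(Row(1, "a"), Row(2, "b")))
    val df = spark.createDataFrame(rows, schema)
    df.printSchema()  // both fields: nullable = false

    // The written file marks the columns as required (verifiable with parquet-tools).
    // /tmp/nullable-read-demo is a placeholder path, not anything from the thread.
    df.write.mode("overwrite").parquet("/tmp/nullable-read-demo")

    // On read, Spark applies .asNullable to the inferred schema,
    // so both fields now report nullable = true.
    spark.read.parquet("/tmp/nullable-read-demo").printSchema()

    spark.stop()
  }
}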