Hi, have you checked the related JIRA? e.g., https://issues.apache.org/jira/browse/SPARK-19950
If you have any questions or requests, it would be better to raise them there.
Thanks!
// maropu

On Tue, Mar 21, 2017 at 6:30 AM, Jason White <jason.wh...@shopify.com> wrote:
> If I create a dataframe in Spark with non-nullable columns, and then save
> that to disk as a Parquet file, the columns are properly marked as
> non-nullable. I confirmed this using parquet-tools. Then, when loading it
> back, Spark forces the nullable back to True.
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L378
>
> If I remove the `.asNullable` part, Spark performs exactly as I'd like by
> default, picking up the data using the schema either in the Parquet file or
> provided by me.
>
> This particular LoC goes back a year now, and I've seen a variety of
> discussions about this issue. In particular with Michael here:
> https://www.mail-archive.com/user@spark.apache.org/msg39230.html. Those
> seemed to be discussing writing, not reading, though, and writing is
> already supported now.
>
> Is this functionality still desirable? Is it potentially not applicable for
> all file formats and situations (e.g. HDFS/Parquet)? Would it be suitable
> to pass an option to the DataFrameReader to disable this functionality?

--
---
Takeshi Yamamuro
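For reference, a minimal sketch reproducing the behavior described in the quoted message, assuming a local Spark 2.x session and a hypothetical output path /tmp/nullable-read-demo. The schema is declared non-nullable and written to Parquet, but every column reports nullable = true after the file is read back:

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object NullableReadDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nullable-read-demo")
      .getOrCreate()

    // Schema with explicitly non-nullable columns.
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = false),
      StructField("name", StringType, nullable = false)))

    val rows = spark.sparkContext.parallelize(Seq(Row(1, "a"), Row(2, "b")))
    val df = spark.createDataFrame(rows, schema)
    df.printSchema()  // both fields: nullable = false

    // The written file marks the columns as required (verifiable with parquet-tools).
    // /tmp/nullable-read-demo is a placeholder path, not anything from the thread.
    df.write.mode("overwrite").parquet("/tmp/nullable-read-demo")

    // On read, Spark applies .asNullable to the inferred schema,
    // so both fields now report nullable = true.
    spark.read.parquet("/tmp/nullable-read-demo").printSchema()

    spark.stop()
  }
}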