Writing empty Dataframes doesn't save any _metadata files in Spark 1.5.1 and 1.6

antoniosi Tue, 14 Jun 2016 16:47:13 -0700

I tried the following code in both Spark 1.5.1 and Spark 1.6.0:

import org.apache.spark.sql.types.{
    StructType, StructField, StringType, IntegerType}
import org.apache.spark.sql.Row


val schema = StructType(
    StructField("k", StringType, true) ::
    StructField("v", IntegerType, false) :: Nil)

// Build an empty DataFrame that carries only the schema, then write it out.
val df = sqlContext.createDataFrame(sc.emptyRDD[Row], schema)
df.write.save("hdfs://xxx")

Both 1.5.1 and 1.6.0 write only the _SUCCESS file; neither writes any
_metadata files. In addition, 1.6.0 logs the following warning and exception:

16/06/14 16:29:27 WARN ParquetOutputCommitter: could not write summary file for hdfs://xxx
java.lang.NullPointerException
        at org.apache.parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:456)
        at org.apache.parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:420)
        at org.apache.parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:58)
        at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:48)
        at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:230)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:151)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)

I do not get this exception in 1.5.1, though.
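
For anyone who wants to confirm what actually ends up in the output
directory, listing it through the Hadoop FileSystem API should work. This is
only a minimal sketch, assuming it runs in the same spark-shell session (so
`sc` is available) and that the path matches the one passed to `save`. The
helper name `missingSummaryFiles` is made up for illustration; `_metadata`
and `_common_metadata` are the Parquet summary file names.

```scala
import org.apache.hadoop.fs.Path

// Hypothetical helper: given the file names found in the output directory,
// return the Parquet summary files that are absent.
def missingSummaryFiles(names: Seq[String]): Seq[String] =
  Seq("_metadata", "_common_metadata").filterNot(names.contains)

// Assumption: same placeholder path as in the write above.
val outputPath = new Path("hdfs://xxx")
val fs = outputPath.getFileSystem(sc.hadoopConfiguration)

// List the entries directly under the output directory.
val names = fs.listStatus(outputPath).map(_.getPath.getName).toSeq
names.foreach(println)
println("missing summary files: " + missingSummaryFiles(names).mkString(", "))
```

If the directory holds only _SUCCESS, both summary file names are reported
as missing.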

I found https://issues.apache.org/jira/browse/SPARK-15393, but that issue is
filed against Spark 2.0. Is there a corresponding bug for Spark 1.5.1 and 1.6?

Is there a way to save an empty DataFrame properly, so that the _metadata
files are written?

Thanks.

Antonio.


