Hi team,

I am reading N CSV files and writing them out partitioned by date. The date is
one of the columns and holds integer values (e.g. 20170101).
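
The schema here is a StructType I define separately; a rough sketch (the field
names below are placeholders, not my real columns, but the date column is an
IntegerType):

    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    // Rough shape of the schema -- actual field names differ,
    // but "date" holds integer values like 20170101
    val schema = StructType(Seq(
      StructField("col1", StringType, nullable = true),
      StructField("col2", StringType, nullable = true),
      StructField("date", IntegerType, nullable = true)
    ))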


    val df = spark.read
      .format("com.databricks.spark.csv")
      .schema(schema)
      .option("delimiter", "#")
      .option("nullValue", "")
      .option("treatEmptyValuesAsNulls", "true")
      .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
      .load(filename)
            
    df.write
      .partitionBy("date")
      .format("com.databricks.spark.csv")
      .option("delimiter", "#")
      .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
      .save("s3n://buccketName/cip/daily_date")

The above code throws the error below in the middle of execution. The
s3n://buccketName/cip/daily_date location was empty when the job was initialized.

Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: File already exists:
s3n://<bucketname>/cip/daily_date/date=20110418/part-r-00082-912033b1-a278-46a8-bf8d-0f97f493e3d8.csv.gz
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create(NativeS3FileSystem.java:405)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:913)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:894)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:791)
        at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:135)
        at org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter.<init>(CSVRelation.scala:191)
        at org.apache.spark.sql.execution.datasources.csv.CSVOutputWriterFactory.newInstance(CSVRelation.scala:169)
        at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
        ... 14 more

Could you please suggest why this error occurs and how to fix it?

Thanks and Regards,

Rajendra Bhat
