Hi team,

I am reading a number of CSV files and writing them out partitioned by date. The date is one of the columns and holds an integer value (e.g. 20170101).
val df = spark.read
  .format("com.databricks.spark.csv")
  .schema(schema)
  .option("delimiter", "#")
  .option("nullValue", "")
  .option("treatEmptyValuesAsNulls", "true")
  .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
  .load(filename)

df.write
  .partitionBy("date")
  .format("com.databricks.spark.csv")
  .option("delimiter", "#")
  .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
  .save("s3n://bucketName/cip/daily_date")

The code above throws the error below in the middle of execution. The s3n://bucketName/cip/daily_date location was empty when the job was initialized.

Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: File already exists: s3n://<bucketname>/cip/daily_date/date=20110418/part-r-00082-912033b1-a278-46a8-bf8d-0f97f493e3d8.csv.gz
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create(NativeS3FileSystem.java:405)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:913)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:894)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:791)
        at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:135)
        at org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter.<init>(CSVRelation.scala:191)
        at org.apache.spark.sql.execution.datasources.csv.CSVOutputWriterFactory.newInstance(CSVRelation.scala:169)
        at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
        ... 14 more

Please suggest why this error occurs and how to resolve it.

Thanks and Regards,
Rajendra Bhat