Initially there is no directory; it is created by the Spark job and should be empty during execution. It seems df.write itself creates the first file and then tries to overwrite it.
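For reference, a minimal, hedged sketch of the save-mode suggestion from the thread below: setting SaveMode.Overwrite so an existing target path does not abort the write. The local path, tiny sample DataFrame, and use of the built-in csv format (instead of the com.databricks.spark.csv package from the thread) are illustrative assumptions, not the original job.

```scala
// Sketch only: assumes Spark 2.x on a local master; the thread's real job
// wrote gzip-compressed CSV to s3n://buccketName/cip/daily_date.
import org.apache.spark.sql.{SaveMode, SparkSession}

object OverwriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("overwrite-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the N CSV files read in the thread;
    // "date" is an integer column, e.g. 20170101, as in the original.
    val df = Seq((20170101, "a"), (20170102, "b")).toDF("date", "value")

    df.write
      .mode(SaveMode.Overwrite)   // default is ErrorIfExists, which fails on existing data
      .partitionBy("date")
      .format("csv")              // thread used "com.databricks.spark.csv"
      .option("delimiter", "#")
      .save("/tmp/daily_date")    // thread wrote to s3n://buccketName/cip/daily_date

    spark.stop()
  }
}
```

Note that Overwrite replaces the whole target directory; it does not by itself explain why an initially empty s3n:// path hit FileAlreadyExistsException mid-job.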
On Fri, Jan 13, 2017 at 11:42 AM, Amrit Jangid <amrit.jan...@goibibo.com> wrote:
> Hi Rajendra,
>
> It says your directory is not empty: s3n://buccketName/cip/daily_date.
>
> Try to use a save mode, e.g.
>
> df.write.mode(SaveMode.Overwrite).partitionBy("date")
>   .format("com.databricks.spark.csv")
>   .option("delimiter", "#")
>   .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
>   .save("s3n://buccketName/cip/daily_date")
>
> Hope it helps.
>
> Regards,
> Amrit
>
> On Fri, Jan 13, 2017 at 11:32 AM, Rajendra Bhat <rajhalk...@gmail.com> wrote:
>> Hi team,
>>
>> I am reading N CSV files and writing them out partitioned by date. date
>> is one column; it has an integer value (e.g. 20170101).
>>
>> val df = spark.read
>>   .format("com.databricks.spark.csv")
>>   .schema(schema)
>>   .option("delimiter", "#")
>>   .option("nullValue", "")
>>   .option("treatEmptyValuesAsNulls", "true")
>>   .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
>>   .load(filename)
>>
>> df.write.partitionBy("date")
>>   .format("com.databricks.spark.csv")
>>   .option("delimiter", "#")
>>   .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
>>   .save("s3n://buccketName/cip/daily_date")
>>
>> The above code throws the below error in the middle of execution.
>> s3n://buccketName/cip/daily_date was an empty location when the job initialized.
>>
>> Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: File already exists:
>> s3n://<bucketname>/cip/daily_date/date=20110418/part-r-00082-912033b1-a278-46a8-bf8d-0f97f493e3d8.csv.gz
>>   at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create(NativeS3FileSystem.java:405)
>>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:913)
>>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:894)
>>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:791)
>>   at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:135)
>>   at org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter.<init>(CSVRelation.scala:191)
>>   at org.apache.spark.sql.execution.datasources.csv.CSVOutputWriterFactory.newInstance(CSVRelation.scala:169)
>>   at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
>>   ... 14 more
>>
>> Please suggest why this error occurs, and a possible solution.
>>
>> Thanks and Regards,
>>
>> Rajendra Bhat
>
> --
> Regards,
> Amrit
> Data Team

--
Thanks and Regards

Rajendra Bhat