Initially there is no directory; it is created by the Spark job and should be empty during execution. It seems df.write itself creates the first file and then tries to overwrite it.
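For reference, a minimal, hedged sketch of the save-mode suggestion from the thread below: setting SaveMode.Overwrite so an existing target path does not abort the write. The local path, tiny sample DataFrame, and use of the built-in csv format (instead of the com.databricks.spark.csv package from the thread) are illustrative assumptions, not the original job.

```scala
// Sketch only: assumes Spark 2.x on a local master; the thread's real job
// wrote gzip-compressed CSV to s3n://buccketName/cip/daily_date.
import org.apache.spark.sql.{SaveMode, SparkSession}

object OverwriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("overwrite-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the N CSV files read in the thread;
    // "date" is an integer column, e.g. 20170101, as in the original.
    val df = Seq((20170101, "a"), (20170102, "b")).toDF("date", "value")

    df.write
      .mode(SaveMode.Overwrite)   // default is ErrorIfExists, which fails on existing data
      .partitionBy("date")
      .format("csv")              // thread used "com.databricks.spark.csv"
      .option("delimiter", "#")
      .save("/tmp/daily_date")    // thread wrote to s3n://buccketName/cip/daily_date

    spark.stop()
  }
}
```

Note that Overwrite replaces the whole target directory; it does not by itself explain why an initially empty s3n:// path hit FileAlreadyExistsException mid-job.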
On Fri, Jan 13, 2017 at 11:42 AM, Amrit Jangid <amrit.jan...@goibibo.com> wrote:
> Hi Rajendra,
>
> It says your directory is not empty: s3n://buccketName/cip/daily_date.
>
> Try to use a save mode, e.g.
>
> df.write.mode(SaveMode.Overwrite).partitionBy("date")
>   .format("com.databricks.spark.csv")
>   .option("delimiter", "#")
>   .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
>   .save("s3n://buccketName/cip/daily_date")
>
> Hope it helps.
>
> Regards,
> Amrit
>
> On Fri, Jan 13, 2017 at 11:32 AM, Rajendra Bhat <rajhalk...@gmail.com> wrote:
>> Hi team,
>>
>> I am reading N CSV files and writing them out partitioned by date. date
>> is one column; it has an integer value (e.g. 20170101).
>>
>> val df = spark.read
>>   .format("com.databricks.spark.csv")
>>   .schema(schema)
>>   .option("delimiter", "#")
>>   .option("nullValue", "")
>>   .option("treatEmptyValuesAsNulls", "true")
>>   .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
>>   .load(filename)
>>
>> df.write.partitionBy("date")
>>   .format("com.databricks.spark.csv")
>>   .option("delimiter", "#")
>>   .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
>>   .save("s3n://buccketName/cip/daily_date")
>>
>> The above code throws the below error in the middle of execution.
>> s3n://buccketName/cip/daily_date was an empty location when the job initialized.
>>
>> Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: File already exists:
>> s3n://<bucketname>/cip/daily_date/date=20110418/part-r-00082-912033b1-a278-46a8-bf8d-0f97f493e3d8.csv.gz
>>   at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create(NativeS3FileSystem.java:405)
>>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:913)
>>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:894)
>>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:791)
>>   at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:135)
>>   at org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter.<init>(CSVRelation.scala:191)
>>   at org.apache.spark.sql.execution.datasources.csv.CSVOutputWriterFactory.newInstance(CSVRelation.scala:169)
>>   at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
>>   ... 14 more
>>
>> Please suggest why this error occurs, and a possible solution.
>>
>> Thanks and Regards,
>>
>> Rajendra Bhat
>
> --
> Regards,
> Amrit
> Data Team

--
Thanks and Regards

Rajendra Bhat