Thanks for the reply, Femi!
I’m writing the file like this -->
myDataFrame.write.mode("overwrite").csv("myFilePath")
There are absolutely no errors/warnings after the write.
The _SUCCESS file is created on the master node, but the _temporary problem is
seen only on the worker nodes.
I know spark.write.csv works best with HDFS, but with the current setup of my
environment I have to make Spark write to each node’s local file system and
not to HDFS.
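For what it’s worth, since the output has to land on a local disk anyway, I
have also been considering a fallback of collecting the result to the driver
and writing a plain CSV there myself. Just a sketch, assuming the data fits in
driver memory; /tmp/myFile.csv is a placeholder path and the CSV rendering is
naive, with no quoting/escaping:

  import java.io.PrintWriter
  // Sketch only: collect() pulls every row to the driver, so this is
  // viable only for modest result sizes.
  val rows = myDataFrame.collect()
  val pw = new PrintWriter("/tmp/myFile.csv")  // placeholder local path
  try {
    pw.println(myDataFrame.columns.mkString(","))       // header row
    rows.foreach(row => pw.println(row.mkString(",")))  // naive CSV line
  } finally {
    pw.close()
  }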
Regards,
Hemanth
From: Femi Anthony <[email protected]>
Date: Thursday, 10 August 2017 at 10.38
To: Hemanth Gudela <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: spark.write.csv is not able write files to specified path, but is
writing to unintended subfolder _temporary/0/task_xxx folder on worker nodes
Normally the _temporary directory gets deleted as part of the cleanup when the
write is complete, and a _SUCCESS file is created. I suspect that the writes
are not completing properly. How are you specifying the write? Any error
messages in the logs?
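One quick sanity check (just a sketch, assuming the same path you pass to
csv()): after a clean commit the output directory should hold the part files
plus the _SUCCESS marker, with no _temporary left behind, e.g.

  import java.nio.file.{Files, Paths}
  // Expect true once the job has committed successfully on this node.
  Files.exists(Paths.get("myFilePath/_SUCCESS"))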
On Thu, Aug 10, 2017 at 3:17 AM, Hemanth Gudela
<[email protected]<mailto:[email protected]>> wrote:
Hi,
I’m running Spark in cluster mode with 4 nodes, and I’m trying to write CSV
files to each node’s local path (not HDFS).
I’m using spark.write.csv to write the CSV files.
On master node:
spark.write.csv creates a folder named after the CSV file and writes many
part-r-000n files into it. This is okay for me; I can merge them later.
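As an aside, I could probably skip the merge step by coalescing to a single
partition before writing, at the cost of funnelling the write through one
task; just a sketch:

  // One partition => one part file, so no post-write merge is needed.
  myDataFrame.coalesce(1).write.mode("overwrite").csv("myFilePath")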
But on worker nodes:
spark.write.csv creates a folder named after the CSV file, but leaves many
folders and files under _temporary/0/. This is not okay for me.
Could someone please suggest what might be going wrong in my settings, and how
I can get the CSV files written to the specified folder rather than to the
subfolders (_temporary/0/task_xxx) on the worker machines?
Thank you,
Hemanth
--
http://www.femibyte.com/twiki5/bin/view/Tech/
http://www.nextmatrix.com
"Great spirits have always encountered violent opposition from mediocre minds."
- Albert Einstein.