Re: Writing Dataframe to CSV yields blank file called "_SUCCESS"

2016-09-26 Thread Peter Figliozzi
Thanks again, Piotr. It's good to know there are a number of options. Once again I'm glad I put all my workers on the same Ethernet switch, as unanticipated shuffling isn't so bad. Sincerely, Pete On Mon, Sep 26, 2016 at 8:35 AM, Piotr Smoliński < piotr.smolinski...@gmail.com> wrote: > Best, you

Re: Writing Dataframe to CSV yields blank file called "_SUCCESS"

2016-09-26 Thread Piotr Smoliński
Ideally, you should write to HDFS or, when you test the product with no HDFS available, just create a shared filesystem (Windows shares, NFS, etc.) where the data will be written. You'll still end up with many files, but this time there will be only one directory tree. You may reduce the number of fil

Re: Writing Dataframe to CSV yields blank file called "_SUCCESS"

2016-09-26 Thread Peter Figliozzi
Thank you Piotr, that's what happened. In fact, there are about 100 files on each worker node in a directory corresponding to the write. Is there any way to tone that down a bit (maybe one file per worker), or to write a single file somewhere? On Mon, Sep 26, 2016 at 12:44 AM, Piotr Smoliński < piotr.smoli
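
One way to end up with a single file after the fact is to concatenate the per-partition files once they sit in a single directory (for example on a shared filesystem). The following is only a rough sketch in plain Python, not something proposed in the thread: the function name `merge_part_files` is hypothetical, and it assumes the part files follow the usual `part-*` naming and carry no header row (otherwise the header would repeat in the merged file).

```python
import shutil
from pathlib import Path


def merge_part_files(output_dir, merged_path):
    """Concatenate part-* files from output_dir into one file.

    Hypothetical helper: assumes header-less part files that all live
    in one directory. Returns the number of files merged.
    """
    # Sort by name so partitions are appended in a stable order.
    parts = sorted(p for p in Path(output_dir).glob("part-*") if p.is_file())
    with open(merged_path, "wb") as merged:
        for part in parts:
            with open(part, "rb") as src:
                # Stream the bytes across without loading the whole file.
                shutil.copyfileobj(src, merged)
    return len(parts)
```

Writing the merged file outside the output directory keeps it from being picked up as another part file on a later run.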

Re: Writing Dataframe to CSV yields blank file called "_SUCCESS"

2016-09-25 Thread Piotr Smoliński
Hi Peter, The blank _SUCCESS file indicates that the output operation finished properly. What is the topology of your application? I presume you write to the local filesystem and have more than one worker machine. In that case, Spark will write the result files for each partition (in the worker which holds it
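
To see what a given node actually holds, it can help to inspect the output directory directly. The sketch below is plain Python with a hypothetical helper name, `summarize_output_dir`; it assumes the directory contains an empty `_SUCCESS` marker and data files named `part-*`, and it only reports what is on the machine where it runs.

```python
from pathlib import Path


def summarize_output_dir(output_dir):
    """Return (finished, part_files) for a Spark-style output directory.

    Hypothetical helper: assumes the _SUCCESS marker and part-* naming.
    """
    out = Path(output_dir)
    # The empty _SUCCESS marker signals that the write completed.
    finished = (out / "_SUCCESS").is_file()
    # The data itself lives in the part-* files, one per written partition.
    part_files = sorted(p.name for p in out.glob("part-*") if p.is_file())
    return finished, part_files
```

Running it on each worker would show whether the marker and the part files ended up on different machines.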