Thanks again, Piotr. It's good to know there are a number of options. Once
again I'm glad I put all my workers on the same Ethernet switch, as the
unanticipated shuffling isn't so bad.
Sincerely,
Pete
On Mon, Sep 26, 2016 at 8:35 AM, Piotr Smoliński <
piotr.smolinski...@gmail.com> wrote:
Best, you should write to HDFS, or, when you test the product with no HDFS
available, just create a shared filesystem (Windows shares, NFS, etc.) where
the data will be written. You'll still end up with many files, but this time
there will be only one directory tree.
You may reduce the number of files by coalescing the data into fewer
partitions before the write.
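
For illustration, a rough sketch of what that could look like (Spark 2.x
Scala API; the shared mount point, app name, and toy data are made up for
the example):

import org.apache.spark.sql.SparkSession

object SharedFsWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("shared-fs-write").getOrCreate()
    import spark.implicits._

    // Toy data standing in for the real result set.
    val result = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")

    // Hypothetical shared mount; every worker must see the same path.
    // An HDFS URI such as hdfs://namenode:8020/output works the same way.
    result
      .coalesce(1)               // one partition => a single part-* file
      .write
      .mode("overwrite")
      .csv("/mnt/shared/output")

    spark.stop()
  }
}

Note that coalesce(1) funnels all the data through a single task, so it
only makes sense when the result is small.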
Thank you, Piotr, that's what happened. In fact, there are about 100 files
on each worker node in a directory corresponding to the write.
Any way to tone that down a bit (maybe 1 file per worker)? Or write a
single file somewhere?
On Mon, Sep 26, 2016 at 12:44 AM, Piotr Smoliński <
piotr.smolinski...@gmail.com> wrote:
Hi Peter,
The blank file _SUCCESS indicates a properly finished output operation.
What is the topology of your application?
I presume you write to the local filesystem and have more than one worker
machine. In such a case Spark will write the result files for each partition
(in the worker which holds it).
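
For illustration, a rough sketch of that behavior (Spark 2.x Scala API;
the output path and partition count are made up): writing a DataFrame with
100 partitions produces 100 part-* files, plus the empty _SUCCESS marker
once the job commits cleanly.

import org.apache.spark.sql.SparkSession

object PartitionFileCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("partition-file-count").getOrCreate()

    // 100 partitions => 100 part-* files in the output directory.
    val df = spark.range(0, 1000).repartition(100)

    // With a local (non-shared) path on a multi-worker cluster, each
    // worker writes only the partitions it holds, so the part files
    // end up scattered across the workers' local disks.
    df.write.mode("overwrite").parquet("/tmp/output")

    println(s"partitions written: ${df.rdd.getNumPartitions}")  // 100

    spark.stop()
  }
}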