> On 9 Feb 2016, at 07:19, lmk <lakshmi.muralikrish...@gmail.com> wrote:
> 
> Hi Dhimant,
> As I indicated in my follow-up mail, my problem was due to the disk getting
> full with log messages (these were dumped onto the slaves) and had nothing to
> do with the content pushed into s3. So it looks like this error message is
> very generic and is thrown for various reasons. You will probably have to do
> some more research to find the cause of your problem.
> Please keep me posted once you fix this issue. Sorry I could not be of more
> help to you.
> 
> Regards
> 

that's fun.

s3n and s3a buffer their output to local disk until close() is called, then they do the full upload to S3.

this breaks every assumption people have about file IO:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/introduction.html

In particular, the bits of that spec about close() being fast and harmless no 
longer hold: here it's O(data), and bad news if it fails.
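
To make that concrete, here's a rough spark-shell sketch (Scala; the bucket and 
path are made up) going straight at the Hadoop FileSystem API: the write() 
calls return quickly because they only touch the local buffer, while close() 
is where the whole upload, and any failure, actually lands.

  import java.net.URI
  import org.apache.hadoop.fs.{FileSystem, Path}

  // reuse the job's Hadoop configuration from the spark-shell SparkContext
  val fs = FileSystem.get(new URI("s3a://my-bucket/"), sc.hadoopConfiguration)
  val out = fs.create(new Path("s3a://my-bucket/tmp/demo.bin"))

  val block = new Array[Byte](1 << 20)        // 1 MB of zeros
  val t0 = System.nanoTime()
  (1 to 256).foreach(_ => out.write(block))   // ~256 MB: local buffer only, fast
  val t1 = System.nanoTime()
  out.close()                                 // the actual PUT happens here: O(data)
  val t2 = System.nanoTime()
  println(f"write: ${(t1 - t0) / 1e9}%.1f s  close: ${(t2 - t1) / 1e9}%.1f s")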

If your close() was failing due to lack of HDD space, it means that your tmp 
dir and log dir were on the same disk/volume, and that volume ran out of capacity.
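
The buffer location is configurable, so the quick fix is to point it at a 
volume with headroom. A sketch, again for spark-shell; the path is a 
placeholder, and the property is the standard s3a one (s3n has a similar 
fs.s3.buffer.dir setting, if memory serves):

  // keep the upload staging area off the (small) log volume; path is made up
  sc.hadoopConfiguration.set("fs.s3a.buffer.dir", "/mnt/data/s3a-staging")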

HADOOP-11183 added an output stream variant which buffers in memory, primarily 
for faster output to rack-local storage supporting the S3 protocol. This is in 
ASF Hadoop 2.7 and in recent HDP and CDH releases.
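
If you are on 2.7+, turning it on is roughly this in spark-shell (property 
names as introduced by HADOOP-11183; the part size is only illustrative):

  // buffer output blocks in memory and upload them incrementally as multipart
  // parts, instead of staging the whole object on local disk until close()
  sc.hadoopConfiguration.set("fs.s3a.fast.upload", "true")
  // part size in bytes for the incremental multipart upload
  sc.hadoopConfiguration.set("fs.s3a.multipart.size", "67108864")   // 64 MB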

I don't know if it's in Amazon EMR, because they have their own closed-source 
EMR client (believed to be a modified ASF one with some special hooks into 
unstable S3 APIs).

Anyway: I would run, not walk, to using s3a on Hadoop 2.7+, as it's already 
better than s3n and getting better with every release.
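
The move is mostly a URL-scheme change plus credentials, something along these 
lines in spark-shell (bucket and paths are made up; credentials can equally 
live in core-site.xml, and you need hadoop-aws and the matching AWS SDK jar on 
the classpath):

  sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
  sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

  val logs = sc.textFile("s3a://my-bucket/input/*.log")    // was s3n://my-bucket/...
  logs.saveAsTextFile("s3a://my-bucket/output/run-1")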
