> On 9 Feb 2016, at 07:19, lmk <lakshmi.muralikrish...@gmail.com> wrote:
>
> Hi Dhimant,
> As I had indicated in my next mail, my problem was due to the disk getting
> full with log messages (these were dumped onto the slaves) and did not have
> anything to do with the content pushed into s3. So it looks like this error
> message is very generic and is thrown for various reasons. You may have to
> do some more research to find out the cause of your problem.
> Please keep me posted once you fix this issue. Sorry, I could not be of
> much help to you.
>
> Regards
That's fun. s3n/s3a buffer their output until close() is called, then they do a full upload. This breaks every assumption people have about file IO:

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/introduction.html

especially the bits in that document about close() being fast and harmless; now it's O(data), and bad news if it fails. If your close() was failing due to lack of HDD space, it means that your tmp dir and log dir were on the same disk/volume, and that ran out of capacity.

HADOOP-11183 added an output variant which buffers in memory, primarily for faster output to rack-local storage supporting the s3 protocol. This is in ASF Hadoop 2.7 and in recent HDP and CDH releases. I don't know if it's in Amazon EMR, because they have their own closed-source EMR client (believed to be a modified ASF one with some special hooks to unstable s3 APIs).

Anyway: I would run, not walk, to using s3a on Hadoop 2.7+, as it's already better than s3n and getting better with every release.
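In Spark the switch is just a couple of Hadoop configuration properties. A rough sketch, spark-shell style, assuming a Hadoop 2.7-era s3a client; the bucket name and paths below are invented, and your s3a credentials still have to be configured separately:

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf().setAppName("s3a-fast-upload-example")
  val sc = new SparkContext(conf)

  val hadoopConf = sc.hadoopConfiguration
  // HADOOP-11183: buffer blocks in memory and upload incrementally,
  // instead of staging the whole file on local disk until close()
  hadoopConf.set("fs.s3a.fast.upload", "true")
  // where the default (disk-buffered) stream stages data; point it at a
  // volume with enough free space so close() doesn't die with "no space left"
  hadoopConf.set("fs.s3a.buffer.dir", "/mnt/big-volume/s3a")
  // fs.s3a.access.key / fs.s3a.secret.key (or IAM roles) still need to be set

  sc.textFile("hdfs:///data/input")
    .saveAsTextFile("s3a://some-bucket/output")   // bucket/path are placeholders

The point being: with the default stream the whole object is staged under fs.s3a.buffer.dir before close() uploads it, so that directory needs at least as much free space as your biggest output file; fs.s3a.fast.upload trades that disk space for heap, so keep an eye on executor memory for very large files.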