On 8 Jun 2015, at 15:55, Richard Marscher <rmarsc...@localytics.com> wrote:
> Hi, we've been seeing occasional issues in production with the FileOutputCommitter reaching a deadlock situation. We are writing our data to S3 and currently have speculation enabled. What we see is that Spark gets a file-not-found error trying to access a temporary part file that it wrote (it seems to be the part-#2 file every time?), so the task fails. But the file actually exists in S3, so subsequent speculations and task retries all fail because the committer tells them the file exists. This persists until human intervention kills the application. Usually rerunning the application succeeds on the next try, so it is not deterministic with the dataset or anything. It seems like there isn't a good story yet for file writing and speculation (https://issues.apache.org/jira/browse/SPARK-4879), although our error here seems worse than the reports in that issue, since I believe ours deadlocks and those don't?

S3 isn't a real filesystem, which is why the Hadoop docs say "don't turn speculation on if your destination is s3n/s3a or swift":
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/introduction.html

1. Renames of directories aren't atomic, but rename is exactly what speculation uses to ensure an atomic commit of one task's output.
2. Eventual consistency means that things aren't immediately visible.
3. When you create an output file for writing, it's not there until the final close(), so other code that looks for the file's existence in order to implement any locking algorithm is in trouble.

These are fundamental issues which hurt all the bits of Hadoop code that expect basic POSIX-ish filesystem behaviours: files immediately visible on create, atomic and O(1) mv and rmdir, efficient seek(), and consistency of create, update and delete across the entire cluster.

What to do?
-keep speculation on for output to HDFS; turn it off for S3
-lift the Amazon hadoop-aws JAR from EMR (Hadoop 2.6+; for earlier versions you need the whole of hadoop-common)
-look at Netflix's s3mper: http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html
-don't use US East, as it has the worst consistency (it doesn't even support create consistency). That may help, but it still won't address non-atomic directory rename.

I don't know about SPARK-4879, so can't comment on what's happening there, but if you are using S3 as your destination it's irrelevant: you will have problems if speculation is turned on. That's the same as if you use classic Hadoop MR, Tez or others. It's also why S3 isn't supported as a filesystem for things like HBase.

Hopefully the work on HADOOP-9565 will eventually add some operations to enable speculative commits for code written explicitly against it.

-Steve
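Point 3 above can be seen by contrast with a local POSIX filesystem, where a file becomes visible to other processes the moment it is created, before any data is written or the handle is closed. A minimal Python sketch of that POSIX behaviour (local disk, not S3; the part-file name is just illustrative):

```python
import os
import tempfile

# On a POSIX filesystem, a file is visible as soon as it is created,
# even before any bytes are written or the handle is closed. Hadoop's
# commit and locking code assumes this; on S3 the object only appears
# after the final close() completes the upload.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "part-00000")
    f = open(path, "w")          # created, nothing written yet
    print(os.path.exists(path))  # True on a local filesystem
    f.write("data")
    f.close()
```

On S3, the equivalent existence check between create and close would report the object as absent, which is exactly why check-then-act locking schemes break there.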
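For the first remedy, speculation in Spark is controlled by a single configuration flag, which can be set at submit time. A sketch of disabling it for a job that writes to S3 (the class and jar names here are hypothetical):

```shell
# Disable speculative execution for a job whose output goes to S3.
# spark.speculation defaults to false, but setting it explicitly
# guards against a cluster-wide default that turned it on.
spark-submit \
  --conf spark.speculation=false \
  --class com.example.MyS3Job \
  my-s3-job.jar
```

The same property can be set per-job in code via SparkConf, which is useful if the same application sometimes writes to HDFS (where speculation is safe) and sometimes to S3.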