On 8 Jun 2015, at 15:55, Richard Marscher 
<rmarsc...@localytics.com> wrote:

Hi,

we've been seeing occasional issues in production with the FileOutputCommitter 
reaching a deadlock situation.

We are writing our data to S3 and currently have speculation enabled. What we 
see is that Spark gets a file-not-found error trying to access a temporary 
part file that it wrote (it seems to be the part-#2 file every time?), so the 
task fails. But the file actually exists in S3, so subsequent speculative 
attempts and task retries all fail because the committer tells them the file 
exists. This persists until human intervention kills the application. Usually 
rerunning the application succeeds on the next try, so it is not deterministic 
with respect to the dataset or anything.

It seems like there isn't a good story yet for file writing and speculation 
(https://issues.apache.org/jira/browse/SPARK-4879), although our error here 
seems worse than the reports in that issue, since I believe ours deadlocks and 
those don't?

S3 isn't a real filesystem, which is why the hadoop docs say "don't turn 
speculation on if your destination is s3n/s3a or swift"

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/introduction.html

1. renames of directories aren't atomic, but rename is exactly what speculation 
uses to ensure an atomic commit of one task's output
2. eventual consistency means that things aren't immediately visible
3. when you create an output file for writing, it isn't there until the final 
close(), so any other code that checks for the file's existence in order to 
implement a locking algorithm is in trouble
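To make point 1 concrete, here is a rough Python sketch of the rename-based 
commit pattern the task committer relies on. The paths and helper are 
illustrative, not Spark's actual internals: on a local/POSIX filesystem the 
rename is a single atomic metadata operation, but on s3n/s3a "rename" is 
implemented as copy-every-object-then-delete, so a concurrent speculative 
attempt can observe a half-committed directory.

```python
import os
import tempfile

def commit_task_output(task_attempt_dir: str, final_dir: str) -> None:
    """Commit a task attempt by renaming its temp dir to the final location.

    On HDFS / a POSIX filesystem this rename is atomic, so only one
    speculative attempt can win. On S3 there is no real rename: the store
    copies each object and then deletes the originals, so another attempt
    can see a partially 'renamed' directory.
    """
    os.replace(task_attempt_dir, final_dir)  # atomic on a local filesystem

# Illustration on a local filesystem, where the pattern is safe:
base = tempfile.mkdtemp()
attempt = os.path.join(base, "_temporary", "attempt_0")
final = os.path.join(base, "part-00002")
os.makedirs(attempt)
with open(os.path.join(attempt, "data"), "w") as f:
    f.write("rows")

commit_task_output(attempt, final)
print(os.path.isdir(final))  # True: the committed output is now visible
```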

These are fundamental issues which hurt all the bits of Hadoop code that expect 
basic POSIX-ish filesystem behaviours of: files immediately visible on create, 
atomic and O(1) mv and rmdir, efficient seek(), and consistency of Create, 
Update and Delete across the entire cluster.
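Points 2 and 3 can be illustrated with a toy simulation (nothing here is a 
real s3 client; the visibility-lag model is an assumption made purely for 
demonstration): a store where a newly created key takes several reads to 
become visible defeats any "create a file, then check it exists" locking 
protocol.

```python
class EventuallyConsistentStore:
    """Toy model of a store where a newly created key is not
    immediately visible to readers (loosely like US East S3 at the time)."""

    def __init__(self, visibility_lag: int):
        self._lag = visibility_lag  # reads it takes before a write shows up
        self._pending = {}          # key -> (remaining lag, value)
        self._data = {}

    def put(self, key, value):
        self._pending[key] = (self._lag, value)

    def exists(self, key):
        if key in self._data:
            return True
        if key in self._pending:
            lag, value = self._pending[key]
            if lag <= 1:
                # the write finally becomes visible
                self._data[key] = value
                del self._pending[key]
                return True
            self._pending[key] = (lag - 1, value)
        return False

store = EventuallyConsistentStore(visibility_lag=3)
store.put("part-00002", b"rows")
# A lock protocol built on "create then check existence" breaks here:
print(store.exists("part-00002"))  # False: the write is not yet visible
```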

What to do?
-keep speculation on for output to HDFS; turn it off for s3
-lift the Amazon hadoop-aws JAR from EMR (hadoop 2.6+; for earlier versions you 
need the whole of hadoop-common)
-look at Netflix's s3mper 
http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html
-don't use US East, as it has the worst consistency (it doesn't even support 
create consistency). That may help, but it still won't address non-atomic 
directory rename
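For the first point, the relevant property is spark.speculation; it defaults 
to false, but making it explicit is a cheap safeguard for jobs whose output 
goes to an object store (the comments below are mine, not from the Spark 
docs):

```
# spark-defaults.conf (or pass with --conf on spark-submit)
# Speculation is safe when the destination is HDFS, but turn it off
# for any job writing to s3n:// / s3a:// / swift:// destinations.
spark.speculation  false
```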

I don't know about SPARK-4879, so can't comment on what's happening there, but 
if you are using s3 as your destination it's irrelevant: you will have problems 
if speculation is turned on. That's the same as if you use classic Hadoop MR, 
Tez or others. It's also why s3 isn't supported as a filesystem for things like 
HBase. Hopefully the work on HADOOP-9565 will eventually add some operations to 
enable speculative commits for code written explicitly against it.

-Steve

