On 8 Jun 2015, at 15:55, Richard Marscher <rmarsc...@localytics.com> wrote:
> Hi, we've been seeing occasional issues in production with the FileOutputCommitter reaching a deadlock situation. We are writing our data to S3 and currently have speculation enabled. What we see is that Spark gets a file-not-found error trying to access a temporary part file that it wrote (it seems to be the part-#2 file every time?), so the task fails. But the file actually exists in S3, so subsequent speculations and task retries all fail because the committer tells them the file exists. This persists until human intervention kills the application. Usually rerunning the application succeeds on the next try, so it is not deterministic with the dataset or anything. It seems like there isn't a good story yet for file writing and speculation (https://issues.apache.org/jira/browse/SPARK-4879), although our error here seems worse than the reports in that issue, since I believe ours deadlocks and those don't?

S3 isn't a real filesystem, which is why the Hadoop docs say "don't turn speculation on if your destination is s3n/s3a or swift":
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/introduction.html

1. Renames of directories aren't atomic, but rename is exactly what speculation uses to ensure an atomic commit of one task's output.
2. Eventual consistency means that things aren't immediately visible.
3. When you create an output file for writing, it's not there until the final close(), so other code that looks for the file's existence in order to implement any locking algorithm is in trouble.

These are fundamental issues which hurt all the bits of Hadoop code that expect basic POSIX-ish filesystem behaviours: files immediately visible on create, atomic and O(1) mv and rmdir, efficient seek(), and consistency of create, update and delete across the entire cluster.

What to do?
-keep speculation on for output to HDFS; turn it off for S3
-lift the Amazon hadoop-aws JAR from EMR (Hadoop 2.6+; for earlier versions you need the whole of hadoop-common)
-look at Netflix's s3mper: http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html
-don't use US East, as it has the worst consistency (it doesn't even support create consistency). That may help, but it still won't address non-atomic directory rename.

I don't know about SPARK-4879, so can't comment on what's happening there, but if you are using S3 as your destination it's irrelevant: you will have problems if speculation is turned on. That's the same as if you use classic Hadoop MR, Tez or others. It's also why S3 isn't supported as a filesystem for things like HBase.

Hopefully the work on HADOOP-9565 will eventually add some operations to enable speculative commits for code written explicitly against it.

-Steve
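Point 3 above can be seen by contrast with a local POSIX filesystem, where a file becomes visible to other processes the moment it is created, before any data is written or the handle is closed. A minimal Python sketch of that POSIX behaviour (local disk, not S3; the part-file name is just illustrative):

```python
import os
import tempfile

# On a POSIX filesystem, a file is visible as soon as it is created,
# even before any bytes are written or the handle is closed. Hadoop's
# commit and locking code assumes this; on S3 the object only appears
# after the final close() completes the upload.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "part-00000")
    f = open(path, "w")          # created, nothing written yet
    print(os.path.exists(path))  # True on a local filesystem
    f.write("data")
    f.close()
```

On S3, the equivalent existence check between create and close would report the object as absent, which is exactly why check-then-act locking schemes break there.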
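For the first remedy, speculation in Spark is controlled by a single configuration flag, which can be set at submit time. A sketch of disabling it for a job that writes to S3 (the class and jar names here are hypothetical):

```shell
# Disable speculative execution for a job whose output goes to S3.
# spark.speculation defaults to false, but setting it explicitly
# guards against a cluster-wide default that turned it on.
spark-submit \
  --conf spark.speculation=false \
  --class com.example.MyS3Job \
  my-s3-job.jar
```

The same property can be set per-job in code via SparkConf, which is useful if the same application sometimes writes to HDFS (where speculation is safe) and sometimes to S3.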