On Feb 26, 2009, at 4:14 PM, Brian Long wrote:
What kind of atomicity/visibility claims are made regarding the various operations on a FileSystem?
I have multiple processes that write into local sequence files, then upload them into a remote directory in HDFS. A map/reduce job then runs over whatever is in that directory. The processes are not synchronized with the job, so it is entirely possible that the job might start while a file is being uploaded. My concern, then, is that the job may include a partially uploaded file if "FileSystem.copyFromLocalFile" is not atomic (in the sense that the file will not appear until all bytes are written).
Hey Brian,
I can't speak for the whole file system, but I do know that, as you'd expect in Unix, open files that are still being written to are visible.
Are any of the FileSystem APIs atomic in this sense? What about, at the very least, rename (e.g., first write to a temp HDFS location, then use rename to atomically flip the file into the live directory)?
I'm not sure about this one, but I suspect you're safe here.
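A minimal sketch of the stage-then-rename pattern, shown on the local filesystem with java.nio rather than HDFS so that it runs without a cluster. It is an analogue, not a statement about HDFS guarantees: with Hadoop the corresponding steps would be FileSystem.copyFromLocalFile into a temp directory followed by FileSystem.rename (which returns a boolean instead of throwing). Whether that rename is atomic on HDFS is exactly the open question above. The class name, the method names, and the "tmp"/"live" directory names are hypothetical. The temp directory must sit outside the job's input path so the job never lists half-copied files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class StageThenRename {
    // Publish localFile into liveDir (hypothetical names): copy it into tmpDir
    // first, then move it into liveDir in a single rename.
    static Path publish(Path localFile, Path tmpDir, Path liveDir) throws IOException {
        Files.createDirectories(tmpDir);
        Files.createDirectories(liveDir);
        Path staged = tmpDir.resolve(localFile.getFileName());
        // The slow, non-atomic copy happens where the job never looks.
        Files.copy(localFile, staged, StandardCopyOption.REPLACE_EXISTING);
        Path target = liveDir.resolve(localFile.getFileName());
        // ATOMIC_MOVE throws AtomicMoveNotSupportedException rather than
        // silently falling back to copy-and-delete.
        return Files.move(staged, target, StandardCopyOption.ATOMIC_MOVE);
    }

    // What a job scanning liveDir would see at this moment.
    static List<String> listLive(Path liveDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (Stream<Path> entries = Files.list(liveDir)) {
            entries.forEach(p -> names.add(p.getFileName().toString()));
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("upload-demo");
        Path local = root.resolve("part-0001.seq");
        Files.write(local, "some sequence-file bytes".getBytes(StandardCharsets.UTF_8));
        Path published = publish(local, root.resolve("tmp"), root.resolve("live"));
        System.out.println(listLive(root.resolve("live")));               // [part-0001.seq]
        System.out.println(Files.size(published) == Files.size(local));   // true
    }
}
```

The design choice is that readers only ever see a file after its final name appears, so the atomicity requirement moves from the copy to the rename.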
Brian