On Feb 26, 2009, at 4:14 PM, Brian Long wrote:

What kind of atomicity/visibility claims are made regarding the various
operations on a FileSystem?
I have multiple processes that write into local sequence files, then upload them into a remote directory in HDFS. A map/reduce job runs which operates on whatever is in the directory. The processes are not synchronized with the job, so it is entirely possible that the job might start while a file is being uploaded. Thus, my concern is that the job may include a partially uploaded file if "FileSystem.copyFromLocalFile" is not atomic (in the sense that the
file will not appear until all bytes are written).

Hey Brian,

I can't speak for the whole file system, but I do know that, as you'd expect on Unix, a file that is still open for writing is already visible, so a job listing the directory could see a partially uploaded file.



Are any of the FileSystem APIs atomic in this sense? What about, at the
very least, rename (e.g. first write to a temp hdfs location, then use
rename to atomically flip the file into the live directory)?


I'm not sure about this one, but I suspect you're safe with rename here.

Brian
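
The write-to-temp-then-rename idea raised in the question can be sketched as follows. This is only an illustration of the pattern on a local file system using java.nio.file, not Hadoop code: the class name AtomicPublish, the method publish, and the staging/live directory layout are all hypothetical. On HDFS the corresponding steps would presumably be FileSystem.copyFromLocalFile into the temp location followed by FileSystem.rename into the live directory, and whether that rename is atomic is exactly the point the reply above leaves unconfirmed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper illustrating "stage, then rename" on a local file system.
final class AtomicPublish {
    private AtomicPublish() {}

    /**
     * Copies source into stagingDir, then moves it into liveDir in one step.
     * Assumes both directories live on the same file system; otherwise the
     * atomic move below is not possible and an exception is thrown.
     */
    static Path publish(Path source, Path stagingDir, Path liveDir) throws IOException {
        Files.createDirectories(stagingDir);
        Files.createDirectories(liveDir);

        Path staged = stagingDir.resolve(source.getFileName());
        Path live = liveDir.resolve(source.getFileName());

        // Slow step: the partial copy is only ever visible in the staging directory.
        Files.copy(source, staged, StandardCopyOption.REPLACE_EXISTING);

        // Fast step: with ATOMIC_MOVE the file system either performs the move
        // atomically or throws AtomicMoveNotSupportedException; it never falls
        // back to a copy that readers could observe half-done.
        return Files.move(staged, live, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

A reader that only ever lists the live directory then sees either no file or the complete file. The staging directory has to be somewhere the job does not read, and the behavior when the target name already exists is implementation specific, so unique file names per upload are the safer assumption.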
