Thanks a lot, Chris, this is helpful.

On Wed, May 25, 2016 at 12:33 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
> Hello Kun,
>
> You are correct that "hdfs dfs -cp" is not atomic, but the details of that
> are a bit different from what you described. For the example you gave,
> the sequence of events would be:
>
> 1. Open a.xml.
> 2. Create file b.xml._COPYING_.
> 3. Copy the bytes from a.xml to b.xml._COPYING_.
> 4. Rename b.xml._COPYING_ to b.xml.
>
> b.xml._COPYING_ is a temporary file. All the bytes are written to this
> location first. Only if the full copy is successful does it proceed to
> step 4 and rename the file to its final destination at b.xml. The rename
> is atomic, so overall, this has the effect that b.xml will never have
> partially-written data. Either the whole copy succeeds or the copy fails
> and b.xml doesn't exist.
>
> However, even though the rename is atomic, we can't claim the overall
> operation is atomic. For example, if the process dies between step 2 and
> step 3, then the command leaves a lingering side effect in the form of the
> b.xml._COPYING_ file.
>
> Perhaps it's sufficient for your use case that the final rename step is
> atomic.
>
> --Chris Nauroth
>
> On 5/25/16, 8:21 AM, "Kun Ren" <ren.h...@gmail.com> wrote:
>
> >Hi Genius,
> >
> >If I understand correctly, the shell command "cp" for HDFS is not
> >atomic, is that correct?
> >
> >For example:
> >
> >./bin/hdfs dfs -cp input/a.xml input/b.xml
> >
> >This command actually does 3 things:
> >1. Read input/a.xml;
> >2. Create a new file input/b.xml;
> >3. Write the content of a.xml to b.xml.
> >
> >When I looked at the code, the client side actually does the 3 steps
> >and there is no lock between them. Does that mean the cp command is
> >not guaranteed to be atomic?
> >
> >Thanks a lot for your reply.
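As a rough illustration of the copy-to-temporary-then-rename pattern Chris describes, here is a minimal Python sketch of the same four steps on a local filesystem, assuming POSIX rename semantics. It is not the Hadoop implementation: `copy_then_rename`, `COPYING_SUFFIX` and the `simulate_crash` flag are hypothetical names chosen for illustration, and `os.replace` stands in for the HDFS rename.

```python
import os
import shutil
import tempfile

# Hypothetical constant mirroring the temporary name from the thread (b.xml._COPYING_).
COPYING_SUFFIX = "._COPYING_"


def copy_then_rename(src: str, dst: str, simulate_crash: bool = False) -> None:
    """Copy src to dst through a temporary file and a final rename.

    Local-filesystem analogue of the steps described for `hdfs dfs -cp`;
    this is an illustrative sketch, not Hadoop code.
    """
    tmp = dst + COPYING_SUFFIX
    # Steps 1-3: open the source, create the temporary file, copy the bytes.
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())
    if simulate_crash:
        # Hypothetical flag: pretend the process dies before step 4.
        # The temporary file is left behind, as in the thread's example.
        raise RuntimeError("simulated crash before rename")
    # Step 4: rename into place. os.replace is atomic on POSIX when tmp and
    # dst are on the same filesystem, so dst never appears half-written.
    os.replace(tmp, dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.xml")
        b = os.path.join(d, "b.xml")
        with open(a, "w") as f:
            f.write("<configuration/>")
        copy_then_rename(a, b)
        print(os.path.exists(b), os.path.exists(b + COPYING_SUFFIX))
```

The `simulate_crash` path shows why the whole operation is still not atomic: the destination never holds partial data, but a failure before the rename leaves the `._COPYING_` file lying around for someone to clean up.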