Thanks a lot, Chris, this is helpful.

On Wed, May 25, 2016 at 12:33 PM, Chris Nauroth <cnaur...@hortonworks.com>
wrote:

> Hello Kun,
>
> You are correct that "hdfs dfs -cp" is not atomic, but the details of that
> are a bit different from what you described.  For the example you gave,
> the sequence of events would be:
>
> 1. Open a.xml.
> 2. Create file b.xml._COPYING_.
> 3. Copy the bytes from a.xml to b.xml._COPYING_.
> 4. Rename b.xml._COPYING_ to b.xml.
>
> b.xml._COPYING_ is a temporary file.  All the bytes are written to this
> location first.  Only if the full copy succeeds does it proceed to step 4
> and rename the file to its final destination, b.xml.  The rename is
> atomic, so overall, b.xml will never contain partially-written data.
> Either the whole copy succeeds, or the copy fails and b.xml doesn't exist.
>
> However, even though the rename is atomic, we can't claim the overall
> operation is atomic.  For example, if the process dies between step 2 and
> step 3, then the command leaves a lingering side effect in the form of the
> b.xml._COPYING_ file.
>
> Perhaps it's sufficient for your use case that the final rename step is
> atomic.
>
> --Chris Nauroth
>
>
>
>
> On 5/25/16, 8:21 AM, "Kun Ren" <ren.h...@gmail.com> wrote:
>
> >Hi Genius,
> >
> >If I understand correctly, the HDFS shell command "cp" is not atomic. Is
> >that correct?
> >
> >For example:
> >
> >./bin/hdfs dfs -cp input/a.xml input/b.xml
> >
> >This command actually does 3 things: 1. read input/a.xml; 2. create a new
> >file input/b.xml; 3. write the content of a.xml to b.xml.
> >
> >When I looked at the code, I saw that the client side performs the 3 steps
> >itself and there is no lock across them. Does that mean the cp command is
> >not guaranteed to be atomic?
> >
> >
> >Thanks a lot for your reply.
>
>
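
The copy-then-rename sequence Chris describes can be sketched on a local filesystem. This is a minimal Python analogue, not Hadoop's implementation. It assumes a POSIX filesystem where `os.replace` performs an atomic rename. `copy_via_temp` and its `simulate_crash` flag are hypothetical names used only for illustration. Only the `._COPYING_` suffix comes from the thread.

```python
import os
import shutil

# Temporary suffix taken from the thread; everything else here is illustrative.
COPYING_SUFFIX = "._COPYING_"


def copy_via_temp(src: str, dst: str, simulate_crash: bool = False) -> None:
    """Copy src to dst by writing a temporary file and then renaming it.

    Local-filesystem analogue of the four steps described for
    `hdfs dfs -cp`; it is not Hadoop's code. `simulate_crash` is a
    hypothetical flag that stops after step 2, as if the process died.
    """
    tmp = dst + COPYING_SUFFIX
    # Step 1: open the source.  Step 2: create the temporary file.
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        if simulate_crash:
            # The temporary file already exists and is left behind.
            raise RuntimeError("simulated crash between step 2 and step 3")
        # Step 3: copy the bytes into the temporary file.
        shutil.copyfileobj(fin, fout)
    # Step 4: rename onto the final name. On POSIX, os.replace is an atomic
    # rename when both paths are on the same filesystem, so readers of dst
    # never see a partially written file.
    os.replace(tmp, dst)
```

The crash case illustrates Chris's point. The final rename keeps b.xml free of partial data, but the operation as a whole is not atomic, because a b.xml._COPYING_ file can be left behind. A version that must also survive power loss would additionally flush and fsync the temporary file before renaming it.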
