Mehakmeet Singh created HADOOP-18596:
----------------------------------------

             Summary: Distcp -update between different cloud stores to use 
modification time while checking for file skip.
                 Key: HADOOP-18596
                 URL: https://issues.apache.org/jira/browse/HADOOP-18596
             Project: Hadoop Common
          Issue Type: Improvement
          Components: tools/distcp
            Reporter: Mehakmeet Singh
            Assignee: Mehakmeet Singh


DistCp -update currently relies on file size, block size, and checksum 
comparisons to decide which files should be skipped and which copied. 
Since different cloud stores use different checksum algorithms, their 
checksums cannot be compared directly, so modification time should be added 
to the checks.

This would ensure that, while performing -update, files perceived to be out 
of sync are copied rather than skipped. The machines between which the file 
transfers occur should have synchronized clocks to avoid unnecessary copies.

Testing and documentation for modification time checks between different 
object stores should also be improved, to ensure files are not skipped 
incorrectly.



