On Wed, Aug 12, 2015 at 6:38 PM, Sean P. DeNigris <s...@clipperadams.com> wrote:
> Ben Coman wrote
>> Unless your requirements *specifically* need identical files to be
>> maintained as duplicates, I would strongly consider using something
>> content based like MD5 or SHA
>
> Interesting... There should not be any duplicates. What's the advantage over
> UUID?
>

This... "move to another OS and continue generating [file-ids] with
another VM?"
A SHA hash intrinsically represents THE content (it is *always* the
same no matter who/where/how it's calculated), whereas a UUID is a
randomly generated label assigned to the content.
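To make the contrast concrete, here is a minimal sketch (Python used purely for illustration; it is not code from this thread). The helper names `content_id` and `random_id` are hypothetical. A SHA-256 digest depends only on the bytes, so any machine, OS or VM computing it gets the same id, while a version-4 UUID is freshly random each time it is generated.

```python
import hashlib
import uuid


def content_id(data: bytes) -> str:
    """Hypothetical helper: a content-derived id (SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()


def random_id() -> str:
    """Hypothetical helper: a random label (UUID v4), unrelated to content."""
    return str(uuid.uuid4())


payload = b"hello, world\n"

# Same bytes -> same id, wherever and however it is computed.
print(content_id(payload) == content_id(payload))  # True

# Two UUIDs minted for the same content will, in practice, differ,
# so the id has to be stored alongside the file to be recovered.
print(random_id() == random_id())
```

The trade-off is that a content id changes whenever the content changes, which is exactly what makes it useful for the checks listed next.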

Also, depending on your use case, it may make it easier to:
* verify whether a file's contents have changed.
* periodically check backups/restores for file corruption (including
with external tools, without reference to indexes maintained by your
Application Image).
* support revision control: if you come across a file whose filename
doesn't match the SHA hash of its contents, then you know it has
changed, and its current file-id still identifies its ancestor content.
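The backup check and the "filename doesn't match its hash" idea above could be sketched roughly like this, assuming a hypothetical store layout where every file is named by the SHA-256 hex digest of its contents. Because the check needs nothing but the files themselves, an external tool can run it without any index from the application image. Function names here are illustrative only.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_mismatches(store: Path) -> list[Path]:
    """Return files whose name is not the SHA-256 of their contents.

    Assumes (hypothetically) that every file in `store` is named by its
    content digest. A mismatch means the file was corrupted or edited;
    its name still records the digest of the original (ancestor) content.
    """
    return sorted(
        p for p in store.iterdir()
        if p.is_file() and sha256_of_file(p) != p.name
    )
```

Running `find_mismatches` over a restored backup lists exactly the files whose bytes no longer match what was originally stored.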

And for my own use case "some day"... I now have many duplicate
files scattered amongst many ad hoc backups. For example, over ten
years there were several cycles of upgrading to a new PC where the
quick, safe path taken was to copy the old PC's hard drive to a
subfolder on the new PC's hard drive, while the old hard drive went
into a box, now with a dozen friends. On top of that, many old, small
backup media (floppy, ZIP disks, tape) are duplicated too, and content
hashes would help consolidate them onto a few of today's large media.
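For that consolidation scenario, a rough sketch of finding duplicates across several backup folders by grouping files on their SHA-256 digest might look like this (Python for illustration only; `find_duplicates` and the folder names in the usage note are hypothetical). Identical content yields identical digests regardless of filename or location, so each group is one set of copies.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots) -> dict[str, list[Path]]:
    """Map each digest seen more than once to the files sharing it."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    seen: set[Path] = set()  # avoid double-counting overlapping roots
    for root in roots:
        for p in Path(root).rglob("*"):
            rp = p.resolve()
            if p.is_file() and rp not in seen:
                seen.add(rp)
                by_hash[sha256_of_file(rp)].append(rp)
    return {h: sorted(ps) for h, ps in by_hash.items() if len(ps) > 1}
```

For example, `find_duplicates(["old_pc_copy", "zip_disk_dump"])` would report every file that exists in more than one place, so only one copy per digest needs to be kept on the new media.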

cheers -ben
