On Wed, Aug 12, 2015 at 6:38 PM, Sean P. DeNigris <s...@clipperadams.com> wrote:
> Ben Coman wrote
>> Unless your requirements *specifically* need identical files to be
>> maintained as duplicates, I would strongly consider using something
>> content based like MD5 or SHA
>
> Interesting... There should not be any duplicates. What's the advantage over
> UUID?

This... "move to another OS and continue generating [file-ids] with
another VM?"

A SHA hash intrinsically represents THE content (it is *always* the same
no matter who calculates it, or where, or how), whereas a UUID is a
randomly generated label assigned to the content.

Also, depending on the use case, it may make it:

* easy to verify whether the file contents have changed.
* possible to periodically check backups/restores for file corruption
  (including by external tools, without reference to indexes maintained
  by your application image).
* easier to do revision control: if you come across a file whose
  filename doesn't match the SHA hash of its contents, then you know its
  ancestor's content from its current file-id.

And for my own use case "some day"... I now have many duplicate files
scattered among many ad hoc backups. For example, over ten years there
have been several cycles of upgrading to a new PC, where the quick, safe
path was to copy the old PC's hard drive to a subfolder on the new PC's
hard drive, while the old drive went into a box, now with a dozen
friends. Add to that the duplication across many old, small backup media
(floppy, ZIP media, tape) that it would help to consolidate onto a few
of today's large media.

cheers -ben
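
For illustration only, here is a minimal sketch of the content-based
file-id idea discussed above. It is written in Python rather than Pharo,
it assumes SHA-256 as the hash, and it assumes a file is stored under a
name equal to its hex digest. The function names (content_id,
is_unchanged, find_duplicates) are hypothetical and do not come from the
thread.

```python
import hashlib
from pathlib import Path


def content_id(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file's bytes.

    The result depends only on the content, not on the OS, the VM,
    or the file's name or location.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path):
    """True if the file name (minus extension) still equals its content hash.

    Assumes the naming convention "<hex digest><optional extension>".
    """
    return Path(path).stem == content_id(path)


def find_duplicates(root):
    """Group regular files under root by content hash.

    Returns a dict mapping each hash to its list of paths, keeping
    only hashes shared by more than one file.
    """
    groups = {}
    for p in Path(root).rglob("*"):
        if p.is_file():
            groups.setdefault(content_id(p), []).append(p)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

With a UUID scheme, neither check is possible without consulting an
external index, because the label carries no information about the bytes
it names.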