On 7/7/23 11:42 AM, home user wrote:
When I try to verify a back-up, I use "diff -r".  The directory trees being compared contain about 
870 files (mostly binary, like PNG, JPG, and so on), and take up about 707 megabytes.  The trees being 
compared are on the hard drive and on a USB-3 stick.  When I run the "diff -r" command, it seems to 
finish too quickly - it seems like less than a half of a second.  I saw similar results a few weeks ago 
comparing about 30 gigabyte trees on the hard drive vs. on a USB-3.1 stick; the results were practically 
instantaneous.  Is diff actually checking every bit (or byte), or is it using some "short cut"?

Before opening this thread, I had already spent a lot of time and effort verifying that 
"diff" worked correctly on binary files.  The issue was that diff seemed to compare large 
directory trees of files too quickly, which led me to believe it was using a "short cut" 
rather than actually comparing file contents.  I believe that some short cuts should be used:
* files of different sizes should be reported as being different without 
comparing contents.
* once one bit is found to differ between two files, they should be reported as 
different without comparing the remaining contents.
But contents should be compared even if two files have the same name, sizes, 
creation/modification histories, permissions, and other meta-data values.  This 
was not happening.
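
The rules above can be sketched as a small shell function.  This is only an 
illustration, not how diff itself is implemented: "same_content" is a made-up 
name, and it assumes GNU stat (for "-c %s") and cmp are available.

```shell
#!/bin/sh
# same_content A B: exit 0 if the two files hold identical bytes, non-zero otherwise.
# Hypothetical helper for illustration; assumes GNU stat (-c %s) and cmp.
same_content() {
    # Short cut 1: sizes differ, so report "different" without reading any data.
    [ "$(stat -c %s -- "$1")" = "$(stat -c %s -- "$2")" ] || return 1
    # Short cut 2: cmp stops at the first differing byte (-s keeps it quiet).
    # Names, timestamps, and permissions are never consulted.
    cmp -s -- "$1" "$2"
}
```

Calling it as "same_content fileA fileB && echo identical" prints "identical" 
only when every byte matches.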

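Ron's tests showed that my suspicion was correct: an inappropriate (in my 
opinion) short cut was being used.  So, Roberto, short cuts are sometimes used.  
Ron also provided the solution: flush pending writes and drop the kernel's cached 
file data before comparing, so that the files are read from the devices again.  
As root: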
      sync ; echo 3 > /proc/sys/vm/drop_caches

Patrick's point that diff wasn't meant for binary files is correct, but without a 
recursive option, cmp doesn't really help unless I want to write a script to do the 
recursive traversal of the 2 trees, calling cmp on every file that's in both trees.  I 
struggle with recursion; trying it makes me curse and re-curse and curse yet more.  
Patrick's suggestion to use rsync is a good one.  Robert's suggestion to use the 
"-c" option is also good.  But Wikipedia claims that checksums are not perfect, 
that it is remotely possible for files with identical checksums to differ.
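
For the cmp route, the traversal does not need hand-written recursion, since 
find can walk the tree.  A minimal sketch, with assumptions: "compare_trees" is 
a made-up name, file names are assumed not to contain newlines, and files that 
exist only in the second tree are not reported.

```shell
#!/bin/sh
# compare_trees SRC DST: run cmp on every regular file under SRC against the
# file at the same relative path under DST.  Hypothetical helper name.
# The body runs in a subshell, so "exit" below only ends the function.
compare_trees() (
    src=${1%/}
    dst=${2%/}
    # find does the tree walk, one path per line.
    find "$src" -type f | {
        status=0
        while IFS= read -r f; do
            rel=${f#"$src"/}          # path relative to SRC
            if [ ! -f "$dst/$rel" ]; then
                echo "only in $src: $rel"
                status=1
            elif ! cmp -s -- "$f" "$dst/$rel"; then
                echo "differs: $rel"
                status=1
            fi
        done
        exit "$status"
    }
)
```

It would be called as "compare_trees /path/to/original /path/to/backup" and 
returns 0 only when every file in the first tree has a byte-identical 
counterpart in the second.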

Years ago, when I worked on the AWIPS program at the National Weather Service, 
I needed a file restored from the regular back-up done by the sysadmins.  
They couldn't do it.  That taught me the importance of checking back-ups.  
George's early June comments (in a different thread) about USB sticks taught me 
the importance of back-up checks being deep, at least occasionally.

I've tagged this thread SOLVED.  "rsync --dry-run -c" seems to be a good solution in many 
cases, but "diff -r" is better when a truly deep check is preferred.  I thank everyone 
for their contributions.
_______________________________________________
users mailing list -- users@lists.fedoraproject.org