On Friday 6 September 2024 01:43:18 BST Dale wrote:
> Michael wrote:
> > On Thursday 5 September 2024 19:55:56 BST Frank Steinmetzger wrote:
> >> Am Thu, Sep 05, 2024 at 06:30:54AM -0500 schrieb Dale:
> >>>> Use rsync with:
> >>>> --checksum
> >>>>
> >>>> and
> >>>>
> >>>> --dry-run
> >>
> >> I suggest calculating a checksum file from your active files. Then you
> >> don’t have to read the files over and over for each backup iteration
> >> you compare it against.
> >>
> >>>> You can also run find to identify which files were changed during
> >>>> the period you were running with the dodgy RAM. Thankfully you
> >>>> didn't run for too long before you spotted it.
> >>
> >> This. No need to check everything you ever stored. Just the most
> >> recent stuff, or at maximum, since you got the new PC.
> >>
> >>> I have just shy of 45,000 files in 780 directories or so. Almost
> >>> 6,000 in another. Some files are small, some are several GBs or so.
> >>> Thing is, backups go from a single parent directory if you will.
> >>> Plus, I'd want to compare them all anyway. Just to be sure.
> >>
> >> I acquired the habit of writing checksum files in all my media
> >> directories such as music albums, tv series and such, whenever I
> >> create one such directory. That way even years later I can still
> >> check whether the files are intact. I actually experienced broken
> >> music files from time to time (mostly on the MicroSD card in my
> >> tablet). So with checksum files, I can verify which file is bad and
> >> which (on another machine) is still good.
> >
> > There is also dm-verity for a more involved solution. I think for Dale
> > something like this should work:
> >
> > find path-to-directory/ -type f | xargs md5sum > digest.log
> >
> > then to compare with a backup of the same directory you could run:
> >
> > md5sum -c digest.log | grep FAILED
> >
> > Someone more knowledgeable should be able to knock out some clever
> > python script to do the same at speed.
>
> I'll be honest here, on two points. I'd really like to be able to do
> this but I have no idea where or how to even start. My setup for
> series-type videos: in the parent directory, where I'd like a tool to
> start, there are about 600 directories. On a few occasions there is
> another directory inside one of those. The directory under the parent
> is the name of the series. Sometimes I have a sub directory that holds
> temp files; new files I have yet to rename, or am considering replacing
> in the main series directory, etc. I wouldn't mind having a file with a
> checksum for each video in the top directory, and even one in the sub
> directory. As an example:
>
> TV_Series/
>
> ├── 77 Sunset Strip (1958)
> │   └── torrent
> ├── Adam-12 (1968)
> ├── Airwolf (1984)
>
>
> That is part of the output of tree. The directory 'torrent' under 77
> Sunset is usually temporary, but sometimes a directory is there for
> videos about the making of a show, its history or something. What I'd
> like is a program that would generate checksums for each file under,
> say, 77 Sunset, and that could skip or include the directory under it.
> Might be best if I could switch that on or off. Obviously, I may not
> want to do this for my whole system; I'd like to be able to target
> directories. I have another large directory, let's say not a series but
> one that sometimes has remakes, that I'd also like to do. It is set up
> much like the above: a parent directory with a directory underneath,
> and on occasion one more under that.
As an example, let's assume you have the following fs tree:

VIDEO
 ├── TV_Series/
 |     ├── 77 Sunset Strip (1958)
 |     │     └── torrent
 |     ├── Adam-12 (1968)
 |     ├── Airwolf (1984)
 |
 ├── Documentaries
 ├── Films
 ├── etc.

You could run:

$ find VIDEO -type f | xargs md5sum > digest.log

The file digest.log will contain md5sum hashes of each of your files within
the VIDEO directory and its subdirectories.

To check if any of these files have changed, become corrupted, etc. you can
run:

$ md5sum -c digest.log | grep FAILED

If you want to compare the contents of the same VIDEO directory on a backup,
you can copy the digest file with its hashes over to the top directory of
the backup and run the same check there:

$ md5sum -c digest.log | grep FAILED

Any files listed as "FAILED" have changed since the digest was originally
created.  Any files listed as "FAILED open or read" have been deleted, or
are inaccessible.

You don't have to use md5sum; you can use sha1sum, sha256sum, etc., but
md5sum will be quicker, and the probability of a hash collision between two
of your files is negligible.

You can include the date, the PC name and the top directory name in the
digest's filename, to make it easy to identify when it was created and where
it came from.  Especially useful if you move it across systems.

> One thing I worry about is not just memory problems or drive failure,
> but also just some random error or even bit rot. Some of these files
> are rarely changed or even touched. I'd like a way to detect problems,
> and there may even be a software tool that does this with some setup.
> It reminds me of Kbackup, where you can select what to back up or leave
> out on a directory or even individual file level.
>
> While this could likely be done with a script of some kind, my
> scripting skills are minimal at best, so I suspect there is software
> out there somewhere that can do this. I have no idea what or where it
> could be tho. Given my lack of scripting skills, I'd be afraid I'd do
> something bad and it delete files or something. O_O LOL

The two commands above are just one way, albeit a rather manual one, to
achieve this.  Someone with coding skills should be able to write up a
script to more or less automate it, if you can't find something ready-made
on the interwebs.

> I've been watching videos again, those I was watching during the time
> the memory was bad. I've replaced three so far. I think I noticed this
> within a few hours. Then it took a little while for me to figure out
> the problem and shut down to run the memtest. I doubt many files were
> affected unless it does something we don't know about. I do plan to try
> rsync with checksum and dry-run when I get back up and running. Also,
> QB is finding that a lot of its files are fine as well. It's still
> rechecking them. It's a lot of files.
>
> Right now, I suspect my backup copy is likely better than my main copy.
> Once I get the memory in and can really run some software, I'll run
> rsync with those compare options and see what it says. I just got to
> remember to reverse things: backup is the source, not the destination.
> If this works, I may run that each time, to help detect problems.
> Maybe??

This should work in rsync terms:

rsync -v --checksum --delete --recursive --dry-run SOURCE/ DESTINATION

It will output a list of files which have been deleted from the SOURCE and
will need to be deleted at the DESTINATION directory.  It will also provide
a list of changed files at SOURCE which would be copied over to the
DESTINATION.
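To put the above into practice with the backup treated as the more trusted
copy, you could do a dry run first and only let rsync modify anything once
the output looks sane.  The paths below are only placeholders for this
example; substitute wherever your backup and live copies are actually
mounted:

# Preview what rsync would change, without touching anything:
rsync -v --checksum --delete --recursive --dry-run /mnt/backup/VIDEO/ /home/dale/VIDEO

# If the list looks sane, drop --dry-run to restore the good copies:
rsync -v --checksum --delete --recursive /mnt/backup/VIDEO/ /home/dale/VIDEO

Two things to keep in mind: the trailing slash on the source tells rsync to
copy the contents of the directory rather than the directory itself, and
--delete will also remove anything on the destination that is not present in
the backup (e.g. files you added since the last backup), so check the
dry-run output carefully before running it for real.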
When you use --checksum, the rsync command will take longer than it would
without it, because it has to calculate a hash of every source and
destination file to determine whether it has changed, rather than relying on
size and timestamp alone.
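Coming back to the per-directory checksum files you described, a rough
sketch of a script to create them could look like the one below.  It is only
an illustration, not a polished tool: the script name, the md5.sum filename
and the --include-subdirs switch are all made up for this example, and it
assumes bash and GNU find.

#!/bin/bash
# make-sums.sh - write an md5.sum file inside each directory one level below
# the given parent, e.g. inside each series directory under TV_Series.
#
# Usage:   make-sums.sh /path/to/TV_Series [--include-subdirs]

parent=$1
mode=$2

for dir in "$parent"/*/; do
    (
        cd "$dir" || exit
        if [ "$mode" = "--include-subdirs" ]; then
            # Hash every file under the series directory, subdirectories
            # (e.g. torrent/) included.
            find . -type f ! -name md5.sum -exec md5sum {} + > md5.sum
        else
            # Hash only the files sitting directly in the series directory.
            find . -maxdepth 1 -type f ! -name md5.sum -exec md5sum {} + > md5.sum
        fi
        echo "Wrote $(wc -l < md5.sum) checksums in ${dir}md5.sum"
    )
done

Later on you can verify any single series with, for example:

cd "VIDEO/TV_Series/77 Sunset Strip (1958)" && md5sum -c md5.sum | grep FAILED

No output means every file still matches its recorded checksum.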