On Fri, Sep 06, 2024 at 01:21:20PM +0100, Michael wrote:

> > > find path-to-directory/ -type f | xargs md5sum > digest.log
> > >
> > > then to compare with a backup of the same directory you could run:
> > >
> > > md5sum -c digest.log | grep FAILED

I had a quick look at the manpage: with md5sum --quiet you can omit the
grep part.
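For example, something like this (untested here; according to the
coreutils manpage, --quiet suppresses the "OK" lines, so only mismatches
and read errors are printed):

$ md5sum --quiet -c digest.log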
> > > Someone more knowledgeable should be able to knock out some clever python
> > > script to do the same at speed.

And that is exactly what I have written for myself over the last 11 years.
I call it dh (short for dirhash). As I described in the previous mail, I
use it to create one hash file per directory. But it also supports one
hash file per data file and – a rather new feature – one hash file at the
root of a tree.

Have a look here: https://github.com/felf/dh
Clone the repo or simply download the one file and put it into your path.

> > I'll be honest here, on two points.  I'd really like to be able to do
> > this but I have no idea where to or how to even start.  My setup for
> > series type videos.  In a parent directory, where I'd like a tool to
> > start, is about 600 directories.  On a few occasions, there is another
> > directory inside that one.  That directory under the parent is the name
> > of the series.

By default, my tool ignores directories which have subdirectories. It
only hashes files in dirs that have no subdirs (leaves in the tree). But
this can be overridden with the -f option.

My tool also has an option to skip a number of directories and to process
only a certain number of directories.

> > Sometimes I have a sub directory that has temp files;
> > new files I have yet to rename, considering replacing in the main series
> > directory etc.  I wouldn't mind having a file with a checksum for each
> > video in the top directory, and even one in the sub directory.  As an
> > example:
> >
> > TV_Series/
> >
> > ├── 77 Sunset Strip (1958)
> > │   └── torrent
> > ├── Adam-12 (1968)
> > ├── Airwolf (1984)

So with my tool you would run:

$ dh -f -F all TV_Series

`-F all` causes a checksum file to be created for each data file.

> > What
> > I'd like, a program that would generate checksums for each file under
> > say 77 Sunset and it could skip or include the directory under it.

Unfortunately I don’t have a skip feature yet that skips specific
directories. I could add a feature that looks for a marker file and then
skips that directory (and its subdirs).

> > Might be best if I could switch it on or off.  Obviously, I may not want
> > to do this for my whole system.  I'd like to be able to target
> > directories.  I have another large directory, lets say not a series but
> > sometimes has remakes, that I'd also like to do.  It is kinda set up
> > like the above, parent directory with a directory underneath and on
> > occasion one more under that.
>
> As an example, let's assume you have the following fs tree:
>
> VIDEO
> ├── TV_Series/
> |   ├── 77 Sunset Strip (1958)
> |   │   └── torrent
> |   ├── Adam-12 (1968)
> |   ├── Airwolf (1984)
> |
> ├── Documentaries
> ├── Films
> ├── etc.
>
> You could run:
>
> $ find VIDEO -type f | xargs md5sum > digest.log
>
> The file digest.log will contain md5sum hashes of each of your files within
> the VIDEO directory and its subdirectories.
>
> To check if any of these files have changed, become corrupted, etc. you can
> run:
>
> $ md5sum -c digest.log | grep FAILED
>
> If you want to compare the contents of the same VIDEO directory on a back up,
> you can copy the same digest file with its hashes over to the backup top
> directory and run again:
>
> $ md5sum -c digest.log | grep FAILED

My tool does this as well. ;-) In check mode, it recurses, looks for hash
files and, if it finds them, checks all hashes. There is also an option
to only check paths and filenames, not hashes. This allows you to quickly
find files that have been renamed or deleted since the hash file was
created.
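With a plain md5sum digest you can approximate that names-only check with
standard tools. A rough sketch (run from the directory where digest.log
was created, and assuming md5sum's default "hash, two spaces, path" line
format, so the path starts at column 35; it ignores the escaping md5sum
applies to exotic file names):

$ cut -c35- digest.log | while IFS= read -r f; do [ -e "$f" ] || echo "MISSING: $f"; done

It only tests whether each listed path still exists, without reading any
file contents, so it is much faster than a full hash check.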
> > One thing I worry about is not just memory problems, drive failure but
> > also just some random error or even bit rot.  Some of these files are
> > rarely changed or even touched.  I'd like a way to detect problems and
> > there may even be a software tool that does this with some setup,
> > reminds me of Kbackup where you can select what to backup or leave out
> > on a directory or even individual file level.

Well, that could be covered with ZFS, especially with a redundant pool so
it can repair itself. Otherwise it will only identify the bitrot, but not
be able to fix it.

> > Right now, I suspect my backup copy is likely better than my main copy.

The problem is: if they differ, how do you know which one is good, apart
from watching one from start to finish? You could use vbindiff to first
find the part that changed. That will at least tell you where the
difference is, so you can seek to that position in the video.

> This should work in rsync terms:
>
> rsync -v --checksum --delete --recursive --dry-run SOURCE/ DESTINATION
>
> It will output a list of files which have been deleted from the SOURCE and
> will need to be deleted at the DESTINATION directory.

If you want to see changed *and* deleted files in one run, better use -i
instead of -v.
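That would look like this (the same command, just itemised; if I read the
man page right, with --checksum a content difference shows up as a "c" in
the itemised flags, and pending deletions are listed as "*deleting"
lines):

$ rsync -i --checksum --delete --recursive --dry-run SOURCE/ DESTINATION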
--
Grüße | Greetings | Salut | Qapla’
Please do not share anything from, with or about me on any social network.

If two processes are running concurrently, the less important will take
processor time away from the more important one.