On 2025-08-27 12:07, Rob Gerber wrote:
Gary,
Sorry about the delay in reply. I have a few moments now.
Do the files you want to back up exist on different hosts, or only on
the one server? It sounds like they're only on the one server, but
please let me know which it is.
All the files exist on the one server. However, I will need to log in
locally as root to be able to umount /home. ssh is set to disallow
connecting as root.
Phil said:
But if I were you, if this rsync backup set contains *important* files
that no longer exist "in production", so to speak, I would sort out
those critical files into a distinct designated archive of their own and
add *that archive* to your backup set.
And I agree. However, that sorting process need not be manual.
Consider using dupeGuru. It is open source software that compares two
datasets (even if the folder structure isn't symmetrical) and finds
duplicate files. You will need a GUI; it cannot run from the CLI alone.
The general process will be as follows:
1. Double check the settings in dupeguru to ensure it will be
hashing every file you intend to compare.
2. On the main tab for dupeguru, click to compare by contents. In the
bottom pane, add two paths: the path to the root folder for your live
files and the path to the root folder for your rsync backups. For the
live file path, on the right say that it is a "reference" dataset.
Dupeguru won't take any action against a reference dataset.
3. Double check everything, then click compare. Wait for the scans to
finish. No action will be taken until you make a decision in the
software. Expect this to take some time.
4. Look at the comparison results in the right tab. It should show you
filenames and full paths for original and duplicate files. Examine the
results for sanity, and then take action. I recommend hiding all
reference files, re-checking to ensure that it now only shows
duplicates, and they are all in your rsync backup folder. In dupeguru,
select the option to move the detected duplicate files to another
location on your system that is not among the live files or the rsync
files.
5. In your file management tools (dolphin, Nautilus, mc, ls,
whatever), examine the rsync backup folder. Is it now much smaller
with many fewer files? These are the deltas, the files that were
different from those seen in the live files.
6. Presumably a lot of duplicate files have been moved somewhere else.
Also, hopefully the space needs of the rsync files have been greatly
reduced. At this point, if you're satisfied with the results, you can
delete the duplicate files.
Be careful. Double and triple check everything. Read the manual. Here
be dragons. Etc.
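
For anyone who wants to preview what steps 1 through 5 would produce before touching the GUI, here is a rough read-only sketch in Python. It is not how dupeguru works internally; it simply hashes every file under a live tree and reports which files under the rsync backup tree have identical contents somewhere in the live tree (duplicates) and which do not (deltas). The function names and the choice of SHA-256 are my own assumptions for illustration, and nothing is moved or deleted.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def walk_files(root):
    """Yield the path of every regular file under root (symlinks skipped)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def split_backup(live_root, backup_root):
    """Classify backup files by whether their contents exist in the live tree.

    Returns (duplicates, deltas), two sorted lists of paths under backup_root.
    The live tree plays the role of the "reference" dataset: it is only read.
    """
    live_hashes = {file_digest(p) for p in walk_files(live_root)}
    duplicates, deltas = [], []
    for path in walk_files(backup_root):
        if file_digest(path) in live_hashes:
            duplicates.append(path)
        else:
            deltas.append(path)
    return sorted(duplicates), sorted(deltas)
```

Calling split_backup() with the live root and the rsync backup root (whatever those paths are on your system) gives two lists you can review by hand. Matching is by content only, so a file that was renamed or moved in the live tree still counts as a duplicate.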
Sounds like more trouble than it's worth as it would still mean that I
would have some files that I wouldn't be able to back up with their
original paths.
I've started the process of installing and configuring Bacula on the
server but it's not going well. However, that's a separate issue.
_______________________________________________
Bacula-users mailing list
https://lists.sourceforge.net/lists/listinfo/bacula-users