Hello,

A few very large files don't appear to have changed since the last full
backup, yet are being backed up repeatedly by incremental / differential
backups. Why?

Details:
I am using bacula to back up a fileserver for one of my customers. For the
primary share, I am using a full/diff/inc backup cycle. I have a few very
large files that won't change (system images for user PCs). I didn't want
to use up my storage backing up those system images multiple times, so I
set up a simpler backup cycle that would take a full once, then a diff and
inc periodically thereafter. The expectation was that the files wouldn't
change and we'd rarely add files, so most backups would be 0 files / 0
bytes. The users' PC files are routinely backed up by a separate service.

The problem is that the system images have been backed up 3 times during
the last 6 months. I picked one of the system image files, and ran an SQL
query for it against the database. The results indicated that the target
file was backed up 3 times. The file hashes were the same each time, but
the lstat column had some differences. Unfortunately, the metadata is
encoded somehow and I don't understand what changed. I ran sha512sum against
the file in question, and at first its output looked different from the value
in the bacula database. That turned out to be only an encoding difference:
when I used python to base64-decode the catalog hash and displayed the result
in hex, it matched the hash provided by sha512sum.
So the file appears to be unchanged from its first time being backed up,
yet has been backed up 3 times.
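
For anyone who wants to repeat the hash comparison, here is a minimal Python
sketch of what I did. It assumes the catalog stores the digest as standard
base64 without trailing '=' padding. The function name catalog_digest_to_hex
is my own, and the catalog value in the example is a stand-in derived from the
sha512sum digest in this message, not a value copied from the database.

```python
import base64

def catalog_digest_to_hex(digest_b64):
    """Convert a base64 digest string to the hex form printed by sha512sum."""
    # Assumption: the catalog value may lack '=' padding, so restore it.
    padded = digest_b64 + "=" * (-len(digest_b64) % 4)
    return base64.b64decode(padded).hex()

# Digest reported by sha512sum for the image file in this message.
SHA512SUM_HEX = (
    "dc396719178be8a34c028f32475d84d53c42cd5c4de63bed15154154eed258eb"
    "b010bf0a366f525c4d5cc04d2d41653806f57689a1f1fdb42ce8b27500e4037f"
)

# Stand-in for the catalog value (hypothetical: built from the hex digest,
# with the padding stripped to mimic an unpadded catalog entry).
catalog_value = base64.b64encode(bytes.fromhex(SHA512SUM_HEX)).decode().rstrip("=")

print(catalog_digest_to_hex(catalog_value) == SHA512SUM_HEX)
```

With a real catalog value substituted for catalog_value, a True result means
the stored digest and the on-disk file agree.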

[root@stallone macrium]# sha512sum A6A87E2D36FAE06E-Mike\ W11-00-00.mrimg
dc396719178be8a34c028f32475d84d53c42cd5c4de63bed15154154eed258ebb010bf0a366f525c4d5cc04d2d41653806f57689a1f1fdb42ce8b27500e4037f  A6A87E2D36FAE06E-Mike W11-00-00.mrimg

The file was backed up 3 times.
Jobid 70: full backup.
Jobid 642: incremental backup
Jobid 667: differential backup.

It makes sense that a differential backup after an incremental backup would
contain the contents of the incremental. However, the bacula database has
the same hash for each of the 3 backups. Doesn't that imply that the file
hasn't changed? I don't know why it was backed up again in the incremental
backup. Please note that many incremental and differential backups ran
daily without seeing any changes or needing to back up any files in this
dataset, except for these few times.

My query is in the pastebin link (I don't know if the length of the query
will mess up someone's email client).
https://pastebin.com/6PCbZ21s

Here is the metadata section from the query:
|                                 lstat             |
+-----------------------------------------------------------------------
| P0j TaAhK IHw B Pu Pw A MQ2A6aB BAA BiGwPA BmB6ge BmB6gdBmB6ge A A C |
| P0j TaAhK IHw B Pu Pw A MQ2A6aB BAA BiGwPA BmNcn/ BmB6gdBnQLI1 A A C |
| P0j TaAhK IHw B Pu Pw A MQ2A6aB BAA BiGwPA BnQXpa BmB6gdBnQLI1 A A C |
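
To compare the three lstat rows field by field, here is a minimal Python
sketch. It rests on assumptions that nothing in this message confirms: that
each space-separated field is a number written in base 64 using the standard
base64 alphabet, most significant digit first, and that the fields come in the
order listed in FIELDS. It also assumes that a space was lost between the
mtime and ctime fields when the rows were pasted, so the rows in the code have
that space restored. The names decode_field, decode_lstat and changed_fields
are my own, not Bacula's.

```python
from datetime import datetime, timezone

# Assumption: digits are drawn from the standard base64 alphabet.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# Assumption: field order of the lstat column.
FIELDS = ["dev", "ino", "mode", "nlink", "uid", "gid", "rdev", "size",
          "blksize", "blocks", "atime", "mtime", "ctime",
          "linkfi", "flags", "data"]

def decode_field(text):
    """Decode one field as a base-64 positional number."""
    value = 0
    for ch in text:
        value = value * 64 + ALPHABET.index(ch)
    return value

def decode_lstat(lstat):
    """Map each assumed field name to its decoded integer value."""
    return dict(zip(FIELDS, (decode_field(tok) for tok in lstat.split())))

def changed_fields(old, new):
    """Return the names of the fields whose values differ."""
    return [name for name in FIELDS if old[name] != new[name]]

def as_utc(seconds):
    """Format a Unix timestamp as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()

# Rows from the query output, with a space restored between mtime and ctime
# (assumption: the pasted output ran those two 6-character fields together).
ROWS = [
    "P0j TaAhK IHw B Pu Pw A MQ2A6aB BAA BiGwPA BmB6ge BmB6gd BmB6ge A A C",
    "P0j TaAhK IHw B Pu Pw A MQ2A6aB BAA BiGwPA BmNcn/ BmB6gd BnQLI1 A A C",
    "P0j TaAhK IHw B Pu Pw A MQ2A6aB BAA BiGwPA BnQXpa BmB6gd BnQLI1 A A C",
]

decoded = [decode_lstat(row) for row in ROWS]

# Print the first row so it can be checked against the stat output.
for name in FIELDS:
    value = decoded[0][name]
    shown = as_utc(value) if name in ("atime", "mtime", "ctime") else value
    print(f"{name:8} {shown}")

# Print which fields differ between consecutive rows.
for prev, cur in zip(decoded, decoded[1:]):
    print("changed:", changed_fields(prev, cur))
```

Running it prints the decoded first row, which can be checked against the
stat output, followed by the names of the fields that differ from one row to
the next.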

Here is the output of 'stat':
[root@stallone macrium]# stat A6A87E2D36FAE06E-Mike\ W11-00-00.mrimg
  File: A6A87E2D36FAE06E-Mike W11-00-00.mrimg
  Size: 842719798913    Blocks: 1645937600 IO Block: 4096   regular file
Device: fd23h/64803d    Inode: 325584970   Links: 1
Access: (0760/-rwxrw----)  Uid: ( 1006/    mike)   Gid: ( 1008/hermanstaff)
Context: system_u:object_r:samba_share_t:s0
Access: 2024-12-12 08:44:43.737875113 -0600
Modify: 2024-03-30 00:50:21.816717186 -0500
Change: 2024-11-22 10:32:53.237897204 -0600
 Birth: 2024-03-29 15:51:12.128951779 -0500

Here is the relevant fileset:
FileSet {
  Name = "Stallone_macrium_backups"
  Include {
    Options {
      signature = SHA512
      Verify = s3
    }
    File = /mnt/data/shares/cncshare/Craeon/Backup/macrium
  }

  Exclude {
  }
}

If there is something else I can provide, please let me know.

Regards,
Robert Gerber
402-237-8692
r...@craeon.net