Kern Sibbald wrote:
> On Wednesday 22 November 2006 17:54, Alan Brown wrote:
>
>> The only package I'm aware of which doesn't rely on OS timestamps uses a
>> database to keep snapshots of the filesystem state (and is quite
>> expensive)
>>
>> Bacula has an extensive database, why not USE it?
>
> This is no problem. It has been planned. This database is already used to
> detect changed files for Verify. It works very well.
>
>> ==================
>>
>> The fundamental problem with almost all backup software is an assumption
>> that file timestamps will only ever increase, never decrease. While this
>> is right most of the time, it's not right all the time.
>>
>> The current problem is that Bacula (in common with almost all other backup
>> software) only looks for mtime or ctime higher than the last backup start
>> time.
>>
>> If for whatever reason a file appears with mtime and ctime PRIOR to the
>> last backup start time (e.g. rsynced in, or last backup was of an unattached
>> filesystem stub...), it won't get looked at.
>>
>> These can be taken care of by comparing the current filesystem
>> state with the last known filesystem state in the database and
>> backing up files which have appeared regardless of timestamp.
>>
>> Once you start doing this, it is possible to note if a file has gone
>> missing and flag it as deleted in the database - that takes care of
>> restores bringing back zombie files, effectively giving "snapshot"
>> restoration without having to replicate the entire database.
>>
>> Another worst-case scenario:
>>
>> An attacker replaces a system-critical file with another of the same size
>> and tweaks ctime+mtime. This won't be backed up, even if the file is on a
>> different inode (some versions of file replacement may not change the
>> inode being used, some filesystems (reiser) do not store inode
>> information). This means there is no way of telling when the file was
>> changed and renders all backups suspect.
>>
>> The only way to deal with this is checksumming the file vs
>> stored file checksums and that is computationally expensive.
>
> I would really like to see this implemented. All the mechanisms already
> exist, it is just a matter of implementing them.
>
> Best regards,
>
> Kern

I've been following the list for a while, but haven't commented on
anything yet.

Retrospect had an awesome reputation in the Mac world for over a decade.
One of the things that made them really good was their ability to
recognize duplicate files (even on different machines within the same
backup set), added files, removed files, etc. Backing up a bunch of
machines over the network started a little slow on the first and then
picked up momentum and compactness as the backup proceeded. For example,
the first Mac OS 7.1 machine would back up everything. The second would
end up saying "I already have that file" many times, would back up
faster because it would skip the file copy over the network, and would
end up much smaller on the tape, because it would have a bunch of links
and only the full files unique to that computer.

Their reputation didn't fare as well as they expanded into Windows and
Linux, had to deal with Mac OS X, and then got bought up by EMC. They
may still be a worthwhile example to look at. When I talked to them in
the '90s, they said they were using a bunch of information to identify
what needed to be backed up. I don't know if they were using checksums
or what.

The client/server interaction becomes more complex. Instead of the
client deciding what needs to be backed up based on a backup level
passed to it from the server, the client would send information about
all the files to the server, and the server would have to respond with
what to back up based on comparing that against the database.
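To make that server-side comparison a bit more concrete, here is a rough
sketch in Python. It is not how Bacula or Retrospect actually do it. The
names FileInfo, has_changed and plan_backup are made up for illustration,
the "catalog" is just an in-memory dict standing in for the database, and
the checksum field is optional because computing it is the expensive part.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FileInfo:
    """What the client reports for one file (hypothetical structure)."""
    size: int
    mtime: int
    ctime: int
    checksum: Optional[str] = None  # optional: costly to compute


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def has_changed(known: FileInfo, current: FileInfo) -> bool:
    """True if the file differs from the catalog entry.

    Any difference in size or timestamps counts, including timestamps
    that moved backwards. Checksums are compared only when both sides
    supplied one.
    """
    if (known.size, known.mtime, known.ctime) != (
        current.size, current.mtime, current.ctime
    ):
        return True
    if known.checksum is not None and current.checksum is not None:
        return known.checksum != current.checksum
    return False


def plan_backup(
    catalog: Dict[str, FileInfo], reported: Dict[str, FileInfo]
) -> Tuple[List[str], List[str]]:
    """Compare the client's report against the last known state.

    Returns (paths to back up, paths to flag as deleted).
    """
    to_back_up = []
    for path in sorted(reported):
        known = catalog.get(path)
        # New files are backed up regardless of their timestamps.
        if known is None or has_changed(known, reported[path]):
            to_back_up.append(path)
    deleted = sorted(set(catalog) - set(reported))
    return to_back_up, deleted


# Illustrative data only: times are arbitrary integers.
catalog = {
    "/etc/passwd": FileInfo(100, 1000, 1000, sha256_of(b"original")),
    "/home/a/old.txt": FileInfo(10, 500, 500),
    "/home/a/kept.txt": FileInfo(20, 600, 600),
}
reported = {
    # Same size and timestamps, different content.
    "/etc/passwd": FileInfo(100, 1000, 1000, sha256_of(b"tampered")),
    "/home/a/kept.txt": FileInfo(20, 600, 600),
    # Appeared with timestamps older than the last backup.
    "/home/a/rsynced.txt": FileInfo(30, 400, 400),
}
to_back_up, deleted = plan_backup(catalog, reported)
print(to_back_up)  # ['/etc/passwd', '/home/a/rsynced.txt']
print(deleted)     # ['/home/a/old.txt']
```

The rsynced-in file is picked up even though its timestamps predate the
last backup, the missing file can be flagged as deleted, and the tampered
file is only caught because a checksum was supplied for it.
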
---------------

Chris Hoogendyk

-
   O__  ---- Systems Administrator
  c/ /'_ --- Biology & Geology Departments
 (*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst

<[EMAIL PROTECTED]>

---------------

Erdös 4

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users