On Mon, Aug 17, 2009 at 6:26 PM, Andrew Sackville-West
<and...@farwestbilliards.com> wrote:
> Here's another question: what is stored in all these millions of
> files?
> [...]
Basically, the lion's share would be tons of user-generated files: for
example, huge numbers of image files (and thumbnails) stored in
directory structures on one of the file servers. Other examples would
be extensive music & sound libraries, several debian/ubuntu/etc
mirrors, and so on.

About tarring before backing up: yeah, that's possible too, for some
types of data/directory layouts. But then something on the file server
side needs to check whether the tars are still up to date, and the tars
themselves would take up a lot of precious hard drive space on the file
server :-(. Unless you mean removing the original data, which is
problematic in a few ways. And storing different versions of those tars
(e.g. when users move files around at the source) is also problematic.
Basically, as you say, it would be the tail wagging the dog: things
would get a lot more complicated and fragile, and in exchange I'd get
other, more serious backup problems that are harder to work around than
the current issues.

About moving to a database: well, the filesystem is already a database
:-). Keeping backups of that (multi-TB) database itself would then be a
major problem of its own. Not to mention that users and software would
have to go through some other layer of software to get at their
files... I don't want to go there, my head hurts ^^;

The file servers themselves cope fine with the large number of files;
that isn't really the problem. The problem is in the backup software,
which struggles to handle history for those backups (either using
massive amounts of memory/CPU, or creating massive numbers of
hardlinks, and so on).

Basically, rdiff-backup was perfect for a while. But then we upgraded
the server to Lenny, and it stopped working T_T. I think rdiff-backup's
author must have changed something that now causes huge RAM usage for
large file lists, or some other kind of per-file data. IMO that's
unnecessary: it could use something like a set of Python iterators in a
clever way, or work with incremental file lists the way rsync does. But
I didn't get any useful replies on their mailing list when I mentioned
my problem and offered a few ideas.

So for now it's a combination of ugly hacks: hardlink-type pruning for
the history snapshots, plus blindly deleting older backup generations
to get space back when needed. At least until I find a better solution.

Anyway, thanks for your ideas :-)

David.
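
P.S. In case it helps to see what I mean by "a set of Python
iterators": roughly something like the sketch below. Walk the live tree
and stream the previous run's manifest side by side as two sorted
streams, then merge them. This is just my guess at an approach, nothing
to do with rdiff-backup's actual internals; the manifest format and
paths are invented for the example.

    # Sketch: diff the live tree against a saved listing in one pass,
    # without ever holding the full file list in memory.
    import os

    def walk_sorted(root, rel=()):
        # Depth-first walk, sorted per directory, yielding
        # (path_components, size, mtime) in globally ascending order.
        for name in sorted(os.listdir(os.path.join(root, *rel))):
            full = os.path.join(root, *(rel + (name,)))
            if os.path.isdir(full) and not os.path.islink(full):
                for entry in walk_sorted(root, rel + (name,)):
                    yield entry
            else:
                st = os.lstat(full)
                yield (rel + (name,), st.st_size, int(st.st_mtime))

    def read_manifest(path):
        # Stream the listing saved by the previous run, one
        # tab-separated line per file -- never all of it at once.
        with open(path) as f:
            for line in f:
                p, size, mtime = line.rstrip("\n").rsplit("\t", 2)
                yield (tuple(p.split("/")), int(size), int(mtime))

    def changes(old, new):
        # Classic sorted-merge diff of the two streams.
        o, n = next(old, None), next(new, None)
        while o is not None or n is not None:
            if n is None or (o is not None and o[0] < n[0]):
                yield ("del", o[0])
                o = next(old, None)
            elif o is None or n[0] < o[0]:
                yield ("add", n[0])
                n = next(new, None)
            else:
                if o[1:] != n[1:]:
                    yield ("mod", n[0])
                o, n = next(old, None), next(new, None)

    for kind, path in changes(read_manifest("/backup/manifest.txt"),
                              walk_sorted("/srv/files")):
        print("%s %s" % (kind, "/".join(path)))

Memory use is then bounded by tree depth times directory size rather
than by the total file count, which is the whole point.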
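
P.P.S. And the "ugly hack" itself, in spirit: rsync --link-dest
snapshots, plus blind deletion of the oldest generation when the disk
fills up. This is a stripped-down illustration, not our actual script;
the paths and the 10%-free threshold are invented.

    # Hardlink-style snapshots plus blind pruning of old generations.
    import os, shutil, subprocess, time

    BACKUPS = "/backup/fileserver"

    def snapshot(source):
        gens = sorted(os.listdir(BACKUPS))
        dest = os.path.join(BACKUPS, time.strftime("%Y%m%d-%H%M%S"))
        cmd = ["rsync", "-a"]
        if gens:
            # Hardlink unchanged files against the newest snapshot, so
            # each generation only costs the space of what changed.
            cmd.append("--link-dest=" + os.path.join(BACKUPS, gens[-1]))
        subprocess.check_call(cmd + [source + "/", dest + "/"])

    def prune():
        # Delete oldest generations until at least 10% of the disk is
        # free again, but never the last remaining snapshot.
        while True:
            vfs = os.statvfs(BACKUPS)
            if float(vfs.f_bavail) / vfs.f_blocks > 0.10:
                break
            gens = sorted(os.listdir(BACKUPS))
            if len(gens) <= 1:
                break
            shutil.rmtree(os.path.join(BACKUPS, gens[0]))

(And as far as I can tell, rsync 3.x builds its file list incrementally
these days, so the transfer itself doesn't hit the same memory wall.)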