On 3/20/2011 6:52 PM, Wouter Verhelst wrote:
> Hi,
>
> At a customer, we've been running bacula for quite some time. It now
> runs on a Debian 'lenny' system that was originally an etch
> installation (with 1.38.11) and has been upgraded since. We will
> probably upgrade once more to squeeze (with 5.0.2) at some point, but
> no concrete plans exist for this. It runs against a PostgreSQL 8.3
> database (also the standard version in Debian lenny).
>
> Originally, bacula ran pretty smoothly. But recently, mainly because
> the volumes have gone through the roof, things don't run as smoothly
> anymore.
>
> I understand that 2.4.4 is probably no longer under development, and
> that none of this is likely to be fixed for this branch. But if these
> issues were fixed long ago, I'd appreciate it if people could tell me,
> so I know.
>
> With the original installation, the amount of data that was added and
> then removed again on a weekly basis (we have weekly full backups) was
> quite detrimental to PostgreSQL's autovacuum feature, to the extent
> that it stopped working. That is, the amount of data that had been
> removed from the table was so large that the disk space to be released
> exceeded a particular percentage, which triggered a sanity check in
> the autovacuum daemon and made it skip the vacuum. As a result, the
> database files ballooned in size, eventually taking up 70G of data
> (when a dump of the database was just a few hundred megs). I fixed
> this by adding an explicit 'vacuumdb -f bacula' to the
> 'delete_catalog_backup' script.
>
> I had, however, failed to disable autovacuuming, and with the backup
> now requiring 3 LTO3 tapes and over 48 hours, the autovacuum daemon
> eventually started interfering; when it kicks in, it takes a
> database-level lock, which would sometimes cause the backup to fail in
> the following manner:
>
> 06-feb 10:09 belessnas-dir JobId 4241: Fatal error: sql.c:249 sql.c:249
> query SELECT count(*) from JobMedia WHERE JobId=4241 failed:
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
>
> (with this in the postgres log at around the same time:)
>
> 2011-02-06 10:09:19 CET LOG: autovacuum launcher started
> 2011-02-06 10:09:19 CET LOG: database system is ready to accept connections
>
> I guess what I'm saying with all this is that it would be nice if
> bacula played a bit more nicely with PostgreSQL's vacuuming process,
> which is fairly essential for it to function well.
>
> That was last February; backups have been running since, sometimes
> okay-ish, sometimes not (there is also the matter of the tape robot
> sometimes having issues, but that is hardly bacula's fault).
>
> Today, then, bacula failed with the following message:
>
> 20-mrt 22:02 belessnas-dir JobId 4365: Fatal error: Can't fill File table
> Query failed: INSERT INTO File (FileIndex, JobId, PathId, FilenameId,
> LStat, MD5) SELECT batch.FileIndex, batch.JobId, Path.PathId,
> Filename.FilenameId, batch.LStat, batch.MD5 FROM batch JOIN Path ON
> (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name):
> ERR=ERROR: integer out of range
>
> This was accurate:
>
> bacula=# SELECT last_value from file_fileid_seq;
>  last_value
> ------------
>  2147483652
> (1 row)
>
> Yes, we've been running it for several years now, and apparently we've
> written over 2 billion files to tape.
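
That diagnosis is correct: 2147483647 is the maximum for a 32-bit signed
integer, and the sequence (which PostgreSQL keeps as a 64-bit value) had
already counted past what the FileId column could hold, so the INSERT
failed on the column, not on the sequence. If you want to see whether any
of the catalog's other id columns are heading the same way, something
along these lines should show it; note that the sequence names other than
file_fileid_seq are guesses based on the stock schema's serial columns,
so adjust them to match your catalog:

    -- Headroom left before each id sequence passes the int4 maximum.
    -- A negative number means the 32-bit limit has already been crossed.
    SELECT 'file_fileid_seq' AS seq, last_value,
           2147483647 - last_value AS headroom
      FROM file_fileid_seq
    UNION ALL
    SELECT 'job_jobid_seq', last_value, 2147483647 - last_value
      FROM job_jobid_seq
    UNION ALL
    SELECT 'jobmedia_jobmediaid_seq', last_value, 2147483647 - last_value
      FROM jobmedia_jobmediaid_seq;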
> I've run an 'ALTER TABLE File ALTER fileid TYPE bigint' to change the
> fileid field into a 64-bit, rather than a 32-bit, variable, which
> should fix this for the foreseeable future; however, I have a few
> questions:
> - Is it okay for me to change the data type of the 'fileid' column
>   like that? Note that I've also changed it in other tables which have
>   a 'fileid' column. If bacula doesn't internally keep the fileid
>   number in a 32-bit integer, then that shouldn't be a huge problem,
>   but I don't know whether it does.

Yes, I think you're fine.
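
For reference, a minimal sketch of the kind of change involved, assuming
the stock PostgreSQL schema; exactly which other tables carry a FileId
column differs between Bacula versions (BaseFiles does in the stock
schema), so check your catalog before running anything:

    -- Sketch only; not verified against the 2.4.4 schema. Each ALTER
    -- rewrites the whole table under an exclusive lock, so run it while
    -- no jobs are active; on a 2-billion-row File table it will take a
    -- long time and needs roughly the table's size in free disk space.
    BEGIN;
    ALTER TABLE file ALTER COLUMN fileid TYPE bigint;
    ALTER TABLE basefiles ALTER COLUMN fileid TYPE bigint;
    COMMIT;

The sequence itself needs no change: PostgreSQL sequences are 64-bit
internally, which is exactly why file_fileid_seq could already count to
2147483652 while the int4 column behind it could not.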
> - Since things haven't really been running smoothly here, every time
>   the backup fails, the customer gets less happy with bacula. Are
>   there any other people here who run bacula to write fairly large
>   volumes of data to tape, and can they give me some pointers on
>   things to avoid? That way, I could hopefully avoid common pitfalls
>   before I run into them. Obviously, if there is documentation on this
>   somewhere that I missed, a simple pointer would be nice.
> - Finally, I realize that many of these issues may be fixed in a more
>   recent version of bacula, but I have no way to be sure -- this
>   particular customer is the only place where I have bacula running
>   with such large data volumes, and obviously just upgrading a
>   particularly important server without coordination, and with only a
>   vague idea that it *might* improve things, isn't really an option.
>   However, if someone could authoritatively tell me that these issues
>   have been fixed in a more recent version, then an upgrade would
>   probably be a very good idea...

Others will report in on your other questions.

-- 
Dan Langille - http://langille.org/