Hi,

At a customer site, we've been running Bacula for quite some time. It currently runs on Debian 'lenny', on a system that was originally an etch installation (with Bacula 1.38.11) and has since been upgraded. We will probably upgrade once more to squeeze (with 5.0.2) at some point, but no concrete plans exist yet. It runs against a PostgreSQL 8.3 database (also the standard version in Debian lenny).
Originally, Bacula ran pretty smoothly. But recently, mainly because the data volumes have gone through the roof, things no longer run as smoothly. I understand that 2.4.4 is probably no longer under development, and that none of this is likely to be fixed in that branch; but if these issues were fixed long ago, I'd appreciate it if people could tell me, so I know.

With the original installation, the amount of data that was added and then removed again on a weekly basis (we have weekly full backups) was quite detrimental to PostgreSQL's autovacuum feature, to the extent that it stopped working. That is, so much data had been removed from the table that the amount of disk space to be released exceeded a particular percentage, which triggered a sanity check in the autovacuum daemon and caused it to skip the vacuum. As a result, the database files ballooned in size, eventually taking up 70G of disk space (while a dump of the database was just a few hundred megabytes). I fixed this by adding an explicit 'vacuumdb -f bacula' to the 'delete_catalog_backup' script.

I had, however, failed to disable autovacuuming, and with the backup now requiring 3 LTO3 tapes and over 48 hours, the autovacuum daemon eventually started interfering: when it kicks in, it takes a database-level lock, which would sometimes cause the backup to fail in the following manner:

  06-feb 10:09 belessnas-dir JobId 4241: Fatal error: sql.c:249
  sql.c:249 query SELECT count(*) from JobMedia WHERE JobId=4241 failed:
  server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
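For reference, the workaround in the catalog backup script looks roughly like this. This is a sketch from memory, not the exact production script: the script path and database name 'bacula' are my setup's, and the rm target is whatever your catalog dump file is.

```shell
#!/bin/sh
# Sketch of /etc/bacula/scripts/delete_catalog_backup with an
# explicit full vacuum added after the weekly catalog dump is
# removed. Assumes the invoking user may connect to the 'bacula'
# database with sufficient privileges.
rm -f /var/lib/bacula/bacula.sql

# Reclaim the space freed by the weekly pruning. The -f (--full)
# option rewrites the tables and returns the space to the OS, at
# the cost of an exclusive lock for the duration of the vacuum.
vacuumdb -f bacula
```

To keep the autovacuum daemon from taking its lock mid-backup, it can also be disabled globally with 'autovacuum = off' in postgresql.conf; that makes an explicit vacuum like the above mandatory rather than optional.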
(with this in the postgres log at around the same time:)

  2011-02-06 10:09:19 CET LOG: autovacuum launcher started
  2011-02-06 10:09:19 CET LOG: database system is ready to accept connections

I guess what I'm saying with all this is that it might be nice if Bacula were to play a bit more nicely with PostgreSQL's vacuuming process, which is fairly essential for it to function well.

That was last February; backups have been running since, sometimes okayish, sometimes not (there's also the matter of the tape robot sometimes having issues, but that is hardly Bacula's fault). Today, then, Bacula failed with the following message:

  20-Mar 22:02 belessnas-dir JobId 4365: Fatal error: Can't fill File table
  Query failed: INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5)
    SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,
           batch.LStat, batch.MD5
    FROM batch
    JOIN Path ON (batch.Path = Path.Path)
    JOIN Filename ON (batch.Name = Filename.Name):
  ERR=ERROR: integer out of range

This was accurate:

  bacula=# SELECT last_value FROM file_fileid_seq;
   last_value
  ------------
   2147483652
  (1 row)

Yes, we've been running it for several years now, and apparently we've written over 2 billion files to tape. I ran an 'ALTER TABLE File ALTER fileid TYPE bigint' to change the fileid column from a 32-bit to a 64-bit integer, which should fix this for the foreseeable future; however, I have a few questions:

- Is it okay for me to change the data type of the 'fileid' column like that? Note that I've also changed it in other tables which have a 'fileid' column. If Bacula doesn't internally store the fileid in a 32-bit integer, that shouldn't be a huge problem, but I don't know whether it does.

- Since things haven't really been running smoothly here, every time a backup fails, the customer gets less happy with Bacula.
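For the archives, the statements I ran were along these lines. The list of tables carrying a fileid column is from memory of the 2.4 catalog schema, so verify it against your own schema (e.g. with \d in psql) before copying this:

```shell
# Sketch: widen FileId to 64 bits in the 'bacula' catalog database.
# Run as a PostgreSQL user with ALTER privileges on these tables.
# ALTER TABLE ... TYPE rewrites the table under an exclusive lock,
# so do this while no backup jobs are running.
psql bacula -c 'ALTER TABLE File ALTER COLUMN FileId TYPE bigint;'

# Any other table with a fileid column needs the same change; in my
# schema that was BaseFiles (verify yours).
psql bacula -c 'ALTER TABLE BaseFiles ALTER COLUMN FileId TYPE bigint;'

# The sequence itself needs no change: PostgreSQL sequences are
# already 64-bit internally.
```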
Are there any other people here who run Bacula to write fairly large volumes of data to tape, and can they give me some pointers on things to avoid? That way, I could hopefully sidestep common pitfalls before running into them. Obviously, if there is documentation on this somewhere that I missed, a simple pointer would be nice.

- Finally, I realize that many of these issues may be fixed in a more recent version of Bacula, but I have no way to be sure -- this particular customer is the only place where I run Bacula with such large data volumes, and just upgrading a particularly important server without coordination, on only a vague idea that it *might* improve things, isn't really an option. However, if someone could authoritatively tell me that these issues have been fixed in a more recent version, then an upgrade would probably be a very good idea...

Thanks,

-- 
Wouter Verhelst
NixSys BVBA
Louizastraat 14, 2800 Mechelen
T: +32 15 27 69 50 / F: +32 15 27 69 51 / M: +32 486 836 198

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users