2012/2/13 Joe Nyland <joenyl...@me.com>:
> Hello everyone,
>
> I hope someone would be able to offer any suggestions of why I am seeing the
> following behaviour in my current Bacula setup:
>
> Since the tail end of last week, I have been having issues with my MySQL
> backups in Bacula, where they would randomly appear to 'crash', normally
> when performing a copy of a backup to another pool - but I'm not sure yet if
> this is the trigger.
>
> Running 'status dir' after one of these 'crashes' gives the following output
> for the running jobs:
>
> Running Jobs:
> Console connected at 12-Feb-12 15:53
> Console connected at 13-Feb-12 06:58
>  JobId Level   Name                                              Status
> ======================================================================
>   2107 Full    WebServer1_MySQL_Copy.2012-02-13_04.30.00_28      is running <Crashed Job>
>   2108 Full    WebServer1_MySQL.2012-02-13_04.30.00_29           is running <Crashed Job>
>   2111 Full    MythTVServer1_MySQL.2012-02-13_05.00.00_32        is waiting for higher priority jobs to finish
>   2113 Full    TestServer_MySQL.2012-02-13_05.00.00_34           is waiting execution
>   2114 Full    MythTVServer1_MySQL_Copy.2012-02-13_05.30.00_35   is waiting execution
>   2115 Full    WebServer1_MySQL_Copy.2012-02-13_05.30.00_36      is waiting execution
>   2116 Full    WebServer1_MySQL.2012-02-13_05.30.00_37           has a fatal error
>   2117 Full    TestServer_MySQL_Copy.2012-02-13_05.30.00_38      is waiting execution
>   2121 Full    MythTVServer1_MySQL_Copy.2012-02-13_06.30.00_42   is waiting execution
>   2122 Full    WebServer1_MySQL_Copy.2012-02-13_06.30.00_43      is waiting execution
>   2123 Full    WebServer1_MySQL.2012-02-13_06.30.00_44           has a fatal error
>   2124 Full    TestServer_MySQL_Copy.2012-02-13_06.30.00_45      is waiting execution
>   2125 Full    MythTVServer1_MySQL.2012-02-13_07.00.00_47        has a fatal error
>   2126 Full    WebServer1_MySQL.2012-02-13_07.00.00_48           has a fatal error
> ====
>
> Once the above appears, I am unable to view the status of any storage
> resource on my SD:
>
> *status storage=FileServer1_Full
> Connecting to Storage daemon FileServer1_Full at FileServer1:9103
>
> FileServer1-sd Version: 5.0.1 (24 February 2010) x86_64-pc-linux-gnu ubuntu 10.04
> Daemon started 12-Feb-12 15:53, 92 Jobs run since started.
>  Heap: heap=1,671,168 smbytes=1,188,608 max_bytes=1,388,208 bufs=577 max_bufs=994
> Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8
>
> Running Jobs:
> Reading: Full Copy job WebServer1_MySQL_Copy JobId=2107 Volume="WebServer1_MySQL_1325"
>     pool="WebServer1_MySQL" device="WebServer1_MySQL" (/mnt/backup/Bacula/Databases/WebServer1)
>     Files=4 Bytes=164,924 Bytes/sec=17
>     FDSocket closed
> ====
>
> Jobs waiting to reserve a drive:
> ====
>
> Terminated Jobs:
>  JobId  Level    Files      Bytes   Status   Finished         Name
> ===================================================================
>   2091  Full          2    92.45 K  OK       13-Feb-12 03:30  TestServer_MySQL_Copy
>   2096  Full          5    2.258 M  OK       13-Feb-12 03:30  MythTVServer1_MySQL_Copy
>   2098  Full          4    164.9 K  OK       13-Feb-12 03:30  WebServer1_MySQL_Copy
>   2100  Full          2    92.45 K  OK       13-Feb-12 03:30  TestServer_MySQL_Copy
>   2078  Full      1,145    2.942 G  OK       13-Feb-12 03:31  SVN_Copy
>   2102  Full          5    2.259 M  OK       13-Feb-12 04:01  MythTVServer1_MySQL
>   2103  Full          4    164.9 K  OK       13-Feb-12 04:01  WebServer1_MySQL
>   2104  Full          2    92.37 K  OK       13-Feb-12 04:01  TestServer_MySQL
>   2105  Full          5    2.259 M  OK       13-Feb-12 04:30  MythTVServer1_MySQL_Copy
>   2109  Full          2    92.37 K  OK       13-Feb-12 04:30  TestServer_MySQL_Copy
> ====
>
> Device status:
> Device "Default" (/mnt/backup/Bacula) is not open.
> <snip>
> Device "WebServer1_Inc" (/mnt/backup/Bacula/WebServer1/Incremental) is not open.
> Device "WebServer1_MySQL" (/mnt/backup/Bacula/Databases/WebServer1) is mounted with:
>     Volume:      WebServer1_MySQL_1325
>     Pool:        WebServer1_MySQL
>     Media type:  File
>     Total Bytes Read=0 Blocks Read=0 Bytes/block=0
>     Positioned at File=0 Block=0
> Device "WebServer1_MySQL_Copy" (/mnt/mac_backup/Bacula/Databases/WebServer1) is not open.
> Device "WebServer1_Full_Copy" (/mnt/mac_backup/Bacula/WebServer1/Full) is not open.
> Device "WebServer1_Inc_Copy" (/mnt/mac_backup/Bacula/WebServer1/Incrementals) is not open.
> <snip>
> Device "SharedData_Diff" (/mnt/backup/Bacula/Shared/Differential) is not open.
> ====
>
> Used Volume status:
>
> NOTE: bconsole appears to crash here - no further output is produced, and
> bconsole does not respond to any key presses. I have to press Ctrl + C to
> exit from bconsole. Furthermore, the only way I can clear out the failed jobs
> from the 'Running jobs queue' is to exit from bconsole, issue 'sudo service
> bacula-sd stop' twice, then restart the SD and restart bacula-director.
>
>
> For 4 of my clients, I run a MySQL backup hourly at 00:00, 01:00, etc. I
> then copy the MySQL backups to another storage resource on my SD at 00:30,
> 01:30, etc. The MySQL databases which I am backing up are relatively small,
> the biggest of which is my Bacula catalog - ~160Mb - although this backup is
> currently disabled and the database backed up outside of Bacula until I can
> resolve this issue.
>
> Here's the config for one of the clients' MySQL backups:
>
> JobDefs {
>   Name = DefaultBackup
>   Type = Backup
>   Accurate = yes
>   Level = Full
>   Client = FileServer1-fd
>   Messages = Standard
>   Pool = Default
>   Storage = Default
>   Priority = 10
>   Allow Duplicate Jobs = No
>   Cancel Lower Level Duplicates = yes
> }
>
> JobDefs {
>   Name = DefaultCopy
>   Type = Copy
>   Level = Full
>   Client = FileServer1-fd
>   Messages = Standard
>   Selection Type = PoolUncopiedJobs
>   Priority = 12
> }
>
> Job {
>   Name = TestServer_MySQL
>   Type = Backup
>   JobDefs = DefaultBackup
>   Client = TestServer-fd
>   FileSet = "MySQL Databases"
>   ClientRunBeforeJob = "/etc/bacula/scripts/client-scripts/mysql-backup.sh bacula_backup Gromit123"
>   ClientRunAfterJob = "/etc/bacula/scripts/client-scripts/mysql-backup.sh cleanup"
>   Schedule = "Hourly MySQL Database Schedule"
>   Messages = Standard
>   Pool = TestServer_MySQL
>   Storage = TestServer_MySQL
>   Enabled = No
> }
>
> Job {
>   Name = "TestServer_MySQL_Copy"
>   JobDefs = DefaultCopy
>   Type = Copy
>   Client = TestServer-fd
>   FileSet = "MySQL Databases"
>   Pool = TestServer_MySQL
>   Messages = Standard
>   Schedule = "Hourly MySQL Database Copy Schedule"
>   Storage = TestServer_MySQL
>   Enabled = No
> }
>
> Reading back through console messages leading up to the crash, there doesn't
> appear to be any suggestion for why the jobs have crashed, only messages
> about duplicate jobs not being allowed for the jobs which are queued after
> the crashed jobs at the top of the queue.
>
>
> If I can provide any further information to help diagnose this issue, please
> let me know and I will be able to provide it.
>

I would look at the log for the SD. One way to get this is to run bacula-sd
in a console with the debug option -d 100 enabled, instead of running it as
a daemon. You can also google for "bacula kaboom" for more debugging tips.

John

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
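
For reference, a minimal sketch of the foreground debug run suggested in the reply above. It only prints the command line and executes nothing. The config path /etc/bacula/bacula-sd.conf and the helper name sd_debug_cmd are assumptions for illustration, not taken from the thread; -f (stay in the foreground), -d (debug level) and -c (config file) are the usual bacula-sd options, but check them against your version before relying on this.

```shell
#!/bin/sh
# Sketch only: build the command line for a foreground, debug-enabled
# storage daemon run. Nothing is started by this script.

# Assumed config location; adjust to your installation.
SD_CONF="${SD_CONF:-/etc/bacula/bacula-sd.conf}"
# Debug level suggested in the reply.
SD_DEBUG="${SD_DEBUG:-100}"

# sd_debug_cmd is a hypothetical helper name, not part of Bacula.
# -f keeps bacula-sd in the foreground, -d sets the debug level,
# -c names the configuration file.
sd_debug_cmd() {
    printf 'bacula-sd -f -d %s -c %s\n' "$SD_DEBUG" "$SD_CONF"
}

sd_debug_cmd
```

To use it, stop the running service first ('sudo service bacula-sd stop', as in the original post), run the printed command as root in a terminal, then trigger one of the copy jobs and watch the output while the hang occurs.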