Hello everyone,
I hope someone can suggest why I am seeing the following behaviour in my
current Bacula setup.
Since the end of last week, my MySQL backups in Bacula have been randomly
appearing to 'crash'. This normally happens while a backup is being copied to
another pool, but I'm not yet sure whether the copy is the trigger.
Running 'status dir' after one of these 'crashes' gives the following output
for the running jobs:
Running Jobs:
Console connected at 12-Feb-12 15:53
Console connected at 13-Feb-12 06:58
JobId  Level  Name                                             Status
======================================================================================================
2107   Full   WebServer1_MySQL_Copy.2012-02-13_04.30.00_28     is running  <Crashed Job>
2108   Full   WebServer1_MySQL.2012-02-13_04.30.00_29          is running  <Crashed Job>
2111   Full   MythTVServer1_MySQL.2012-02-13_05.00.00_32       is waiting for higher priority jobs to finish
2113   Full   TestServer_MySQL.2012-02-13_05.00.00_34          is waiting execution
2114   Full   MythTVServer1_MySQL_Copy.2012-02-13_05.30.00_35  is waiting execution
2115   Full   WebServer1_MySQL_Copy.2012-02-13_05.30.00_36     is waiting execution
2116   Full   WebServer1_MySQL.2012-02-13_05.30.00_37          has a fatal error
2117   Full   TestServer_MySQL_Copy.2012-02-13_05.30.00_38     is waiting execution
2121   Full   MythTVServer1_MySQL_Copy.2012-02-13_06.30.00_42  is waiting execution
2122   Full   WebServer1_MySQL_Copy.2012-02-13_06.30.00_43     is waiting execution
2123   Full   WebServer1_MySQL.2012-02-13_06.30.00_44          has a fatal error
2124   Full   TestServer_MySQL_Copy.2012-02-13_06.30.00_45     is waiting execution
2125   Full   MythTVServer1_MySQL.2012-02-13_07.00.00_47       has a fatal error
2126   Full   WebServer1_MySQL.2012-02-13_07.00.00_48          has a fatal error
====
Once the above appears, I am unable to view the status of any storage resource
on my SD:
*status storage=FileServer1_Full
Connecting to Storage daemon FileServer1_Full at FileServer1:9103
FileServer1-sd Version: 5.0.1 (24 February 2010) x86_64-pc-linux-gnu ubuntu 10.04
Daemon started 12-Feb-12 15:53, 92 Jobs run since started.
Heap: heap=1,671,168 smbytes=1,188,608 max_bytes=1,388,208 bufs=577 max_bufs=994
Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8
Running Jobs:
Reading: Full Copy job WebServer1_MySQL_Copy JobId=2107 Volume="WebServer1_MySQL_1325"
    pool="WebServer1_MySQL" device="WebServer1_MySQL" (/mnt/backup/Bacula/Databases/WebServer1)
    Files=4 Bytes=164,924 Bytes/sec=17
    FDSocket closed
====
Jobs waiting to reserve a drive:
====
Terminated Jobs:
JobId  Level  Files  Bytes     Status  Finished         Name
============================================================================
2091   Full       2  92.45 K   OK      13-Feb-12 03:30  TestServer_MySQL_Copy
2096   Full       5  2.258 M   OK      13-Feb-12 03:30  MythTVServer1_MySQL_Copy
2098   Full       4  164.9 K   OK      13-Feb-12 03:30  WebServer1_MySQL_Copy
2100   Full       2  92.45 K   OK      13-Feb-12 03:30  TestServer_MySQL_Copy
2078   Full   1,145  2.942 G   OK      13-Feb-12 03:31  SVN_Copy
2102   Full       5  2.259 M   OK      13-Feb-12 04:01  MythTVServer1_MySQL
2103   Full       4  164.9 K   OK      13-Feb-12 04:01  WebServer1_MySQL
2104   Full       2  92.37 K   OK      13-Feb-12 04:01  TestServer_MySQL
2105   Full       5  2.259 M   OK      13-Feb-12 04:30  MythTVServer1_MySQL_Copy
2109   Full       2  92.37 K   OK      13-Feb-12 04:30  TestServer_MySQL_Copy
====
Device status:
Device "Default" (/mnt/backup/Bacula) is not open.
<snip>
Device "WebServer1_Inc" (/mnt/backup/Bacula/WebServer1/Incremental) is not open.
Device "WebServer1_MySQL" (/mnt/backup/Bacula/Databases/WebServer1) is mounted with:
    Volume: WebServer1_MySQL_1325
    Pool: WebServer1_MySQL
    Media type: File
    Total Bytes Read=0 Blocks Read=0 Bytes/block=0
    Positioned at File=0 Block=0
Device "WebServer1_MySQL_Copy" (/mnt/mac_backup/Bacula/Databases/WebServer1) is not open.
Device "WebServer1_Full_Copy" (/mnt/mac_backup/Bacula/WebServer1/Full) is not open.
Device "WebServer1_Inc_Copy" (/mnt/mac_backup/Bacula/WebServer1/Incrementals) is not open.
<snip>
Device "SharedData_Diff" (/mnt/backup/Bacula/Shared/Differential) is not open.
====
Used Volume status:
NOTE: bconsole appears to crash (hang) here: no further output is produced, and
bconsole does not respond to any key presses, so I have to press Ctrl + C to
exit. Furthermore, the only way I can clear out the failed jobs from the
'Running Jobs' queue is to exit from bconsole, issue 'sudo service bacula-sd
stop' twice, then restart the SD and restart bacula-director.
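The recovery sequence described above can be sketched as a small shell script.
This is only a sketch under assumptions: DRY_RUN, run and recover_bacula are
illustrative names (not part of Bacula), and the 'start' and 'restart' actions
assume the init scripts shipped with the distribution packages accept them. By
default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the manual recovery sequence described above.
# DRY_RUN, run and recover_bacula are illustrative names, not part of Bacula.
# DRY_RUN defaults to 1, so commands are only printed unless DRY_RUN=0 is set.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover_bacula() {
    # The stop has to be issued twice, as noted above.
    run sudo service bacula-sd stop
    run sudo service bacula-sd stop
    # Assumption: the init scripts accept 'start' and 'restart'.
    run sudo service bacula-sd start
    run sudo service bacula-director restart
}

recover_bacula
```

Running it with DRY_RUN=0 would execute the commands for real. It does not
address the underlying hang; it only saves typing while the cause is unknown.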
For four of my clients, I run a MySQL backup hourly at 00:00, 01:00, etc. I
then copy the MySQL backups to another storage resource on my SD at 00:30,
01:30, etc. The MySQL databases I am backing up are relatively small. The
biggest is my Bacula catalog at ~160 MB, although that backup is currently
disabled and the database is backed up outside of Bacula until I can resolve
this issue.
Here's the config for one of the clients' MySQL backups:
JobDefs {
  Name = DefaultBackup
  Type = Backup
  Accurate = yes
  Level = Full
  Client = FileServer1-fd
  Messages = Standard
  Pool = Default
  Storage = Default
  Priority = 10
  Allow Duplicate Jobs = No
  Cancel Lower Level Duplicates = yes
}
JobDefs {
  Name = DefaultCopy
  Type = Copy
  Level = Full
  Client = FileServer1-fd
  Messages = Standard
  Selection Type = PoolUncopiedJobs
  Priority = 12
}
Job {
  Name = TestServer_MySQL
  Type = Backup
  JobDefs = DefaultBackup
  Client = TestServer-fd
  FileSet = "MySQL Databases"
  ClientRunBeforeJob = "/etc/bacula/scripts/client-scripts/mysql-backup.sh bacula_backup Gromit123"
  ClientRunAfterJob = "/etc/bacula/scripts/client-scripts/mysql-backup.sh cleanup"
  Schedule = "Hourly MySQL Database Schedule"
  Messages = Standard
  Pool = TestServer_MySQL
  Storage = TestServer_MySQL
  Enabled = No
}
Job {
  Name = "TestServer_MySQL_Copy"
  JobDefs = DefaultCopy
  Type = Copy
  Client = TestServer-fd
  FileSet = "MySQL Databases"
  Pool = TestServer_MySQL
  Messages = Standard
  Schedule = "Hourly MySQL Database Copy Schedule"
  Storage = TestServer_MySQL
  Enabled = No
}
Reading back through the console messages leading up to the crash, nothing
indicates why the jobs crashed. The only messages are about duplicate jobs not
being allowed, and they relate to the jobs queued behind the crashed jobs at
the top of the queue.
If any further information would help diagnose this issue, please let me know
and I will provide it.
I hope someone can help.
Joe
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users