On 13 Feb, 2012, at 02:30 PM, "Joe Nyland" <joenyl...@me.com> wrote:
Hello,
I've been running the SD using the following command (I know the combination of options I have used may be excessive, but I wanted as much chance of catching the error as I could!) since yesterday afternoon:
sudo bacula-sd -c /etc/bacula/bacula-sd.conf -d 100 -dt -f -u bacula -g tape -m -v | tee -a /mnt/array/bacula-sd.screen.log
However, (as luck would have it) I've not seen the behaviour I originally reported whilst running with debug options.
Is there any way that running the SD with the combination of options above could change its behaviour, or interfere with it in any way? I'm asking because I have re-enabled all of the backup jobs on the server, and I have still not seen it crash again.
Thanks,
Joe
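A side note on the logging pipeline in the command above: a plain `|` hands only stdout to `tee`, so anything a process writes to stderr bypasses the log file. The sketch below illustrates this with a stand-in function (`emit` is hypothetical and is not bacula-sd); whether bacula-sd writes anything to stderr in this mode is an assumption that has not been verified here.

```shell
#!/bin/sh
# Stand-in for a foreground daemon (hypothetical; NOT bacula-sd itself).
emit() {
    echo "debug: normal trace line"          # written to stdout
    echo "error: something went wrong" >&2   # written to stderr
}

log_dir=$(mktemp -d)

# Plain pipe: tee receives stdout only. The stderr line still reaches
# the terminal, but it is never appended to the log file.
emit | tee -a "$log_dir/stdout-only.log" >/dev/null

# With 2>&1 before the pipe, stderr is merged into stdout, so tee
# appends both lines to the log file.
emit 2>&1 | tee -a "$log_dir/both.log" >/dev/null

# Count the lines captured in each log (tr strips padding some wc builds add).
stdout_only_lines=$(wc -l < "$log_dir/stdout-only.log" | tr -d ' ')
both_lines=$(wc -l < "$log_dir/both.log" | tr -d ' ')
echo "stdout only: $stdout_only_lines line(s); with 2>&1: $both_lines line(s)"
```

If stderr does matter for the SD, the same idea would apply to the original command by placing `2>&1` just before `| tee -a /mnt/array/bacula-sd.screen.log`.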
On 13 Feb, 2012, at 02:11 PM, John Drescher <dresche...@gmail.com> wrote:
2012/2/13 Joe Nyland <joenyl...@me.com>:
> Hello everyone,
>
> I hope someone would be able to offer any suggestions of why I am seeing the
> following behaviour in my current Bacula setup:
>
> Since the tail end of last week, I have been having issues with my MySQL
> backups in Bacula, where they would randomly appear to 'crash', normally
> when performing a copy of a backup to another pool - but I'm not sure yet if
> this is the trigger.
>
> Running 'status dir' after one of these 'crashes' gives the following output
> for the running jobs:
>
> Running Jobs:
> Console connected at 12-Feb-12 15:53
> Console connected at 13-Feb-12 06:58
> JobId Level Name Status
> ======================================================================
> 2107 Full WebServer1_MySQL_Copy.2012-02-13_04.30.00_28 is running <Crashed Job>
> 2108 Full WebServer1_MySQL.2012-02-13_04.30.00_29 is running <Crashed Job>
> 2111 Full MythTVServer1_MySQL.2012-02-13_05.00.00_32 is waiting for higher priority jobs to finish
> 2113 Full TestServer_MySQL.2012-02-13_05.00.00_34 is waiting execution
> 2114 Full MythTVServer1_MySQL_Copy.2012-02-13_05.30.00_35 is waiting execution
> 2115 Full WebServer1_MySQL_Copy.2012-02-13_05.30.00_36 is waiting execution
> 2116 Full WebServer1_MySQL.2012-02-13_05.30.00_37 has a fatal error
> 2117 Full TestServer_MySQL_Copy.2012-02-13_05.30.00_38 is waiting execution
> 2121 Full MythTVServer1_MySQL_Copy.2012-02-13_06.30.00_42 is waiting execution
> 2122 Full WebServer1_MySQL_Copy.2012-02-13_06.30.00_43 is waiting execution
> 2123 Full WebServer1_MySQL.2012-02-13_06.30.00_44 has a fatal error
> 2124 Full TestServer_MySQL_Copy.2012-02-13_06.30.00_45 is waiting execution
> 2125 Full MythTVServer1_MySQL.2012-02-13_07.00.00_47 has a fatal error
> 2126 Full WebServer1_MySQL.2012-02-13_07.00.00_48 has a fatal error
> ====
>
> Once the above appears, I am unable to view the status of any storage
> resource on my SD:
>
> *status storage=FileServer1_Full
> Connecting to Storage daemon FileServer1_Full at FileServer1:9103
>
> FileServer1-sd Version: 5.0.1 (24 February 2010) x86_64-pc-linux-gnu ubuntu
> 10.04
> Daemon started 12-Feb-12 15:53, 92 Jobs run since started.
> Heap: heap=1,671,168 smbytes=1,188,608 max_bytes=1,388,208 bufs=577
> max_bufs=994
> Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8
>
> Running Jobs:
> Reading: Full Copy job WebServer1_MySQL_Copy JobId=2107
> Volume="WebServer1_MySQL_1325"
> pool="WebServer1_MySQL" device="WebServer1_MySQL"
> (/mnt/backup/Bacula/Databases/WebServer1)
> Files=4 Bytes=164,924 Bytes/sec=17
> FDSocket closed
> ====
>
> Jobs waiting to reserve a drive:
> ====
>
> Terminated Jobs:
> JobId Level Files Bytes Status Finished Name
> ===================================================================
> 2091 Full 2 92.45 K OK 13-Feb-12 03:30 TestServer_MySQL_Copy
> 2096 Full 5 2.258 M OK 13-Feb-12 03:30 MythTVServer1_MySQL_Copy
> 2098 Full 4 164.9 K OK 13-Feb-12 03:30 WebServer1_MySQL_Copy
> 2100 Full 2 92.45 K OK 13-Feb-12 03:30 TestServer_MySQL_Copy
> 2078 Full 1,145 2.942 G OK 13-Feb-12 03:31 SVN_Copy
> 2102 Full 5 2.259 M OK 13-Feb-12 04:01 MythTVServer1_MySQL
> 2103 Full 4 164.9 K OK 13-Feb-12 04:01 WebServer1_MySQL
> 2104 Full 2 92.37 K OK 13-Feb-12 04:01 TestServer_MySQL
> 2105 Full 5 2.259 M OK 13-Feb-12 04:30 MythTVServer1_MySQL_Copy
> 2109 Full 2 92.37 K OK 13-Feb-12 04:30 TestServer_MySQL_Copy
> ====
>
> Device status:
> Device "Default" (/mnt/backup/Bacula) is not open.
> <snip>
> Device "WebServer1_Inc" (/mnt/backup/Bacula/WebServer1/Incremental) is not
> open.
> Device "WebServer1_MySQL" (/mnt/backup/Bacula/Databases/WebServer1) is
> mounted with:
> Volume: WebServer1_MySQL_1325
> Pool: WebServer1_MySQL
> Media type: File
> Total Bytes Read=0 Blocks Read=0 Bytes/block=0
> Positioned at File=0 Block=0
> Device "WebServer1_MySQL_Copy" (/mnt/mac_backup/Bacula/Databases/WebServer1)
> is not open.
> Device "WebServer1_Full_Copy" (/mnt/mac_backup/Bacula/WebServer1/Full) is
> not open.
> Device "WebServer1_Inc_Copy"
> (/mnt/mac_backup/Bacula/WebServer1/Incrementals) is not open.
> <snip>
> Device "SharedData_Diff" (/mnt/backup/Bacula/Shared/Differential) is not
> open.
> ====
>
> Used Volume status:
>
> NOTE: bconsole appears to crash here - no further output is produced, and
> bconsole does not respond to any key presses. I have to Ctrl + C to exit out
> from bconsole. Furthermore, the only way I can clear out the failed jobs
> from the 'Running jobs queue' is to exit from bconsole, issue 'sudo service
> bacula-sd stop' twice, then restart the SD and restart bacula-director.
>
>
> What I have is for 4 of my clients I run a MySQL backup hourly at 00:00,
> 01:00, etc. I then copy the MySQL backups to another storage resource on my
> SD at 00:30, 01:30, etc. The MySQL databases which I am backing up are
> relatively small, the biggest of which is my Bacula catalog - ~160Mb -
> although this backup is currently disabled and the database backed up
> outside of Bacula until I can resolve this issue.
>
> Here's the config for one of the client's MySQL backups:
>
> JobDefs {
> Name = DefaultBackup
> Type = Backup
> Accurate = yes
> Level = Full
> Client = FileServer1-fd
> Messages = Standard
> Pool = Default
> Storage = Default
> Priority = 10
> Allow Duplicate Jobs = No
> Cancel Lower Level Duplicates = yes
> }
>
> JobDefs {
> Name = DefaultCopy
> Type = Copy
> Level = Full
> Client = FileServer1-fd
> Messages = Standard
> Selection Type = PoolUncopiedJobs
> Priority = 12
> }
>
> Job {
> Name = TestServer_MySQL
> Type = Backup
> JobDefs = DefaultBackup
> Client = TestServer-fd
> FileSet = "MySQL Databases"
> ClientRunBeforeJob = "/etc/bacula/scripts/client-scripts/mysql-backup.sh bacula_backup Gromit123"
> ClientRunAfterJob = "/etc/bacula/scripts/client-scripts/mysql-backup.sh cleanup"
> Schedule = "Hourly MySQL Database Schedule"
> Messages = Standard
> Pool = TestServer_MySQL
> Storage = TestServer_MySQL
> Enabled = No
> }
>
> Job {
> Name = "TestServer_MySQL_Copy"
> JobDefs = DefaultCopy
> Type = Copy
> Client = TestServer-fd
> FileSet = "MySQL Databases"
> Pool = TestServer_MySQL
> Messages = Standard
> Schedule = "Hourly MySQL Database Copy Schedule"
> Storage = TestServer_MySQL
> Enabled = No
> }
>
> Reading back through console messages leading up to the crash, there doesn't
> appear to be any suggestion for why the jobs have crashed, only messages
> about duplicate jobs not being allowed for the jobs which are queued after
> the crashed jobs at the top of the queue.
>
>
> If I can provide any further information to help diagnose this issue, please
> let me know and I will be able to provide it.
>
I would look at the log for the sd. One way to get this is to run
bacula-sd in a console with the debug -d 100 option enabled instead of
running it as a daemon. You can also google for bacula kaboom for more
debugging tips.
John
Hi John,
Thank you for your reply too - only just received it after replying to Adrian Reyer.
That sounds like a logical step to me too. I'll set this up later on, so that it's in place for when it happens again.
Thank you for your input.
Joe
_______________________________________________ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users