On 3/2/2020 8:45 AM, Andrea Brancatelli wrote:

Are you sure about this?

We have a job that takes 25 days to complete without any recompiling.


Yes. See Max Run Time in the Job resource docs at https://www.bacula.org/9.6.x-manuals/en/main/Configuring_Director.html



---
*Andrea Brancatelli*


On 2020-03-02 12:15, Josh Fisher wrote:

Bacula has a built-in watchdog that kills a job that runs for more than 6 days. That period can be extended at compile time, so you have to compile your own binaries after a change to the source. I don't remember where in the source, but this has come up before and should be searchable.

If you have already extended the watchdog timeout, a signal 11 is almost always a software bug, and the devs should be able to tell from the traceback where in the code it happened. That said, a long Bacula run that uses lots of pointers is also a decent test of memory as well as I/O. Hardware errors, such as anything that causes a bit flip in RAM, usually result in a signal 11. But it is far more likely to be a software issue, and you should file a bug report.


On 3/1/2020 6:22 PM, Chaz Vidal wrote:
Greetings all,
Our Bacula system crashed on Friday with a segmentation violation.

The system has been attempting a full backup of over 130TB of data over the past few weeks, which we appear to have lost because of the crash.

Feb 28 09:56:31 <<servername>> bacula-dir[4211]: Bacula interrupted by signal 11: Segmentation violation
Feb 28 09:56:31 <<servername>> bacula-dir[4211]: Kaboom! bacula-dir, bacula-dir got signal 11 - Segmentation violation at 28-Feb-2020 09:56:31. Attempting traceback.
Feb 28 09:56:31 <<servername>> bacula-dir[4211]: Kaboom! exepath=/usr/sbin/
Feb 28 09:56:31 <<servername>> bacula-dir: Bacula interrupted by signal 11: Segmentation violation
Feb 28 09:56:31 <<servername>> bacula-dir[4211]: Calling: /usr/sbin/btraceback /usr/sbin/bacula-dir 4211 /var/lib/bacula
Feb 28 09:56:31 <<servername>> postfix/smtpd[59719]: connect from localhost[127.0.0.1]
Feb 28 09:56:31 <<servername>> postfix/smtpd[59719]: 71CC36008A: client=localhost[127.0.0.1]
Feb 28 09:56:31 <<servername>> postfix/cleanup[59722]: 71CC36008A: message-id=<20200227232631.71CC36008A@<<servername>>.company.com>
Feb 28 09:56:31 <<servername>> postfix/qmgr[14399]: 71CC36008A: from=<root@<<servername>>.company.com>, size=593, nrcpt=1 (queue active)
Feb 28 09:56:31 <<servername>> postfix/smtpd[59719]: disconnect from localhost[127.0.0.1] helo=1 mail=1 rcpt=1 data=1 quit=1 commands=5
Feb 28 09:56:31 <<servername>> bacula-dir[4211]: It looks like the traceback worked...
Feb 28 09:56:31 <<servername>> bacula-dir[4211]: LockDump: /var/lib/bacula/bacula.4211.traceback
Feb 28 09:56:31 <<servername>> bacula-dir[4211]: bacula-dir: lockmgr.c:1221-0 lockmgr disabled

I do not know how to read a traceback file to understand what may have been going on. We are attempting to restart the backup, but unless we understand what happened, the crash may occur again.

We are running Bacula Version: 9.4.2

I would appreciate any insight anyone can share.

Attempt to dump current JCRs. njcrs=7
threadid=0x7fb497491f40 JobId=0 JobStatus=R jcr=0x55980a04a4f8 name=*JobMonitor*.2020-02-11_15.29.48_01
         use_count=1 killable=0
         JobType=I JobLevel=
         sched_time=11-Feb-2020 15:29 start_time=11-Feb-2020 15:29
         end_time=01-Jan-1970 09:30 wait_time=01-Jan-1970 09:30
         db=(nil) db_batch=(nil) batch_started=0
         wstore=0x55980a01ff28 rstore=0x55980a01ff28 wjcr=(nil) client=0x55980a026128 reschedule_count=0 SD_msg_chan_started=0
threadid=0x7fb495897700 JobId=104686 JobStatus=R jcr=0x7fb48806aea8 name=job1.2020-02-11_17.38.56_13
         use_count=2 killable=1
         JobType=B JobLevel=F
         sched_time=11-Feb-2020 17:38 start_time=11-Feb-2020 17:38
         end_time=01-Jan-1970 09:30 wait_time=21-Feb-2020 16:50
         db=0x7fb4880059a8 db_batch=(nil) batch_started=0
         wstore=0x7fb48803fc18 rstore=(nil) wjcr=(nil) client=0x7fb4880481a8 reschedule_count=0 SD_msg_chan_started=1
BDB=0x7fb4880059a8 db_name=bacula db_user=bacula connected=true
         cmd="UPDATE Media SET InChanger=0, Slot=0 WHERE Slot=25 AND StorageId IN (10) AND MediaId!=794" changes=1814
         RWLOCK=0x7fb4880059c0 w_active=0 w_wait=0
threadid=0x7fb47e7fc700 JobId=104687 JobStatus=R jcr=0x7fb488068978 name=job2.2020-02-11_17.40.43_14
         use_count=2 killable=1
         JobType=B JobLevel=F
         sched_time=11-Feb-2020 17:40 start_time=11-Feb-2020 17:40
         end_time=01-Jan-1970 09:30 wait_time=01-Jan-1970 09:30
         db=0x7fb4880059a8 db_batch=(nil) batch_started=0
         wstore=0x7fb48803fc18 rstore=(nil) wjcr=(nil) client=0x7fb4880481a8 reschedule_count=0 SD_msg_chan_started=1
BDB=0x7fb4880059a8 db_name=bacula db_user=bacula connected=true
         cmd="UPDATE Media SET InChanger=0, Slot=0 WHERE Slot=25 AND StorageId IN (10) AND MediaId!=794" changes=1814
         RWLOCK=0x7fb4880059c0 w_active=0 w_wait=0
threadid=0x7fb43f7fe700 JobId=104928 JobStatus=R jcr=0x7fb44805fa88 name=job3.2020-02-14_15.47.06_47
         use_count=2 killable=1
         JobType=B JobLevel=F
         sched_time=14-Feb-2020 15:46 start_time=14-Feb-2020 15:47
         end_time=01-Jan-1970 09:30 wait_time=27-Feb-2020 22:21
         db=0x7fb4880059a8 db_batch=(nil) batch_started=0
         wstore=0x7fb448034678 rstore=(nil) wjcr=(nil) client=0x7fb44803c148 reschedule_count=0 SD_msg_chan_started=1
BDB=0x7fb4880059a8 db_name=bacula db_user=bacula connected=true
         cmd="UPDATE Media SET InChanger=0, Slot=0 WHERE Slot=25 AND StorageId IN (10) AND MediaId!=794" changes=1814
         RWLOCK=0x7fb4880059c0 w_active=0 w_wait=0
threadid=0x7fb43e7fc700 JobId=105616 JobStatus=R jcr=0x55980a005fe8 name=job4.2020-02-21_21.30.01_16
         use_count=2 killable=1
         JobType=B JobLevel=F
         sched_time=21-Feb-2020 21:30 start_time=24-Feb-2020 23:36
         end_time=01-Jan-1970 09:30 wait_time=01-Jan-1970 09:30
         db=0x7fb4880059a8 db_batch=(nil) batch_started=0
         wstore=0x7fb448033b78 rstore=(nil) wjcr=(nil) client=0x7fb44803e9e8 reschedule_count=0 SD_msg_chan_started=1
BDB=0x7fb4880059a8 db_name=bacula db_user=bacula connected=true
         cmd="UPDATE Media SET InChanger=0, Slot=0 WHERE Slot=25 AND StorageId IN (10) AND MediaId!=794" changes=1814
         RWLOCK=0x7fb4880059c0 w_active=0 w_wait=0
threadid=0x7fb43effd700 JobId=0 JobStatus=C jcr=0x7fb47800b2e8 name=-Console-.2020-02-27_08.39.19_09
         use_count=1 killable=0
         JobType=U JobLevel=F
         sched_time=27-Feb-2020 08:39 start_time=27-Feb-2020 08:39
         end_time=01-Jan-1970 09:30 wait_time=01-Jan-1970 09:30
         db=0x7fb4880059a8 db_batch=(nil) batch_started=0
         wstore=0x7fb448035cd8 rstore=0x7fb448034c18 wjcr=(nil) client=0x7fb44803ae18 reschedule_count=0 SD_msg_chan_started=0
BDB=0x7fb4880059a8 db_name=bacula db_user=bacula connected=true
         cmd="UPDATE Media SET InChanger=0, Slot=0 WHERE Slot=25 AND StorageId IN (10) AND MediaId!=794" changes=1814
         RWLOCK=0x7fb4880059c0 w_active=0 w_wait=0
threadid=0x7fb45f7fe700 JobId=0 JobStatus=C jcr=0x7fb3f400e148 name=-Console-.2020-02-28_09.00.13_35
         use_count=1 killable=0
         JobType=U JobLevel=F
         sched_time=28-Feb-2020 09:00 start_time=28-Feb-2020 09:00
         end_time=01-Jan-1970 09:30 wait_time=01-Jan-1970 09:30
         db=(nil) db_batch=(nil) batch_started=0
         wstore=0x7fb448035cd8 rstore=0x7fb448034c18 wjcr=(nil) client=0x7fb44803ae18 reschedule_count=0 SD_msg_chan_started=0
List plugins. Hook count=0
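
Each JCR entry in the dump above carries a start_time, and the syslog lines give the crash time, so the elapsed run time of each job at the moment of the crash can be worked out mechanically. The following is only a rough Python sketch of that arithmetic, not a Bacula tool. The function name job_runtimes, the two regular expressions, and the shortened sample text are assumptions made for illustration. The 6-day figure is the default watchdog period mentioned elsewhere in this thread. Wrapped dump lines are assumed to have been joined first.

```python
import re
from datetime import datetime, timedelta

# Two entries copied from the dump above, with wrapped lines joined
# and fields not needed here left out.
DUMP = """\
threadid=0x7fb495897700 JobId=104686 JobStatus=R jcr=0x7fb48806aea8 name=job1.2020-02-11_17.38.56_13
         sched_time=11-Feb-2020 17:38 start_time=11-Feb-2020 17:38
threadid=0x7fb43e7fc700 JobId=105616 JobStatus=R jcr=0x55980a005fe8 name=job4.2020-02-21_21.30.01_16
         sched_time=21-Feb-2020 21:30 start_time=24-Feb-2020 23:36
"""

CRASH_TIME = datetime(2020, 2, 28, 9, 56, 31)  # taken from the syslog lines
WATCHDOG_LIMIT = timedelta(days=6)             # default quoted in this thread

# Hypothetical patterns written for this sketch, not part of Bacula.
HEADER = re.compile(r"JobId=(\d+) JobStatus=(\S+) .*?name=(\S+)")
START = re.compile(r"start_time=(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2})")

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def parse_time(text):
    """Parse a dump timestamp such as '11-Feb-2020 17:38'."""
    date_part, clock = text.split()
    day, month, year = date_part.split("-")
    hour, minute = clock.split(":")
    return datetime(int(year), MONTHS[month], int(day), int(hour), int(minute))


def job_runtimes(dump, crash_time):
    """Return (job_id, name, elapsed) for each JCR entry found in the dump."""
    results = []
    current = None
    for line in dump.splitlines():
        header = HEADER.search(line)
        if header:
            # Remember which job the following start_time belongs to.
            current = (int(header.group(1)), header.group(3))
            continue
        start = START.search(line)
        if start and current:
            results.append((*current, crash_time - parse_time(start.group(1))))
            current = None
    return results


for job_id, name, elapsed in job_runtimes(DUMP, CRASH_TIME):
    flag = "over" if elapsed > WATCHDOG_LIMIT else "under"
    print(f"JobId={job_id} {name}: {elapsed.days} full days, {flag} 6 days")
```

The sketch treats the syslog timestamp and the dump timestamps as being in the same local time zone, which is also an assumption. It only reports elapsed time and says nothing about what caused the crash.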



_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users