>>>>> On Mon, 2 Mar 2020 09:44:41 -0500, Josh Fisher said:
> 
> On 3/2/2020 8:45 AM, Andrea Brancatelli wrote:
> >
> > Are you sure about this?
> >
> > We have a job that takes 25 days to complete without any recompiling.
> >
> 
> Yes. See Max Run Time in the Job resource docs at 
> https://www.bacula.org/9.6.x-manuals/en/main/Configuring_Director.html

I think that information might be out of date -- it looks like the current
timeout is 200 days.
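
As a rough cross-check against the JCR dump quoted below, here is a small
Python sketch that works out how long each backup job had been running when
the Director crashed. It is only an illustration, not part of Bacula: the
names DUMP, CRASH, MONTHS and job_runtimes are made up for this example, the
dump is a two-job excerpt with the mail quote prefixes removed, and the crash
time is taken from the syslog lines in the original report.

```python
import re
from datetime import datetime, timedelta

# Excerpt of the JCR dump from the original report (quote prefixes removed).
DUMP = """\
threadid=0x7fb495897700 JobId=104686 JobStatus=R jcr=0x7fb48806aea8
name=job1.2020-02-11_17.38.56_13
         sched_time=11-Feb-2020 17:38 start_time=11-Feb-2020 17:38
threadid=0x7fb43e7fc700 JobId=105616 JobStatus=R jcr=0x55980a005fe8
name=job4.2020-02-21_21.30.01_16
         sched_time=21-Feb-2020 21:30 start_time=24-Feb-2020 23:36
"""

# Crash time from the syslog lines: Feb 28 09:56 (seconds ignored).
CRASH = datetime(2020, 2, 28, 9, 56)

# Month abbreviations are mapped by hand so the parsing does not depend on
# the locale of the machine running the script.
MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), 1)}

START_RE = re.compile(r"start_time=(\d{2})-(\w{3})-(\d{4}) (\d{2}):(\d{2})")


def job_runtimes(dump, now):
    """Return {JobId: elapsed time at 'now'} for JCRs with a non-zero JobId."""
    runtimes = {}
    job_id = None
    for line in dump.splitlines():
        m = re.search(r"\bJobId=(\d+)", line)
        if m:
            job_id = int(m.group(1))
        m = START_RE.search(line)
        if m and job_id:  # JobId=0 entries (console, monitor) are skipped
            day, mon, year, hour, minute = m.groups()
            start = datetime(int(year), MONTHS[mon], int(day),
                             int(hour), int(minute))
            runtimes[job_id] = now - start
    return runtimes


if __name__ == "__main__":
    for job_id, elapsed in job_runtimes(DUMP, CRASH).items():
        print(job_id, elapsed, "over 6 days:", elapsed > timedelta(days=6))
```

By that arithmetic, JobId 104686 (job1) had been running for more than 16
days when the crash happened, so a 6-day limit does not seem to have been
applied to it.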

__Martin

> 
> 
> 
> > ---
> > *Andrea Brancatelli *
> >
> >
> > On 2020-03-02 12:15, Josh Fisher wrote:
> >
> >> Bacula has a built-in watchdog that kills a job that runs for more 
> >> than 6 days. That period can be extended at compile time, so you have 
> >> to compile your own binaries after a change to the source. I don't 
> >> remember where in the source, but this has come up before and should 
> >> be searchable.
> >>
> >> If you have already extended the watchdog timeout, a signal 11 is 
> >> almost always a software bug and the devs should be able to tell 
> >> where in the code this happened from the traceback. That said, Bacula 
> >> running for a long time using lots of pointers is also a decent test 
> >> of memory, as well as i/o. Hardware errors, anything that causes a 
> >> bit flip in RAM, usually results in a signal 11. But it is far more 
> >> likely to be a software issue and you should file a bug report.
> >>
> >>
> >> On 3/1/2020 6:22 PM, Chaz Vidal wrote:
> >>> Greetings all,
> >>> Our Bacula system crashed on Friday with a segmentation violation.
> >>>
> >>> The system has been attempting to do a full backup of over 130TB of data 
> >>> over the past few weeks, which we appear to have lost because of the 
> >>> crash.
> >>>
> >>> Feb 28 09:56:31 <<servername>> bacula-dir[4211]: Bacula interrupted by 
> >>> signal 11: Segmentation violation
> >>> Feb 28 09:56:31 <<servername>> bacula-dir[4211]: Kaboom! bacula-dir, 
> >>> bacula-dir got signal 11 - Segmentation violation at 28-Feb-2020 
> >>> 09:56:31. Attempting traceback.
> >>> Feb 28 09:56:31 <<servername>> bacula-dir[4211]: Kaboom! 
> >>> exepath=/usr/sbin/
> >>> Feb 28 09:56:31 <<servername>> bacula-dir: Bacula interrupted by signal 
> >>> 11: Segmentation violation
> >>> Feb 28 09:56:31 <<servername>> bacula-dir[4211]: Calling: 
> >>> /usr/sbin/btraceback /usr/sbin/bacula-dir 4211 /var/lib/bacula
> >>> Feb 28 09:56:31 <<servername>> postfix/smtpd[59719]: connect from 
> >>> localhost[127.0.0.1]
> >>> Feb 28 09:56:31 <<servername>> postfix/smtpd[59719]: 71CC36008A: 
> >>> client=localhost[127.0.0.1]
> >>> Feb 28 09:56:31 <<servername>> postfix/cleanup[59722]: 71CC36008A: 
> >>> message-id=<20200227232631.71CC36008A@<<servername>>.company.com>
> >>> Feb 28 09:56:31 <<servername>> postfix/qmgr[14399]: 71CC36008A: 
> >>> from=<root@<<servername>>.company.com>, size=593, nrcpt=1 (queue active)
> >>> Feb 28 09:56:31 <<servername>> postfix/smtpd[59719]: disconnect from 
> >>> localhost[127.0.0.1] helo=1 mail=1 rcpt=1 data=1 quit=1 commands=5
> >>> Feb 28 09:56:31 <<servername>> bacula-dir[4211]: It looks like the 
> >>> traceback worked...
> >>> Feb 28 09:56:31 <<servername>> bacula-dir[4211]: LockDump: 
> >>> /var/lib/bacula/bacula.4211.traceback
> >>> Feb 28 09:56:31 <<servername>> bacula-dir[4211]: bacula-dir: 
> >>> lockmgr.c:1221-0 lockmgr disabled
> >>>
> >>> I do not know how to read a traceback file to understand what may have 
> >>> been going on.  We are attempting to restart the backup, but unless 
> >>> we understand what happened, the crash may occur again.
> >>>
> >>> We are running Bacula Version: 9.4.2
> >>>
> >>> I would appreciate it if anyone could share any insight.
> >>>
> >>> Attempt to dump current JCRs. njcrs=7
> >>> threadid=0x7fb497491f40 JobId=0 JobStatus=R jcr=0x55980a04a4f8 
> >>> name=*JobMonitor*.2020-02-11_15.29.48_01
> >>>          use_count=1 killable=0
> >>>          JobType=I JobLevel=
> >>>          sched_time=11-Feb-2020 15:29 start_time=11-Feb-2020 15:29
> >>>          end_time=01-Jan-1970 09:30 wait_time=01-Jan-1970 09:30
> >>>          db=(nil) db_batch=(nil) batch_started=0
> >>>          wstore=0x55980a01ff28 rstore=0x55980a01ff28 wjcr=(nil) 
> >>> client=0x55980a026128 reschedule_count=0 SD_msg_chan_started=0
> >>> threadid=0x7fb495897700 JobId=104686 JobStatus=R jcr=0x7fb48806aea8 
> >>> name=job1.2020-02-11_17.38.56_13
> >>>          use_count=2 killable=1
> >>>          JobType=B JobLevel=F
> >>>          sched_time=11-Feb-2020 17:38 start_time=11-Feb-2020 17:38
> >>>          end_time=01-Jan-1970 09:30 wait_time=21-Feb-2020 16:50
> >>>          db=0x7fb4880059a8 db_batch=(nil) batch_started=0
> >>>          wstore=0x7fb48803fc18 rstore=(nil) wjcr=(nil) 
> >>> client=0x7fb4880481a8 reschedule_count=0 SD_msg_chan_started=1
> >>> BDB=0x7fb4880059a8 db_name=bacula db_user=bacula connected=true
> >>>          cmd="UPDATE Media SET InChanger=0, Slot=0 WHERE Slot=25 AND 
> >>> StorageId IN (10) AND MediaId!=794" changes=1814
> >>>          RWLOCK=0x7fb4880059c0 w_active=0 w_wait=0
> >>> threadid=0x7fb47e7fc700 JobId=104687 JobStatus=R jcr=0x7fb488068978 
> >>> name=job2.2020-02-11_17.40.43_14
> >>>          use_count=2 killable=1
> >>>          JobType=B JobLevel=F
> >>>          sched_time=11-Feb-2020 17:40 start_time=11-Feb-2020 17:40
> >>>          end_time=01-Jan-1970 09:30 wait_time=01-Jan-1970 09:30
> >>>          db=0x7fb4880059a8 db_batch=(nil) batch_started=0
> >>>          wstore=0x7fb48803fc18 rstore=(nil) wjcr=(nil) 
> >>> client=0x7fb4880481a8 reschedule_count=0 SD_msg_chan_started=1
> >>> BDB=0x7fb4880059a8 db_name=bacula db_user=bacula connected=true
> >>>          cmd="UPDATE Media SET InChanger=0, Slot=0 WHERE Slot=25 AND 
> >>> StorageId IN (10) AND MediaId!=794" changes=1814
> >>>          RWLOCK=0x7fb4880059c0 w_active=0 w_wait=0
> >>> threadid=0x7fb43f7fe700 JobId=104928 JobStatus=R jcr=0x7fb44805fa88 
> >>> name=job3.2020-02-14_15.47.06_47
> >>>          use_count=2 killable=1
> >>>          JobType=B JobLevel=F
> >>>          sched_time=14-Feb-2020 15:46 start_time=14-Feb-2020 15:47
> >>>          end_time=01-Jan-1970 09:30 wait_time=27-Feb-2020 22:21
> >>>          db=0x7fb4880059a8 db_batch=(nil) batch_started=0
> >>>          wstore=0x7fb448034678 rstore=(nil) wjcr=(nil) 
> >>> client=0x7fb44803c148 reschedule_count=0 SD_msg_chan_started=1
> >>> BDB=0x7fb4880059a8 db_name=bacula db_user=bacula connected=true
> >>>          cmd="UPDATE Media SET InChanger=0, Slot=0 WHERE Slot=25 AND 
> >>> StorageId IN (10) AND MediaId!=794" changes=1814
> >>>          RWLOCK=0x7fb4880059c0 w_active=0 w_wait=0
> >>> threadid=0x7fb43e7fc700 JobId=105616 JobStatus=R jcr=0x55980a005fe8 
> >>> name=job4.2020-02-21_21.30.01_16
> >>>          use_count=2 killable=1
> >>>          JobType=B JobLevel=F
> >>>          sched_time=21-Feb-2020 21:30 start_time=24-Feb-2020 23:36
> >>>          end_time=01-Jan-1970 09:30 wait_time=01-Jan-1970 09:30
> >>>          db=0x7fb4880059a8 db_batch=(nil) batch_started=0
> >>>          wstore=0x7fb448033b78 rstore=(nil) wjcr=(nil) 
> >>> client=0x7fb44803e9e8 reschedule_count=0 SD_msg_chan_started=1
> >>> BDB=0x7fb4880059a8 db_name=bacula db_user=bacula connected=true
> >>>          cmd="UPDATE Media SET InChanger=0, Slot=0 WHERE Slot=25 AND 
> >>> StorageId IN (10) AND MediaId!=794" changes=1814
> >>>          RWLOCK=0x7fb4880059c0 w_active=0 w_wait=0
> >>> threadid=0x7fb43effd700 JobId=0 JobStatus=C jcr=0x7fb47800b2e8 
> >>> name=-Console-.2020-02-27_08.39.19_09
> >>>          use_count=1 killable=0
> >>>          JobType=U JobLevel=F
> >>>          sched_time=27-Feb-2020 08:39 start_time=27-Feb-2020 08:39
> >>>          end_time=01-Jan-1970 09:30 wait_time=01-Jan-1970 09:30
> >>>          db=0x7fb4880059a8 db_batch=(nil) batch_started=0
> >>>          wstore=0x7fb448035cd8 rstore=0x7fb448034c18 wjcr=(nil) 
> >>> client=0x7fb44803ae18 reschedule_count=0 SD_msg_chan_started=0
> >>> BDB=0x7fb4880059a8 db_name=bacula db_user=bacula connected=true
> >>>          cmd="UPDATE Media SET InChanger=0, Slot=0 WHERE Slot=25 AND 
> >>> StorageId IN (10) AND MediaId!=794" changes=1814
> >>>          RWLOCK=0x7fb4880059c0 w_active=0 w_wait=0
> >>> threadid=0x7fb45f7fe700 JobId=0 JobStatus=C jcr=0x7fb3f400e148 
> >>> name=-Console-.2020-02-28_09.00.13_35
> >>>          use_count=1 killable=0
> >>>          JobType=U JobLevel=F
> >>>          sched_time=28-Feb-2020 09:00 start_time=28-Feb-2020 09:00
> >>>          end_time=01-Jan-1970 09:30 wait_time=01-Jan-1970 09:30
> >>>          db=(nil) db_batch=(nil) batch_started=0
> >>>          wstore=0x7fb448035cd8 rstore=0x7fb448034c18 wjcr=(nil) 
> >>> client=0x7fb44803ae18 reschedule_count=0 SD_msg_chan_started=0
> >>> List plugins. Hook count=0
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Bacula-users mailing list
> >>> Bacula-users@lists.sourceforge.net 
> >>> <mailto:Bacula-users@lists.sourceforge.net>
> >>> https://lists.sourceforge.net/lists/listinfo/bacula-users
> >>
> >>
> 

