>>>>> On Mon, 2 Mar 2020 09:44:41 -0500, Josh Fisher said:
>
> On 3/2/2020 8:45 AM, Andrea Brancatelli wrote:
> >
> > Are you sure about this?
> >
> > We have a job that takes 25 days to complete without any recompiling.
> >
>
> Yes. See Max Run Time in the Job resource docs at
> https://www.bacula.org/9.6.x-manuals/en/main/Configuring_Director.html

I think that might be out of date information -- it looks like the current
timeout is 200 days.

__Martin

>
> >
> > ---
> > *Andrea Brancatelli *
> >
> >
> > On 2020-03-02 12:15, Josh Fisher wrote:
> >
> >> Bacula has a built-in watchdog that kills a job that runs for more
> >> than 6 days. That period can be extended at compile time, so you have
> >> to compile your own binaries after a change to the source. I don't
> >> remember where in the source, but this has come up before and should
> >> be searchable.
> >>
> >> If you have already extended the watchdog timeout, a signal 11 is
> >> almost always a software bug and the devs should be able to tell
> >> where in the code this happened from the traceback. That said, Bacula
> >> running for a long time using lots of pointers is also a decent test
> >> of memory, as well as i/o. Hardware errors, anything that causes a
> >> bit flip in RAM, usually results in a signal 11. But it is far more
> >> likely to be a software issue and you should file a bug report.
> >>
> >>
> >> On 3/1/2020 6:22 PM, Chaz Vidal wrote:
> >>> Greetings all,
> >>> Our Bacula system crashed on Friday with a segmentation violation.
> >>>
> >>> The system has been attempting to do a full backup of over 130TB of data
> >>> over the past few weeks which we appear to have lost because of the
> >>> crash.
> >>>
> >>> Feb 28 09:56:31 <<servername>> bacula-dir[4211]: Bacula interrupted by
> >>> signal 11: Segmentation violation
> >>> Feb 28 09:56:31 <<servername>> bacula-dir[4211]: Kaboom! bacula-dir,
> >>> bacula-dir got signal 11 - Segmentation violation at 28-Feb-2020
> >>> 09:56:31. Attempting traceback.
> >>> Feb 28 09:56:31 <<servername>> bacula-dir[4211]: Kaboom!
> >>> exepath=/usr/sbin/
> >>> Feb 28 09:56:31 <<servername>> bacula-dir: Bacula interrupted by signal
> >>> 11: Segmentation violation
> >>> Feb 28 09:56:31 <<servername>> bacula-dir[4211]: Calling:
> >>> /usr/sbin/btraceback /usr/sbin/bacula-dir 4211 /var/lib/bacula
> >>> Feb 28 09:56:31 <<servername>> postfix/smtpd[59719]: connect from
> >>> localhost[127.0.0.1]
> >>> Feb 28 09:56:31 <<servername>> postfix/smtpd[59719]: 71CC36008A:
> >>> client=localhost[127.0.0.1]
> >>> Feb 28 09:56:31 <<servername>> postfix/cleanup[59722]: 71CC36008A:
> >>> message-id=<20200227232631.71CC36008A@<<servername>>.company.com>
> >>> Feb 28 09:56:31 <<servername>> postfix/qmgr[14399]: 71CC36008A:
> >>> from=<root@<<servername>>.company.com>, size=593, nrcpt=1 (queue active)
> >>> Feb 28 09:56:31 <<servername>> postfix/smtpd[59719]: disconnect from
> >>> localhost[127.0.0.1] helo=1 mail=1 rcpt=1 data=1 quit=1 commands=5
> >>> Feb 28 09:56:31 <<servername>> bacula-dir[4211]: It looks like the
> >>> traceback worked...
> >>> Feb 28 09:56:31 <<servername>> bacula-dir[4211]: LockDump:
> >>> /var/lib/bacula/bacula.4211.traceback
> >>> Feb 28 09:56:31 <<servername>> bacula-dir[4211]: bacula-dir:
> >>> lockmgr.c:1221-0 lockmgr disabled
> >>>
> >>> I do not know how to read a traceback file to understand what may have
> >>> been going on. We are attempting to restart the backup again but unless
> >>> we understand what happened the crash may appear again.
> >>>
> >>> We are running Bacula Version: 9.4.2
> >>>
> >>> Appreciate if anyone can share any insight?
> >>>
> >>> Attempt to dump current JCRs. njcrs=7
> >>> threadid=0x7fb497491f40 JobId=0 JobStatus=R jcr=0x55980a04a4f8
> >>> name=*JobMonitor*.2020-02-11_15.29.48_01
> >>> use_count=1 killable=0
> >>> JobType=I JobLevel=
> >>> sched_time=11-Feb-2020 15:29 start_time=11-Feb-2020 15:29
> >>> end_time=01-Jan-1970 09:30 wait_time=01-Jan-1970 09:30
> >>> db=(nil) db_batch=(nil) batch_started=0
> >>> wstore=0x55980a01ff28 rstore=0x55980a01ff28 wjcr=(nil)
> >>> client=0x55980a026128 reschedule_count=0 SD_msg_chan_started=0
> >>> threadid=0x7fb495897700 JobId=104686 JobStatus=R jcr=0x7fb48806aea8
> >>> name=job1.2020-02-11_17.38.56_13
> >>> use_count=2 killable=1
> >>> JobType=B JobLevel=F
> >>> sched_time=11-Feb-2020 17:38 start_time=11-Feb-2020 17:38
> >>> end_time=01-Jan-1970 09:30 wait_time=21-Feb-2020 16:50
> >>> db=0x7fb4880059a8 db_batch=(nil) batch_started=0
> >>> wstore=0x7fb48803fc18 rstore=(nil) wjcr=(nil)
> >>> client=0x7fb4880481a8 reschedule_count=0 SD_msg_chan_started=1
> >>> BDB=0x7fb4880059a8 db_name=bacula db_user=bacula connected=true
> >>> cmd="UPDATE Media SET InChanger=0, Slot=0 WHERE Slot=25 AND
> >>> StorageId IN (10) AND MediaId!=794" changes=1814
> >>> RWLOCK=0x7fb4880059c0 w_active=0 w_wait=0
> >>> threadid=0x7fb47e7fc700 JobId=104687 JobStatus=R jcr=0x7fb488068978
> >>> name=job2.2020-02-11_17.40.43_14
> >>> use_count=2 killable=1
> >>> JobType=B JobLevel=F
> >>> sched_time=11-Feb-2020 17:40 start_time=11-Feb-2020 17:40
> >>> end_time=01-Jan-1970 09:30 wait_time=01-Jan-1970 09:30
> >>> db=0x7fb4880059a8 db_batch=(nil) batch_started=0
> >>> wstore=0x7fb48803fc18 rstore=(nil) wjcr=(nil)
> >>> client=0x7fb4880481a8 reschedule_count=0 SD_msg_chan_started=1
> >>> BDB=0x7fb4880059a8 db_name=bacula db_user=bacula connected=true
> >>> cmd="UPDATE Media SET InChanger=0, Slot=0 WHERE Slot=25 AND
> >>> StorageId IN (10) AND MediaId!=794" changes=1814
> >>> RWLOCK=0x7fb4880059c0 w_active=0 w_wait=0
> >>> threadid=0x7fb43f7fe700 JobId=104928 JobStatus=R jcr=0x7fb44805fa88
> >>> name=job3.2020-02-14_15.47.06_47
> >>> use_count=2 killable=1
> >>> JobType=B JobLevel=F
> >>> sched_time=14-Feb-2020 15:46 start_time=14-Feb-2020 15:47
> >>> end_time=01-Jan-1970 09:30 wait_time=27-Feb-2020 22:21
> >>> db=0x7fb4880059a8 db_batch=(nil) batch_started=0
> >>> wstore=0x7fb448034678 rstore=(nil) wjcr=(nil)
> >>> client=0x7fb44803c148 reschedule_count=0 SD_msg_chan_started=1
> >>> BDB=0x7fb4880059a8 db_name=bacula db_user=bacula connected=true
> >>> cmd="UPDATE Media SET InChanger=0, Slot=0 WHERE Slot=25 AND
> >>> StorageId IN (10) AND MediaId!=794" changes=1814
> >>> RWLOCK=0x7fb4880059c0 w_active=0 w_wait=0
> >>> threadid=0x7fb43e7fc700 JobId=105616 JobStatus=R jcr=0x55980a005fe8
> >>> name=job4.2020-02-21_21.30.01_16
> >>> use_count=2 killable=1
> >>> JobType=B JobLevel=F
> >>> sched_time=21-Feb-2020 21:30 start_time=24-Feb-2020 23:36
> >>> end_time=01-Jan-1970 09:30 wait_time=01-Jan-1970 09:30
> >>> db=0x7fb4880059a8 db_batch=(nil) batch_started=0
> >>> wstore=0x7fb448033b78 rstore=(nil) wjcr=(nil)
> >>> client=0x7fb44803e9e8 reschedule_count=0 SD_msg_chan_started=1
> >>> BDB=0x7fb4880059a8 db_name=bacula db_user=bacula connected=true
> >>> cmd="UPDATE Media SET InChanger=0, Slot=0 WHERE Slot=25 AND
> >>> StorageId IN (10) AND MediaId!=794" changes=1814
> >>> RWLOCK=0x7fb4880059c0 w_active=0 w_wait=0
> >>> threadid=0x7fb43effd700 JobId=0 JobStatus=C jcr=0x7fb47800b2e8
> >>> name=-Console-.2020-02-27_08.39.19_09
> >>> use_count=1 killable=0
> >>> JobType=U JobLevel=F
> >>> sched_time=27-Feb-2020 08:39 start_time=27-Feb-2020 08:39
> >>> end_time=01-Jan-1970 09:30 wait_time=01-Jan-1970 09:30
> >>> db=0x7fb4880059a8 db_batch=(nil) batch_started=0
> >>> wstore=0x7fb448035cd8 rstore=0x7fb448034c18 wjcr=(nil)
> >>> client=0x7fb44803ae18 reschedule_count=0 SD_msg_chan_started=0
> >>> BDB=0x7fb4880059a8 db_name=bacula db_user=bacula connected=true
> >>> cmd="UPDATE Media SET InChanger=0, Slot=0 WHERE Slot=25 AND
> >>> StorageId IN (10) AND MediaId!=794" changes=1814
> >>> RWLOCK=0x7fb4880059c0 w_active=0 w_wait=0
> >>> threadid=0x7fb45f7fe700 JobId=0 JobStatus=C jcr=0x7fb3f400e148
> >>> name=-Console-.2020-02-28_09.00.13_35
> >>> use_count=1 killable=0
> >>> JobType=U JobLevel=F
> >>> sched_time=28-Feb-2020 09:00 start_time=28-Feb-2020 09:00
> >>> end_time=01-Jan-1970 09:30 wait_time=01-Jan-1970 09:30
> >>> db=(nil) db_batch=(nil) batch_started=0
> >>> wstore=0x7fb448035cd8 rstore=0x7fb448034c18 wjcr=(nil)
> >>> client=0x7fb44803ae18 reschedule_count=0 SD_msg_chan_started=0
> >>> List plugins. Hook count=0
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Bacula-users mailing list
> >>> Bacula-users@lists.sourceforge.net
> >>> <mailto:Bacula-users@lists.sourceforge.net>
> >>> https://lists.sourceforge.net/lists/listinfo/bacula-users
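
The watchdog question raised in this thread can be checked roughly against the JCR dump quoted above. The following is a minimal sketch, not part of the original messages: it computes how long each backup job had been running when the Director crashed and compares that with the two figures mentioned in the thread (6 days and 200 days). It assumes the start_time values and the syslog timestamp are in the same local timezone. The short job names and the helper functions are hypothetical, and neither threshold has been verified against the Bacula source.

```python
from datetime import datetime, timedelta

# Crash time, copied from the syslog lines quoted in the thread.
CRASH = datetime(2020, 2, 28, 9, 56, 31)

# start_time of each running backup job, copied from the JCR dump.
# The short names are a convenience for this sketch only.
START_TIMES = {
    "job1": "11-Feb-2020 17:38",
    "job2": "11-Feb-2020 17:40",
    "job3": "14-Feb-2020 15:47",
    "job4": "24-Feb-2020 23:36",
}

# Explicit month table so parsing does not depend on the system locale.
MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}


def parse_jcr_time(text):
    """Parse a 'DD-Mon-YYYY HH:MM' timestamp as printed in the dump."""
    date_part, time_part = text.split()
    day, mon, year = date_part.split("-")
    hour, minute = time_part.split(":")
    return datetime(int(year), MONTHS[mon], int(day), int(hour), int(minute))


def elapsed_at_crash(start_times, crash=CRASH):
    """Return {job name: time the job had been running at the crash}."""
    return {name: crash - parse_jcr_time(ts) for name, ts in start_times.items()}


def exceeds(elapsed, limit_days):
    """Return the sorted names of jobs that ran longer than limit_days."""
    limit = timedelta(days=limit_days)
    return sorted(name for name, delta in elapsed.items() if delta > limit)


if __name__ == "__main__":
    elapsed = elapsed_at_crash(START_TIMES)
    for name, delta in elapsed.items():
        print(f"{name}: {delta.days} days, {delta.seconds // 3600} hours")
    # Thresholds are the two figures claimed in the thread, unverified.
    print("over 6 days:", exceeds(elapsed, 6))
    print("over 200 days:", exceeds(elapsed, 200))
```

With the values above, job1, job2 and job3 had each been running for more than 13 days at the time of the crash, which suggests a 6-day watchdog was not in effect on this system. That is consistent with the replies from Andrea and Martin, but it does not by itself explain the signal 11.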