Hey,
I'm running Bacula 3.0.2 on FreeBSD/amd64 7.2 using Postgresql 8.3.7
and currently runing into a problem where the bacula director crashes
with a segfault / bus error when I run some of my backup jobs. It's
100% reproduceable for at least the one job, but another job worked
just fine.
I suspected that perhaps that there was a database consistency
problem, so I ran dbcheck which found some orphaned Path and Filename
records, but otherwise didn't complain.
It seems like either a use after free error, or perhaps just incorrect
error handling.
When running with debug I get the following shortly before the crash:
sp-dir: getmsg.c:138-2067 bget_dirmsg 64: Status
Job=crit-source-safe.2009-10-08_12.50.11_03 JobStatus=82
sp-dir: sql_get.c:1181-2067
db_accurate_get_jobids=ªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªª
I have attached the rest of the debug output I collected (just let me
know if more is needed).
Does anyone have an idea about what's going on?
PS. I'm generally very happy with Bacula, so thanks to all involved
who have helped bring Bacula to where it is today :-).
--
Simon L. Nielsen
bacula-dir: dird.c:184-0 Debug level = 100
bacula-dir: jcr.c:140-0 read_last_jobs seek to 192
bacula-dir: jcr.c:147-0 Read num_items=10
bacula-dir: postgresql.c:103-0 db_open first time
bacula-dir: postgresql.c:226-0 pg_real_connect done
bacula-dir: postgresql.c:228-0 db_user=bacula db_name=bacula db_password=123
sp-dir: bnet_server.c:96-0 Addresses host[ipv4:0.0.0.0:9101]
sp-dir: job.c:1276-0 wstorage=sd-file-critical
sp-dir: job.c:1285-0 wstore=sd-file-critical where=Job resource
sp-dir: job.c:937-0 JobId=0 created Job=*JobMonitor*.2009-10-08_12.50.02_01
sp-dir: bnet.c:670-0 who=client host=127.0.0.1 port=36131
sp-dir: job.c:1276-0 wstorage=sd-file-critical
sp-dir: job.c:1285-0 wstore=sd-file-critical where=Job resource
sp-dir: job.c:937-0 JobId=0 created Job=-Console-.2009-10-08_12.50.07_02
sp-dir: cram-md5.c:73-0 send: auth cram-md5 <32735827.1255006...@sp-dir> ssl=2
sp-dir: cram-md5.c:133-0 cram-get received: auth cram-md5
<612947717.1255006...@bconsole> ssl=2
sp-dir: cram-md5.c:152-0 sending resp to challenge: uRY4GC/oz5/nQFRNm6ICyB
sp-dir: bnet.c:263-0 TLS server negotiation established.
sp-dir: ua_cmds.c:1990-0 UA Open database
sp-dir: postgresql.c:103-0 db_open first time
sp-dir: postgresql.c:226-0 pg_real_connect done
sp-dir: postgresql.c:228-0 db_user=bacula db_name=bacula db_password=123
sp-dir: ua_run.c:1325-0 Using pool pool-critical
sp-dir: job.c:1276-0 wstorage=sd-file-crit-source-safe
sp-dir: job.c:1285-0 wstore=sd-file-crit-source-safe where=Job resource
sp-dir: job.c:1310-0 wstore=sd-file-crit-source-safe where=Job resource
sp-dir: ua_run.c:178-0 JobId=0 using pool pool-critical priority=5
sp-dir: job.c:937-0 JobId=0 created Job=crit-source-safe.2009-10-08_12.50.11_03
sp-dir: job.c:127-0 Open database
sp-dir: postgresql.c:96-0 DB REopen 1 bacula
sp-dir: job.c:809-0 Created Client files-fd record 2
sp-dir: job.c:173-2067 Created job record JobId=2067
Name=crit-source-safe.2009-10-08_12.50.11_03 Type=B Level=I
sp-dir: sql_find.c:138-2067 Got start time: 2009-09-23 23:05:01
sp-dir: sql_find.c:186-2067 Got start time: 2009-07-04 23:05:01
sp-dir: fd_cmds.c:214-2067 have_full=1 do_full=0 now=1255006211
full_time=1246748701
sp-dir: fd_cmds.c:271-2067 Level=I last start time=2009-09-23 23:05:01
sp-dir: ua_run.c:182-0 JobId=2067 NewJobId=2067 using pool pool-critical
priority=5
sp-dir: backup.c:198-2067 JobId=2067 JobLevel=I
sp-dir: msgchan.c:104-2067 bnet_connect to Storage daemon admbk0:9103
sp-dir: bsock.c:221-2067 Current host[ipv4:127.0.0.1:9103] All
host[ipv4:127.0.0.1:9103]
sp-dir: bsock.c:155-2067 who=Storage daemon host=admbk0 port=9103
sp-dir: cram-md5.c:133-2067 cram-get received: auth cram-md5
<1890150453.1255006...@admbk0-sd> ssl=2
sp-dir: cram-md5.c:152-2067 sending resp to challenge: l9db6U/2O6glyk/XLT+HWD
sp-dir: cram-md5.c:80-2067 send: auth cram-md5 <1117200456.1255006...@sp-dir>
ssl=2
sp-dir: cram-md5.c:99-2067 Authenticate OK bh+Np5Iis3/ZEjNzjz+foC
sp-dir: bnet.c:311-2067 TLS client negotiation established.
sp-dir: msgchan.c:198-2067 >stored: JobId=2067
job=crit-source-safe.2009-10-08_12.50.11_03 job_name=crit-source-safe
client_name=files-fd type=66 level=73 FileSet=crit-source-safe NoAttr=0
SpoolAttr=0 FileSetMD5=G/+i4y+Qx8+MJT4rQ8+mnD SpoolData=0 WritePartAfterJob=1
PreferMountedVols=1 SpoolSize=0
sp-dir: getmsg.c:138-2067 bget_dirmsg 91: 3000 OK Job SDid=1 SDtime=1255005861
Authorization=MMML-LJOG-KFKD-KPJH-BNNK-OJAG-IMOP-ODNK
sp-dir: msgchan.c:200-2067 <stored: 3000 OK Job SDid=1 SDtime=1255005861
Authorization=MMML-LJOG-KFKD-KPJH-BNNK-OJAG-IMOP-ODNK
sp-dir: msgchan.c:286-2067 wstore >stored: use storage=sd-file-crit-source-safe
media_type=sd-file-crit-source-safe pool_name=pool-critical pool_type=Backup
append=1 copy=0 stripe=0
sp-dir: msgchan.c:293-2067 >stored: use device=sd-file-crit-source-safe
sp-dir: getmsg.c:138-2067 bget_dirmsg 123: CatReq
Job=crit-source-safe.2009-10-08_12.50.11_03 FindMedia=1 pool_name=pool-critical
media_type=sd-file-crit-source-safe
sp-dir: catreq.c:126-2067 catreq CatReq
Job=crit-source-safe.2009-10-08_12.50.11_03 FindMedia=1 pool_name=pool-critical
media_type=sd-file-crit-source-safe
sp-dir: next_vol.c:63-2067 find_next_vol_for_append: JobId=2067 PoolId=4,
MediaType=sd-file-crit-source-safe
sp-dir: sql_find.c:367-2067 fnextvol=SELECT
MediaId,VolumeName,VolJobs,VolFiles,VolBlocks,VolBytes,VolMounts,VolErrors,VolWrites,MaxVolBytes,VolCapacityBytes,MediaType,VolStatus,PoolId,VolRetention,VolUseDuration,MaxVolJobs,MaxVolFiles,Recycle,Slot,FirstWritten,LastWritten,InChanger,EndFile,EndBlock,VolParts,LabelType,LabelDate,StorageId,Enabled,LocationId,RecycleCount,InitialWrite,ScratchPoolId,RecyclePoolId,VolReadTime,VolWriteTime
FROM Media WHERE PoolId=4 AND MediaType='sd-file-crit-source-safe' AND
Enabled=1 AND VolStatus='Append' ORDER BY LastWritten IS NULL,LastWritten
DESC,MediaId LIMIT 1
sp-dir: sql_find.c:444-2067 Rtn numrows=1
sp-dir: next_vol.c:165-2067 VolJobs=0 FirstWritten=0
sp-dir: catreq.c:148-2067 find_media ok=1 idx=1
vol=crit-source-safe-20090924-j2031-incremental
sp-dir: catreq.c:100-2067 Vol Info for crit-source-safe.2009-10-08_12.50.11_03:
1000 OK VolName=crit-source-safe-20090924-j2031-incremental VolJobs=0
VolFiles=0 VolBlocks=0 VolBytes=0 VolMounts=0 VolErrors=0 VolWrites=0
MaxVolBytes=0 VolCapacityBytes=0 VolStatus=Append Slot=0 MaxVolJobs=1
MaxVolFiles=0 InChanger=0 VolReadTime=0 VolWriteTime=0 EndFile=0 EndBlock=0
VolParts=0 LabelType=0 MediaId=1752
sp-dir: getmsg.c:138-2067 bget_dirmsg 51: 3000 OK use device
device=sd-file-crit-source-safe
sp-dir: msgchan.c:299-2067 <stored: 3000 OK use device
device=sd-file-crit-source-safe
sp-dir: msgchan.c:336-2067 Start SD msg_thread.
sp-dir: msgchan.c:386-2067 Start msg_thread loop
sp-dir: getmsg.c:138-2067 bget_dirmsg 64: Status
Job=crit-source-safe.2009-10-08_12.50.11_03 JobStatus=70
sp-dir: msgchan.c:348-2067 SD msg_thread started. use=2
sp-dir: bsock.c:221-2067 Current host[ipv4:10.4.4.45:9102] All
host[ipv4:10.4.4.45:9102]
sp-dir: bsock.c:155-2067 who=Client: files-fd host=files.sp.s port=9102
sp-dir: fd_cmds.c:109-2067 Opened connection with File daemon
sp-dir: authenticate.c:196-2067 Sent: Hello Director sp-dir calling
sp-dir: cram-md5.c:133-2067 cram-get received: auth cram-md5
<1187180428.1255006...@files-fd> ssl=2
sp-dir: cram-md5.c:152-2067 sending resp to challenge: +k/+Xg/AJno8W4hFg/VBUC
sp-dir: cram-md5.c:80-2067 send: auth cram-md5 <867484521.1255006...@sp-dir>
ssl=2
sp-dir: cram-md5.c:99-2067 Authenticate OK 1++7M0ULh/+P5llEXzBrJC
sp-dir: bnet.c:311-2067 TLS client negotiation established.
sp-dir: fd_cmds.c:130-2067 >filed: JobId=2067
Job=crit-source-safe.2009-10-08_12.50.11_03 SDid=1 SDtime=1255005861
Authorization=MMML-LJOG-KFKD-KPJH-BNNK-OJAG-IMOP-ODNK
sp-dir: getmsg.c:138-2067 bget_dirmsg 74: 2000 OK Job 3.0.2 (18Jul09)
i386-portbld-freebsd7.2,freebsd,7.2-RELEASE-p4
sp-dir: getmsg.c:138-2067 bget_dirmsg 16: 2000 OK include
sp-dir: getmsg.c:138-2067 bget_dirmsg -1:
sp-dir: getmsg.c:138-2067 bget_dirmsg -1:
sp-dir: getmsg.c:138-2067 bget_dirmsg -1:
sp-dir: getmsg.c:138-2067 bget_dirmsg -1:
sp-dir: getmsg.c:138-2067 bget_dirmsg -1:
sp-dir: getmsg.c:138-2067 bget_dirmsg -1:
sp-dir: getmsg.c:138-2067 bget_dirmsg -1:
sp-dir: getmsg.c:138-2067 bget_dirmsg -1:
sp-dir: getmsg.c:138-2067 bget_dirmsg -1:
sp-dir: getmsg.c:138-2067 bget_dirmsg -1:
sp-dir: getmsg.c:138-2067 bget_dirmsg -1:
sp-dir: getmsg.c:138-2067 bget_dirmsg 14: 2000 OK level
sp-dir: getmsg.c:138-2067 bget_dirmsg 16: 2000 OK storage
sp-dir: getmsg.c:138-2067 bget_dirmsg 55: 3010 Job
crit-source-safe.2009-10-08_12.50.11_03 start
sp-dir: getmsg.c:138-2067 bget_dirmsg 64: Status
Job=crit-source-safe.2009-10-08_12.50.11_03 JobStatus=82
sp-dir: sql_get.c:1181-2067
db_accurate_get_jobids=ªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªª
NOTICE: identifier
"ªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªª"
will be truncated to
"ªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªª"
sp-dir: postgresql.c:549-2067 Result status failed: SELECT sum(JobFiles) FROM
Job WHERE JobId IN
(ªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªªª)
sp-dir: postgresql.c:103-2067 db_open first time
sp-dir: postgresql.c:226-2067 pg_real_connect done
sp-dir: postgresql.c:228-2067 db_user=bacula db_name=bacula db_password=123
sp-dir: sql.c:737-2067 initdb ref=1 connected=1 db=802713800
sp-dir: postgresql.c:549-2067 Result status failed: SELECT Path.Path,
Filename.Name, File.FileIndex, File.JobId, File.LStat FROM ( SELECT max(FileId)
as FileId, PathId, FilenameId FROM (SELECT FileId, PathId, FilenameId FROM File
WHERE JobId IN (·r)) AS F GROUP BY PathId, FilenameId ) AS Temp JOIN
Filename ON (Filename.FilenameId = Temp.FilenameId) JOIN Path ON (Path.PathId =
Temp.PathId) JOIN File ON (File.FileId = Temp.FileId) WHERE File.FileIndex > 0
ORDER BY JobId, FileIndex ASC
Bacula interrupted by signal 10: BUS error
Kaboom! bacula-dir, sp-dir got signal 10 - BUS error. Attempting traceback.
Kaboom! exepath=/usr/local/sbin/
Calling: /usr/local/sbin/btraceback /usr/local/sbin/bacula-dir 1513
Traceback complete, attempting cleanup ...
Dumping: /var/db/bacula/sp-dir.1513.bactrace
sp-dir: lockmgr.c:735 lockmgr disabled
Attempt to dump current JCRs
JCR=0x801eec028 JobId=0 name=*JobMonitor*.2009-10-08_12.50.02_01 JobStatus=R
use_count=1
JobType=I JobLevel=
sched_time=08-Oct-2009 12:50 start_time=08-Oct-2009 12:50
end_time=01-Jan-1970 00:00 wait_time=01-Jan-1970 00:00
dequeing=0
db=0x0 db_batch=0x0 batch_started=0
JCR=0x802603028 JobId=0 name=-Console-.2009-10-08_12.50.07_02 JobStatus=R
use_count=1
JobType=U JobLevel=
sched_time=08-Oct-2009 12:50 start_time=08-Oct-2009 12:50
end_time=01-Jan-1970 00:00 wait_time=01-Jan-1970 00:00
dequeing=0
db=0x8026132b8 db_batch=0x0 batch_started=0
B_DB=0x8026132b8 db_name=bacula db_user=bacula connected=1
cmd="UPDATE Job SET JobStatus='R',Level='I',StartTime='2009-10-08
12:50:14',ClientId=2,JobTDate=1255006214,PoolId=4,FileSetId=2 WHERE JobId=2067"
changes=0
RWLOCK=0x8026132c8 w_active=0 w_wait=0
JCR=0x802631028 JobId=2067 name=crit-source-safe.2009-10-08_12.50.11_03
JobStatus=R
use_count=2
JobType=B JobLevel=I
sched_time=08-Oct-2009 12:50 start_time=08-Oct-2009 12:50
end_time=01-Jan-1970 00:00 wait_time=01-Jan-1970 00:00
dequeing=0
db=0x8026132b8 db_batch=0x8027572b8 batch_started=0
B_DB=0x8026132b8 db_name=bacula db_user=bacula connected=1
cmd="UPDATE Job SET JobStatus='R',Level='I',StartTime='2009-10-08
12:50:14',ClientId=2,JobTDate=1255006214,PoolId=4,FileSetId=2 WHERE JobId=2067"
changes=0
RWLOCK=0x8026132c8 w_active=0 w_wait=0
Attempt to dump plugins
------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users