I've worked around this for now, but it appears there is a bug in the just-in-time device scheduling: two jobs can grab the read devices, and then there may be no write devices left, so bareos just hangs.
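For anyone hitting the same thing, the shape of my workaround is roughly the following director-side configuration. This is only a sketch: the resource names, address, and password are made up, and you would point each Storage resource at whatever device/autochanger names your storage daemon actually defines.

```
# Sketch only -- names (LTO-9-onsite, LTO-9-offsite, tape-changer) are hypothetical.
# Separate Director Storage resources for onsite vs. offsite backups, so the
# offsite VirtualFull jobs no longer fight the onsite jobs for one Storage resource.

Storage {
  Name = LTO-9-onsite
  Address = bareos.example.com       # hypothetical SD address
  Password = "secret"
  Device = tape-changer              # device name defined in the storage daemon
  Media Type = LTO-9
  Auto Changer = yes
  Maximum Concurrent Jobs = 1        # note: a "reload" did not pick this up; restart the director
}

Storage {
  Name = LTO-9-offsite
  Address = bareos.example.com
  Password = "secret"
  Device = tape-changer
  Media Type = LTO-9
  Auto Changer = yes
  Maximum Concurrent Jobs = 1
}
```

Jobs writing to the offsite pool then reference `Storage = LTO-9-offsite` in their Pool or Job definitions, which serializes them independently of the onsite jobs.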
I also learned that changing the Maximum Concurrent Jobs parameters does not appear to take effect with a "reload"; one must restart the director. I ended up creating separate storage devices for onsite and offsite backups to keep the offsite backups from fighting over a storage resource.

On Thursday, April 17, 2025 at 12:48:24 PM UTC-5 Jon Schewe wrote:

> I have some backups to an LTO-8 pool.
> I have some backups to an LTO-9 pool (that I'm migrating to from LTO-8).
> I have 2 LTO-8 drives in a changer.
> I have 2 LTO-9 drives in a changer.
>
> I'm doing VirtualFull backups with a destination of an offsite LTO-9 pool.
>
> I'm finding that bareos is starting 2 VirtualFull backups at the same time
> and they appear to be deadlocked waiting for drives. I expected bareos to
> reserve a drive for reading and one for writing and then block other jobs.
>
> Things that I've tried changing to reduce this down to a single job running
> at a time:
> - Director -> Director -> Maximum Concurrent Jobs = 1
> - Director -> Client (bareos-fd) -> Maximum Concurrent Jobs = 1
> - Director -> Client (client1-fd) -> Maximum Concurrent Jobs = 1
> - Director -> Storage (LTO-8) -> Maximum Concurrent Jobs = 1
> - Director -> Storage (LTO-9) -> Maximum Concurrent Jobs = 1
>
> I'm doing a reload after making each change, and I have not undone any of
> the changes.
> After I reload, I cancel one of the running jobs and add it back to the
> queue so that it gets picked up later.
> I'm still seeing bareos execute 2 jobs, and neither is making any progress.
>
> Output of one of the jobs:
>
> 2025-04-17 13:06:02 bareos-dir JobId 20624: Version: 24.0.3~pre0.54685a85d (27 March 2025) Red Hat Enterprise Linux release 9.5 (Plow)
> 2025-04-17 13:06:02 bareos-dir JobId 20624: Start Virtual Backup JobId 20624, Job=client1-job1-offsite.2025-04-12_00.01.01_17
> 2025-04-17 13:06:02 bareos-dir JobId 20624: Bootstrap records written to /var/lib/bareos/bareos-dir.restore.100.bsr
> 2025-04-17 13:06:02 bareos-dir JobId 20624: Consolidating JobIds 20078,20239,20393,20543 containing 49 files
> 2025-04-17 13:06:02 bareos-dir JobId 20624: Connected Storage daemon at bareos.mgmt.bbn.com:9103, encryption: TLS_AES_256_GCM_SHA384 TLSv1.3
> 2025-04-17 13:06:02 bareos-dir JobId 20624: Encryption: TLS_AES_256_GCM_SHA384 TLSv1.3
> 2025-04-17 13:06:03 bareos-dir JobId 20624: Using Device "LTO-9_drive1" to read.
> 2025-04-17 13:06:03 bareos-sd JobId 20624: Using just in time reservation for job 20624
> 2025-04-17 13:06:03 bareos-dir JobId 20624: Using Device "JustInTime Device" to write.
>
> LTO-9 storage status:
>
> JobId=20624 Level=Virtual Full Type=Backup Name=client1-job1-offsite Status=Created
>     Reading: Volume=""
>         pool="onsite-LTO-9" device="LTO-9_drive1" (/dev/tape/by-id/scsi-35000e111ca01f0d3-nst)
>     Writing: Volume=""
>         pool="offsite-LTO-9" device="LTO-9_drive1" (/dev/tape/by-id/scsi-35000e111ca01f0d3-nst)
>     spooling=0 despooling=0 despool_wait=0
>     Files=0 Bytes=0 AveBytes/sec=0 LastBytes/sec=0
>     FDSocket closed
>
> JobId=20647 Level=Virtual Full Type=Backup Name=client1-job2-offsite Status=Created
>     Reading: Volume=""
>         pool="onsite-LTO-9" device="LTO-9_drive0" (/dev/tape/by-id/scsi-35000e111ca01f0c9-nst)
>     Writing: Volume=""
>         pool="offsite-LTO-9" device="LTO-9_drive1" (/dev/tape/by-id/scsi-35000e111ca01f0d3-nst)
>     spooling=0 despooling=0 despool_wait=0
>     Files=0 Bytes=0 AveBytes/sec=0 LastBytes/sec=0
>     FDSocket closed
>
> ====
>
> Jobs waiting to reserve a drive:
>     3603 JobId=20624 device "LTO-9_drive0" (/dev/tape/by-id/scsi-35000e111ca01f0c9-nst) is busy reading.
>     3609 JobId=20624 Max concurrent jobs exceeded on drive "LTO-9_drive1" (/dev/tape/by-id/scsi-35000e111ca01f0d3-nst).
>     3603 JobId=20647 device "LTO-9_drive0" (/dev/tape/by-id/scsi-35000e111ca01f0c9-nst) is busy reading.
>     3609 JobId=20647 Max concurrent jobs exceeded on drive "LTO-9_drive1" (/dev/tape/by-id/scsi-35000e111ca01f0d3-nst).
>
> ...
>
> Used Volume status:
>     ANJ645L9 on device "LTO-9_drive0" (/dev/tape/by-id/scsi-35000e111ca01f0c9-nst)
>         Reader=1 writers=0 reserves=1 volinuse=0
>     ANJ646L9 on device "LTO-9_drive1" (/dev/tape/by-id/scsi-35000e111ca01f0d3-nst)
>         Reader=0 writers=0 reserves=1 volinuse=0
>     Read Volume: 003048L8 no device. volinuse= 0
>     Read Volume: 003041L8 no device. volinuse= 0
>     Read Volume: 003048L8 no device. volinuse= 0
>     Read Volume: ANJ621L9 no device. volinuse= 0
>     Read Volume: ANJ651L9 no device. volinuse= 0
>
> The status of the LTO-8 storage only shows me the LTO-9 information, nothing about the LTO-8 drives in use.
>
> How do I get bareos unstuck?
>
> Jon

--
You received this message because you are subscribed to the Google Groups "bareos-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bareos-users+unsubscr...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/bareos-users/b5d50786-3e15-431e-9757-f6b329cbae3an%40googlegroups.com.