[bareos-users] Re: bareos VirtualFull backup deadlock

Jon Schewe Mon, 21 Apr 2025 10:06:28 -0700

I've worked around this for now, but there appears to be a bug in the 
just-in-time device scheduling: two jobs can each grab a read device, 
leaving no device free for writing, and bareos just hangs.
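
To illustrate the suspected failure mode, here is a toy Python model. It is 
not Bareos code. It assumes each VirtualFull job reserves a read drive first 
and asks for a write drive only afterwards, with both coming from the same 
pool of drives; the function name and the simplified two-phase ordering are 
made up for the illustration.

```python
def stuck_jobs(jobs, drives):
    """Toy model: return the jobs left holding a read drive with no write drive.

    Assumption (not taken from Bareos source): every job reserves its read
    drive up front and requests a write drive only later.
    """
    free = list(drives)
    reading = {}  # job -> drive held for reading

    # Phase 1: each job grabs a read drive while one is still free.
    for job in jobs:
        if free:
            reading[job] = free.pop()

    # Phase 2: a job that finds a free drive to write to runs to completion
    # and hands back both drives (net effect: its read drive becomes free).
    progress = True
    while reading and progress:
        progress = False
        for job in list(reading):
            if free:
                free.append(reading.pop(job))
                progress = True

    # Whatever is left holds a read drive and can never get a write drive.
    return sorted(reading)


two_drives = ["LTO-9_drive0", "LTO-9_drive1"]
print(stuck_jobs(["20624", "20647"], two_drives))  # both jobs stay stuck
print(stuck_jobs(["20624"], two_drives))           # a single job completes
```

In this model two jobs on two drives never finish, while a single job does, 
which matches the hang I saw.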


I also learned that changes to the Maximum Concurrent Jobs parameters do not 
appear to take effect with a "reload"; one must restart the director.

I ended up creating separate storage devices for onsite and offsite backups 
to keep the offsite backups from fighting for a storage resource.
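
As a rough sketch of why the split helps, the toy Python model below treats 
the read side and the write side as disjoint sets of drives. This is one 
reading of the workaround, not Bareos code: the function name, the drive 
assignment, and the scheduling order are all assumptions made for the 
illustration.

```python
def run_with_dedicated_drives(jobs, read_drives, write_drives):
    """Toy model: read and write reservations come from disjoint drive sets.

    Returns the jobs in the order they finish. Assumption: a job takes a
    read drive first and then needs one write drive to complete.
    """
    free_read, free_write = list(read_drives), list(write_drives)
    waiting, reading, finished = list(jobs), {}, []

    while waiting or reading:
        # Waiting jobs take read drives while any are free.
        while waiting and free_read:
            reading[waiting.pop(0)] = free_read.pop()
        if not reading or not free_write:
            break  # nothing can move forward
        # One reading job writes, finishes, and releases its read drive;
        # the write drive it borrowed is free again for the next job.
        job = next(iter(reading))
        free_read.append(reading.pop(job))
        finished.append(job)

    return finished


done = run_with_dedicated_drives(
    ["20624", "20647"],
    read_drives=["LTO-9_drive0"],   # hypothetical onsite (read) drive
    write_drives=["LTO-9_drive1"],  # hypothetical offsite (write) drive
)
print(done)
```

Here the second job simply waits for the read drive instead of taking the 
drive the first job needs for writing, so the jobs run one after the other.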

On Thursday, April 17, 2025 at 12:48:24 PM UTC-5 Jon Schewe wrote:

> I have some backups to an LTO-8 pool
> I have some backups to an LTO-9 pool (that I'm migrating to from LTO-8)
> I have 2 LTO-8 drives in a changer
> I have 2 LTO-9 drives in a changer
>
> I'm doing VirtualFull backups with a destination of an offsite LTO-9 pool.
>
> I'm finding that bareos is starting 2 VirtualFull backups at the same time 
> and appears to be deadlocked waiting for drives. I expected bareos to 
> reserve a drive for reading and for writing and then go and block other 
> jobs.
>
> Things that I've tried changing to reduce this down to a single job 
> running at a time:
> - Director -> Director -> Maximum Concurrent Jobs = 1
> - Director -> Client (bareos-fd) -> Maximum Concurrent Jobs = 1
> - Director -> Client (client1-fd) -> Maximum Concurrent Jobs = 1
> - Director -> Storage (LTO-8) -> Maximum Concurrent Jobs = 1
> - Director -> Storage (LTO-9) -> Maximum Concurrent Jobs = 1
>
> I'm doing a reload after making each change and I have not undone any of 
> the changes. 
> After I reload I cancel one of the running jobs and add it back to the 
> queue so that it gets picked up later.
> I'm still seeing bareos execute 2 jobs and neither is making any progress.
>
> Output of one of the jobs
>
>  2025-04-17 13:06:02 bareos-dir JobId 20624: Version: 24.0.3~pre0.54685a85d (27 March 2025) Red Hat Enterprise Linux release 9.5 (Plow)
>  2025-04-17 13:06:02 bareos-dir JobId 20624: Start Virtual Backup JobId 20624, Job=client1-job1-offsite.2025-04-12_00.01.01_17
>  2025-04-17 13:06:02 bareos-dir JobId 20624: Bootstrap records written to /var/lib/bareos/bareos-dir.restore.100.bsr
>  2025-04-17 13:06:02 bareos-dir JobId 20624: Consolidating JobIds 20078,20239,20393,20543 containing 49 files
>  2025-04-17 13:06:02 bareos-dir JobId 20624: Connected Storage daemon at bareos.mgmt.bbn.com:9103, encryption: TLS_AES_256_GCM_SHA384 TLSv1.3
>  2025-04-17 13:06:02 bareos-dir JobId 20624:  Encryption: TLS_AES_256_GCM_SHA384 TLSv1.3
>  2025-04-17 13:06:03 bareos-dir JobId 20624: Using Device "LTO-9_drive1" to read.
>  2025-04-17 13:06:03 bareos-sd JobId 20624: Using just in time reservation for job 20624
>  2025-04-17 13:06:03 bareos-dir JobId 20624: Using Device "JustInTime Device" to write.
>
> LTO-9 storage status
> JobId=20624 Level=Virtual Full Type=Backup Name=client1-job1-offsite Status=Created
> Reading: Volume=""
>     pool="onsite-LTO-9" device="LTO-9_drive1" (/dev/tape/by-id/scsi-35000e111ca01f0d3-nst)
> Writing: Volume=""
>     pool="offsite-LTO-9" device="LTO-9_drive1" (/dev/tape/by-id/scsi-35000e111ca01f0d3-nst)
>     spooling=0 despooling=0 despool_wait=0
>     Files=0 Bytes=0 AveBytes/sec=0 LastBytes/sec=0
>     FDSocket closed
>
> JobId=20647 Level=Virtual Full Type=Backup Name=client1-job2-offsite Status=Created
> Reading: Volume=""
>     pool="onsite-LTO-9" device="LTO-9_drive0" (/dev/tape/by-id/scsi-35000e111ca01f0c9-nst)
> Writing: Volume=""
>     pool="offsite-LTO-9" device="LTO-9_drive1" (/dev/tape/by-id/scsi-35000e111ca01f0d3-nst)
>     spooling=0 despooling=0 despool_wait=0
>     Files=0 Bytes=0 AveBytes/sec=0 LastBytes/sec=0
>     FDSocket closed
>
> ====
>
> Jobs waiting to reserve a drive:
>    3603 JobId=20624 device "LTO-9_drive0" (/dev/tape/by-id/scsi-35000e111ca01f0c9-nst) is busy reading.
>    3609 JobId=20624 Max concurrent jobs exceeded on drive "LTO-9_drive1" (/dev/tape/by-id/scsi-35000e111ca01f0d3-nst).
>    3603 JobId=20647 device "LTO-9_drive0" (/dev/tape/by-id/scsi-35000e111ca01f0c9-nst) is busy reading.
>    3609 JobId=20647 Max concurrent jobs exceeded on drive "LTO-9_drive1" (/dev/tape/by-id/scsi-35000e111ca01f0d3-nst).
>
> ...
> Used Volume status:
> ANJ645L9 on device "LTO-9_drive0" (/dev/tape/by-id/scsi-35000e111ca01f0c9-nst)
>     Reader=1 writers=0 reserves=1 volinuse=0
> ANJ646L9 on device "LTO-9_drive1" (/dev/tape/by-id/scsi-35000e111ca01f0d3-nst)
>     Reader=0 writers=0 reserves=1 volinuse=0
> Read Volume: 003048L8 no device. volinuse= 0
> Read Volume: 003041L8 no device. volinuse= 0
> Read Volume: 003048L8 no device. volinuse= 0
> Read Volume: ANJ621L9 no device. volinuse= 0
> Read Volume: ANJ651L9 no device. volinuse= 0
>
>
> The status of the LTO-8 storage only shows me the LTO-9 information, 
> nothing about LTO-8 drives in use.
>
> How do I get bareos unstuck?
>
> Jon
>
>
