This issue is weird, and I wonder if anyone else has seen anything like
this.
We are using bacula with file storage, and the volume labelling comes
from a variable expansion:
LabelFormat =
"${JobId}-${Client}-${Level}-${Pool}-${Year}-${Month:p/2/0/r}-${Day:p/2/0/r}"
After upgrading the director node from bacula 9.4.2 (Ubuntu 20.04) to
bacula 13.0.4 (Ubuntu 24.04), we see strange behaviour where job B is
creating a volume name using the JobID and client name belonging to job
A. This in turn confuses job A, when it finds the volume it's trying to
create already exists. It keeps retrying until the day rolls over, and
it gets a unique name again.
Let me illustrate with one particular example. Job 92925 was backing up
director.nsrc.org, and job 92926 was backing up drupal.nsrc.org.
bacula=# select jobid,job,name,clientid from job where jobid in
(92925,92926);
jobid | job | name | clientid
-------+------------------------------------------+-------------------+----------
92926 | drupal.nsrc.org.2025-01-25_21.00.00_35 | drupal.nsrc.org
| 40
92925 | director.nsrc.org.2025-01-25_21.00.00_34 | director.nsrc.org
| 37
(2 rows)
bacula=# select * from client where clientid in (37,40);
clientid | name | uname |
autoprune | fileretention | jobretention
----------+----------------------+--------------------------------------------------+-----------+---------------+--------------
37 | director.nsrc.org-fd | 9.4.2 (04Feb19)
x86_64-pc-linux-gnu,ubuntu,20.04 | 1 | 31536000 | 31536000
40 | drupal.nsrc.org-fd | 9.6.7 (10Dec20)
x86_64-pc-linux-gnu,ubuntu,22.04 | 1 | 31536000 | 31536000
(2 rows)
OK so far. Now look at which media those jobs were using:
bacula=# select * from jobmedia where jobid=92925;
jobmediaid | jobid | mediaid | firstindex | lastindex | startfile |
endfile | startblock | endblock | volindex
------------+-------+---------+------------+-----------+-----------+---------+------------+------------+----------
404997 | 92925 | 91486 | 1 | 5604 | 0 | 0
| 263 | 999936173 | 2
404998 | 92925 | 91486 | 5604 | 60865 | 0 | 0 |
999936174 | 1609426358 | 3
(2 rows)
bacula=# select * from jobmedia where jobid=92926;
jobmediaid | jobid | mediaid | firstindex | lastindex | startfile |
endfile | startblock | endblock | volindex
------------+-------+---------+------------+-----------+-----------+---------+------------+------------+----------
404586 | 92926 | 91455 | 1 | 54633 | 0 | 0
| 263 | 999936044 | 2
404587 | 92926 | 91455 | 54633 | 66615 | 0 | 0 |
999936045 | 1999871925 | 3
404588 | 92926 | 91455 | 66615 | 71853 | 0 | 0 |
1999871926 | 2999807889 | 4
404589 | 92926 | 91455 | 71853 | 109430 | 0 | 0 |
2999807890 | 3999743805 | 5
404590 | 92926 | 91455 | 109430 | 114941 | 0 | 1 |
3999743806 | 704712444 | 6
404591 | 92926 | 91455 | 114941 | 115392 | 1 | 1 |
704712445 | 1704648325 | 7
404592 | 92926 | 91455 | 115392 | 115865 | 1 | 1 |
1704648326 | 2389364108 | 8
(7 rows)
They are different media, as expected. But the *names* of those volumes
are both derived from "director.nsrc.org" and named as if they are from
job 92925:
bacula=# select mediaid,volumename,voljobs from media where mediaid in
(91455,91486);
mediaid | volumename | voljobs
---------+-------------------------------------------------+---------
91455 | 92925-director.nsrc.org-fd-Full-Full-2025-01-26 | 1
91486 | 92925-director.nsrc.org-fd-Full-Full-2025-01-27 | 1
(2 rows)
This matches what's on the filesystem: there is no volume with name
starting 92926.
# ls -1 /backup/bacula/9292[56]*
/backup/bacula/92925-director.nsrc.org-fd-Full-Full-2025-01-26
/backup/bacula/92925-director.nsrc.org-fd-Full-Full-2025-01-27
#
Here are the logs:
--------
# grep '9292[56]' /var/log/bacula/bacula.log
26-Jan 02:05 store.nsrc.org-dir JobId 92926: Start Backup JobId 92926,
Job=drupal.nsrc.org.2025-01-25_21.00.00_35
26-Jan 02:05 store.nsrc.org-dir JobId 92926: Error: bsockcore.c:400
Source address bind error. proto=2. ERR=Invalid argument
26-Jan 02:05 store.nsrc.org-dir JobId 92926: Connected to Storage
"FileStorage0" at store.nsrc.org:9103 with TLS
26-Jan 02:05 store.nsrc.org-dir JobId 92926: Using Device "FileDev0" to
write.
26-Jan 02:05 store.nsrc.org-dir JobId 92926: Storage "FileStorage0" was
selected out of group of 2 available storages. StorageGroupPolicy
"ListedOrder" used
26-Jan 02:05 store.nsrc.org-dir JobId 92926: Error: bsockcore.c:400
Source address bind error. proto=2. ERR=Invalid argument
26-Jan 02:05 store.nsrc.org-dir JobId 92926: Connected to Client
"drupal.nsrc.org-fd" at drupal.nsrc.org:9102 without encryption
*26-Jan 02:05 store.nsrc.org-sd JobId 92926: Labeled new Volume
"92925-director.nsrc.org-fd-Full-Full-2025-01-26" on File device
"FileDev0" (/backup/bacula).*
26-Jan 02:05 store.nsrc.org-sd JobId 92926: Wrote label to prelabeled
Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" on File device
"FileDev0" (/backup/bacula)
26-Jan 02:05 store.nsrc.org-dir JobId 92926: Volume used once. Marking
Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" as Used.
26-Jan 02:20 store.nsrc.org-sd JobId 92926: Elapsed time=00:15:12,
Transfer rate=7.316 M Bytes/second
26-Jan 02:20 store.nsrc.org-sd JobId 92926: Sending spooled attrs to the
Director. Despooling 29,199,872 bytes ...
26-Jan 02:20 store.nsrc.org-dir JobId 92926: Bacula store.nsrc.org-dir
13.0.4 (12Feb24):
JobId: 92926
Volume name(s): 92925-director.nsrc.org-fd-Full-Full-2025-01-26
26-Jan 02:20 store.nsrc.org-dir JobId 92926: Begin pruning Jobs older
than 1 year .
26-Jan 02:20 store.nsrc.org-dir JobId 92926: No Jobs found to prune.
26-Jan 02:20 store.nsrc.org-dir JobId 92926: Begin pruning Files.
26-Jan 02:20 store.nsrc.org-dir JobId 92926: No Files found to prune.
26-Jan 02:20 store.nsrc.org-dir JobId 92926: End auto prune.
26-Jan 02:04 store.nsrc.org-dir JobId 92925: Start Backup JobId 92925,
Job=director.nsrc.org.2025-01-25_21.00.00_34
26-Jan 02:04 store.nsrc.org-dir JobId 92925: Error: bsockcore.c:400
Source address bind error. proto=2. ERR=Invalid argument
26-Jan 02:04 store.nsrc.org-dir JobId 92925: Connected to Storage
"FileStorage0" at store.nsrc.org:9103 with TLS
26-Jan 02:04 store.nsrc.org-dir JobId 92925: Created new
Volume="92925-director.nsrc.org-fd-Full-Full-2025-01-26", Pool="Full",
MediaType="File0" in catalog.
*26-Jan 02:04 store.nsrc.org-dir JobId 92925: Storage daemon
"FileStorage0" didn't accept Device "FileDev0" because: 3924 Device
"FileDev0" not in SD Device resources or no matching Media Type or is
disabled.*
26-Jan 02:04 store.nsrc.org-dir JobId 92925: Error: bsockcore.c:400
Source address bind error. proto=2. ERR=Invalid argument
26-Jan 02:04 store.nsrc.org-dir JobId 92925: Connected to Storage
"FileStorage1" at store.nsrc.org:9103 with TLS
*26-Jan 02:04 store.nsrc.org-dir JobId 92925: Error: sql_create.c:430
Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" already exists.*
26-Jan 02:04 store.nsrc.org-dir JobId 92925: Using Device "FileDev1" to
write.
26-Jan 02:04 store.nsrc.org-dir JobId 92925: Storage "FileStorage1" was
selected out of group of 2 available storages. StorageGroupPolicy
"ListedOrder" used
26-Jan 02:04 store.nsrc.org-dir JobId 92925: Error: bsockcore.c:400
Source address bind error. proto=2. ERR=Invalid argument
26-Jan 02:04 store.nsrc.org-dir JobId 92925: Connected to Client
"director.nsrc.org-fd" at director.nsrc.org:9102 without encryption
26-Jan 02:04 store.nsrc.org-dir JobId 92925: Error: sql_create.c:430
Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" already exists.
26-Jan 02:04 store.nsrc.org-dir JobId 92925: Error: sql_create.c:430
Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" already exists.
26-Jan 02:04 store.nsrc.org-sd JobId 92925: Job
director.nsrc.org.2025-01-25_21.00.00_34 is waiting. Cannot find any
appendable volumes.
26-Jan 02:09 store.nsrc.org-dir JobId 92925: Error: sql_create.c:430
Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" already exists.
26-Jan 02:14 store.nsrc.org-dir JobId 92925: Error: sql_create.c:430
Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" already exists.
<<snip lots>>
26-Jan 23:54 store.nsrc.org-dir JobId 92925: Error: sql_create.c:430
Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" already exists.
26-Jan 23:59 store.nsrc.org-dir JobId 92925: Error: sql_create.c:430
Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" already exists.
27-Jan 00:04 store.nsrc.org-dir JobId 92925: Created new
Volume="92925-director.nsrc.org-fd-Full-Full-2025-01-27", Pool="Full",
MediaType="File1" in catalog.
27-Jan 00:04 store.nsrc.org-sd JobId 92925: Labeled new Volume
"92925-director.nsrc.org-fd-Full-Full-2025-01-27" on File device
"FileDev1" (/backup/bacula).
27-Jan 00:04 store.nsrc.org-sd JobId 92925: Wrote label to prelabeled
Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-27" on File device
"FileDev1" (/backup/bacula)
27-Jan 00:04 store.nsrc.org-dir JobId 92925: Volume used once. Marking
Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-27" as Used.
27-Jan 00:10 store.nsrc.org-sd JobId 92925: Elapsed time=00:06:00,
Transfer rate=4.460 M Bytes/second
27-Jan 00:10 store.nsrc.org-sd JobId 92925: Sending spooled attrs to the
Director. Despooling 14,038,445 bytes ...
27-Jan 00:10 store.nsrc.org-dir JobId 92925: Bacula store.nsrc.org-dir
13.0.4 (12Feb24):
JobId: 92925
Volume name(s): 92925-director.nsrc.org-fd-Full-Full-2025-01-27
27-Jan 00:10 store.nsrc.org-dir JobId 92925: Begin pruning Jobs older
than 1 year .
27-Jan 00:10 store.nsrc.org-dir JobId 92925: No Jobs found to prune.
27-Jan 00:10 store.nsrc.org-dir JobId 92925: Begin pruning Files.
27-Jan 00:10 store.nsrc.org-dir JobId 92925: No Files found to prune.
27-Jan 00:10 store.nsrc.org-dir JobId 92925: End auto prune.
--------
What appears to be happening is:
- Job 92926 creates the volume
"92925-director.nsrc.org-fd-Full-Full-2025-01-26" - even though its job
ID is 92925, and the client is drupal.nsrc.org! - and successfully uses it.
- Job 92925 starts, and tries to create volume
"92925-director.nsrc.org-fd-Full-Full-2025-01-26" - which is the correct
name. But it finds that the volume already exists. It backs off and
keeps retrying, hundreds of times
- Eventually it goes past midnight, and it tries to create volume
"92925-director.nsrc.org-fd-Full-Full-2025-01-27", which succeeds, and
the backup completes.
The "Source address bind error" comes from DirSourceAddress and I've
separately raised this as an issue at
https://gitlab.bacula.org/bacula-community-edition/bacula-community/-/issues/2742
I don't *think* it's related, but I'll run without DirSourceAddress
tonight just to be sure.
I don't know what the message
*Storage daemon "FileStorage0" didn't accept Device "FileDev0" because:
3924 Device "FileDev0" not in SD Device resources or no matching Media
Type or is disabled.*
is telling me, and whether this is part of the problem - this message
didn't appear prior to the upgrade.
I will provide the config files in a following mail, as my first attempt
breached the Sourceforge message size limit.
Thanks,
Brian Candler.
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users