On 8/6/2014 1:52 AM, Kern Sibbald wrote:
On 08/04/2014 06:43 PM, Josh Fisher wrote:
...
Have you set PreferMountedVolumes=no in the Job resource in
bacula-dir.conf? If 3 jobs start and want to write to volumes in
the same pool, then all three can be assigned the same volume.
In fact, if PreferMountedVolumes=yes (the default), then all
three WILL be assigned the same volume unless the pool restricts
the max number of jobs that the volume may contain. However,
your device (drive) restricts the max concurrent jobs to 2.
Therefore one of those three jobs will not be able to select the
drive where the volume is mounted and will be forced to select
another unused drive. That third job will nevertheless select
the same volume as the other two and attempt to move the volume
from the drive it is in into the drive that it has been assigned
to. The configuration has a built-in race condition.
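For concreteness, such a setup would look roughly like this (an
illustrative excerpt; the resource names are examples, and only the
two directives shown come from the discussion above):

    # bacula-sd.conf -- Device (drive) resource, illustrative only
    Device {
      Name = Drive-0
      Maximum Concurrent Jobs = 2   # at most 2 jobs may use this drive
      ...
    }

    # bacula-dir.conf -- Job resource, illustrative only
    Job {
      Name = BackupClient1
      Prefer Mounted Volumes = yes  # the default; jobs favor the mounted volume
      ...
    }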
This is the first time that I have heard this explained so
clearly. I am going to try to duplicate this problem now that you
have so clearly explained it. By the way, I am not really sure I
would classify this as a race condition, because theoretically the
SD is not blocked; the third job just waits until the Volume is
free (at least that is what I programmed). However, this is
clearly very inefficient.
I agree. It is not a race condition in the code at all. Nothing gets
stuck. It is really a misconfiguration, though the config file is
syntactically correct. I'm not sure what to call that. I suppose I
should have said the configuration has a built-in "resource
contention problem", rather than race condition. Sorry for the
confusion.
I would like to fix this, but one must keep in mind an important
difficulty with Bacula. The SD knows what is going on with
Volumes, but the Dir does not, and it is the Dir that proposes
Volumes to the SD. Currently there is no good atomic way to pass
the information from the SD back to the Dir so that it can make
better decisions.
So, with the (current) constraint that the solution must involve
changing only the SD algorithm, how could one prevent this from
happening? I have some ideas, but wonder what you think.
I think that it in fact MUST be changed only in the SD. The issue is
that the volume selection for a job needs to be atomic. Whether the
volume info is acquired from the Dir, an array in the SD, or
anywhere else, the SD must access it in a critical section in order
to serialize volume selection. I believe that ANYTHING that
changes the status of a volume or device should be handled in SD as
an atomic operation. Consider a single mutex that must be held in
order to make any changes to either a volume or a device. The status
of devices and volumes is transmitted back to Dir as part of the
mutex release. Dir then always has accurate info, because only one
job at a time can change anything. (I also consider Dir commands to
the SD to be "jobs" in this context).
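As a rough sketch of what I mean (hypothetical code, not Bacula's
actual SD source; the names sd_state_lock, VolumeState, and
DeviceState are invented for illustration):

    // Hypothetical sketch of a single global SD lock; not Bacula code.
    #include <mutex>

    static std::mutex sd_state_lock;  // guards ALL volume and device state

    struct VolumeState { /* which drive holds it, writing jobs, ... */ };
    struct DeviceState { /* mounted volume, concurrent job count, ... */ };

    // Every status change to a volume or a device goes through one
    // critical section, so volume selection is serialized across jobs.
    void update_volume_and_device(VolumeState *vol, DeviceState *dev)
    {
        std::lock_guard<std::mutex> guard(sd_state_lock);
        // ... select/assign/swap the volume, update the device ...
        // Before the lock is released, the new status would be sent
        // back to the Dir, so the Dir always sees a consistent view.
    }

The point is only the single lock; the actual state and the Dir
protocol are elided.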
I believe the current per-device locking is too fine-grained. Due to
volume selection, one device can affect another, even if only
indirectly, as in the swapping required when the same volume is
needed on two devices. A global lock simplifies
concurrency and imho makes the whole system more robust. The biggest
con is that multiple devices cannot mount/umount volumes at the same
time. As far as I know, most tape robots cannot load/unload multiple
drives simultaneously anyway, and for disk the mount/umount is only
a few ms at most, so I don't view that as a problem.
I think concurrent programming is just hard, period. :) Therefore I
prefer simplifying the serialization over squeezing out the utmost
performance. And I think a global acquisition lock in SD is the way
to do that.
Setting PreferMountedVolumes=no causes the three jobs to select
a drive that is NOT already mounted with a volume from the pool.
This allows jobs writing to the same pool to select different
volumes from the pool, rather than all selecting the same next
available volume. This has its own caveats. In some cases it does
not prevent two jobs from selecting the same volume, meaning that
they will want to swap the volume back and forth between drives,
which is another type of race condition. I have used this method
successfully for a pool containing only full backups by setting
PreferMountedVolumes=no in the job resource and
MaximumVolumeJobs=1 in the pool resource. Since Bacula selects the
volume for a job in an atomic
manner, this forces an exclusive set of volumes for each job,
thus preventing the race condition. This means that concurrency
is limited only by the number of drives, but at the "expense" of
creating a greater number of smaller volume files. I quote
"expense" because on a disk vchanger it isn't usually a big
issue to have more volume files. Doing this with a tape
autochanger would use a lot more tapes and be truly more
expensive. Of course unlimited concurrency is theoretical, since
the hardware limits the USEFUL concurrency.
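Roughly, that combination looks like this in bacula-dir.conf (an
illustrative excerpt; the resource names are examples and other
required directives are omitted):

    # bacula-dir.conf -- illustrative excerpt only
    Job {
      Name = FullBackup
      Prefer Mounted Volumes = no   # select an unused drive instead
      ...
    }

    Pool {
      Name = FullPool
      Maximum Volume Jobs = 1       # one job per volume => exclusive volumes
      ...
    }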
I really do not like the PreferMountedVolumes = No option (I have
probably said this many times), but I find your use of it very
well explained and very interesting.
Best regards,
Kern