> I have a different understanding of the function.  There is no need
> to read "data off clients as fast as possible" - if your clients are
> fast, they have no problem providing the data at lower rates,
> too.  It's the other way round: spooling helps when your clients are
> so slow that they cannot feed the data fast enough to keep the tape
> streaming.  This often happens when you run, for example, incremental
> backups over a big data set (say, millions of files) with only a few
> changes.

It may also help in this case, but so would buffering instead of
spooling.  I was under the impression that in the case of slow clients,
Bacula is designed to read from many clients at the same time, so that
it can get the throughput required for the tape without spooling.

> In this case the client gets the time it needs to traverse the whole
> directory tree, and when done, you have all data to be backed up
> collected in the spool area which is then fast enough to keep the
> tape happily streaming.

This is true, but the drawback of the spool file is that you need
enough disk space to hold a full tape's worth of data for it to
perform optimally.  If the spool file is not an exact multiple of the
tape size, performance will drop.

> > Consequently spooling works best when the spool file is large
> > enough to contain one whole tape's worth of data, and you have
> > enough clients backing up that there is always a complete spool
> > file ready to write out to tape.  
> This is not necessary.  Or only one possible special case.

It's not necessary, but if you do not do this, performance will suffer
and your tape will shoe shine.

> > Anything less than this and spooling will slow things down.  
> This is not correct, if you consider incremental and differential
> backups.

I am only referring to getting data from the spool file onto tape.
Let's say you have a 100GB spool file and you are writing to an 800GB
tape.  The process will go like this:

 * Read 100GB from client, tape is idle
 * Write 100GB to tape, pause tape
 * Read next 100GB from client while tape is paused
 * Start up tape again and write next 100GB
 * ...and so on, eight times in total, until the 800GB tape is full

Thus even if your clients can keep up with the tape 100% of the time,
you will still introduce extra shoe shining if your spool file is not
exactly one tape in size.
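
To put numbers on the sequence above, here is a minimal sketch of the arithmetic.  It assumes the simple serial model I described (fill the spool file, then drain it to tape, with no overlap); the function names are made up for illustration and have nothing to do with Bacula's code.

```python
import math


def spool_write_bursts(tape_gb, spool_gb):
    """Separate write bursts needed to fill one tape when each burst
    drains one full spool file (serial fill-then-drain model)."""
    return math.ceil(tape_gb / spool_gb)


def tape_pauses(tape_gb, spool_gb):
    """Times the tape has to stop and wait between bursts on one tape."""
    return spool_write_bursts(tape_gb, spool_gb) - 1


print(tape_pauses(800, 100))  # 7
print(tape_pauses(800, 800))  # 0
```

So a 100GB spool file costs seven extra stops per 800GB tape in this model, and only a spool file of a full tape's size brings that down to zero.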

(If your spool file is larger than one tape, then you will fill up one
tape in one continuous operation which is perfect, but then the second
tape will pause once the end of the spool file has been reached which
is not ideal either.)

So you can see that using a spool file is typically worse for a tape
drive, as it will almost always introduce additional stop/start cycles
(shoe shining) which would not be there otherwise, unless you have a
very slow client.

This is why in my opinion buffering is better, because a small FIFO
buffer can read data from clients *while* writing to tape, so there is
no extra shoe shining.  A buffer also does no harm when it is not
required, whereas using a spool file when one is not needed will
make performance worse.
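
To illustrate what I mean by buffering, here is a rough sketch of a bounded FIFO between a reader and a writer.  It is only an illustration of the idea, not how Bacula or mbuffer is implemented: `copy_through_fifo`, `block_size` and `max_blocks` are names I have invented, and `src`/`dst` stand in for the client stream and the tape device.

```python
import io
import queue
import threading


def copy_through_fifo(src, dst, block_size=64 * 1024, max_blocks=16):
    """Copy src to dst through a bounded FIFO so that reading and
    writing overlap.  Returns the number of bytes written."""
    fifo = queue.Queue(maxsize=max_blocks)

    def reader():
        while True:
            block = src.read(block_size)
            fifo.put(block)      # blocks while the FIFO is full
            if not block:        # empty read means end of input
                return

    thread = threading.Thread(target=reader)
    thread.start()

    total = 0
    while True:
        block = fifo.get()       # blocks while the FIFO is empty
        if not block:
            break
        dst.write(block)
        total += len(block)
    thread.join()
    return total


if __name__ == "__main__":
    source = io.BytesIO(b"x" * 1_000_000)
    sink = io.BytesIO()
    print(copy_through_fifo(source, sink))
```

The writer only waits when the FIFO is actually empty, and the reader only waits when it is full, so a fast client costs nothing and a briefly slow client is absorbed by whatever is already queued.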

In my own experience writing data to tape using mbuffer and tar, a
buffer of 4GB was enough to prevent all shoe shining, and it did not
slow down the process at all.  However, with Bacula, my spool file must
be 800GB to achieve the same result, and even this makes the process
take much longer because the tape is idle while the spool file is
filling up the first time.
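
A back-of-the-envelope estimate shows how much longer.  The 100 MB/s rates below are assumptions picked purely for illustration (not measurements from my setup), and the model is again the serial one where nothing is written to tape while the spool file fills.

```python
MB_PER_GB = 1000


def serial_spool_seconds(total_gb, client_mb_s, tape_mb_s):
    """Every byte is first read into the spool file, then written to
    tape, with no overlap between the two phases."""
    total_mb = total_gb * MB_PER_GB
    return total_mb / client_mb_s + total_mb / tape_mb_s


def buffered_seconds(total_gb, client_mb_s, tape_mb_s):
    """Reading and writing overlap, so the slower side sets the pace
    (the time to pre-fill a small buffer is ignored)."""
    total_mb = total_gb * MB_PER_GB
    return total_mb / min(client_mb_s, tape_mb_s)


# Hypothetical rates: clients and tape both sustain 100 MB/s.
print(serial_spool_seconds(800, 100, 100))  # 16000.0
print(buffered_seconds(800, 100, 100))      # 8000.0
```

With those assumed rates, filling one tape takes about twice as long with a full-size spool file as it does with a small buffer.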

I don't have 800GB available for the spool file either, which means my
choices are:

  1.  Use a smaller spool file and live with tape shoe shine.

  2.  Don't use a spool file at all and live with tape shoe shine
      caused by slow clients.

  3.  Buy more disk space that I can't use for real storage because it
      must be reserved for Bacula scratch space, and live with shoe
      shine as well because tapes are never exactly 800GB.

  4.  Implement buffering support in Bacula so that I can eliminate
      shoe shining and speed up my backups, without buying new hardware.

I definitely favour #4 because having support for large tape buffers in
Bacula would provide some big performance benefits.


Bacula-users mailing list
