Hi Ben,
To start with, I would watch the server using iostat and top to make
sure you aren't running out of CPU and that you really aren't maxing out
your disks during your backup. In particular, pay attention to the
'%util' column. I like to run 'iostat -kx 2', which shows the stats
every 2 seconds. Of course, the usefulness of that will depend on your
disk configuration: if you have a RAID card presenting a single LUN to
the server then the numbers may be misleading, but they should still
give an indication of what is going on.

I see you are allowing 3 concurrent jobs on your tape drives. If all 3
of these are spooling from the same set of disks at once, performance
may suffer because the disks now have to do a lot more seeking.
There are also some buffer settings in Bacula that I've had to tweak in
the past to get ideal performance. Check "Maximum Network Buffer Size"
in your configs.
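As a rough sketch of where that directive lives, assuming a stock
bacula-sd.conf and bacula-fd.conf: it can be set in the Device resource
of the storage daemon and in the FileDaemon resource of the client. The
65536 value below is only an illustrative starting point, not a tested
recommendation, and the client name is hypothetical. The device name is
copied from your config.

    # bacula-sd.conf (illustrative value; existing directives omitted)
    Device {
      Name = drive-1-tapestore1
      Maximum Network Buffer Size = 65536
    }

    # bacula-fd.conf (illustrative value; client name is hypothetical)
    FileDaemon {
      Name = client1-fd
      Maximum Network Buffer Size = 65536
    }

Both daemons need a restart to pick up the change, and it is worth
changing one value at a time and re-measuring the despool rate.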
Finally, there are some OS-level settings for the 'st' driver (I'm
assuming you are on Linux). With my LTO3 drives I need to add this to
the kernel command line:

    st=buffer_kbs:256,max_buffers:32

LTO6 may need similar tweaks. Without increasing the st driver's buffer
size and buffer count, I had a hard time keeping an LTO3 drive running
at full speed (80 MB/sec).
Bryn
On 2014-11-05 03:48 AM, Roberts, Ben wrote:
Hi all,
I'd like to try and make some speed improvements to my Bacula setup
(5.2.13, Solaris11). I have data (and attribute) spooling enabled
using a pool of 46x 1TB directly-attached SAS disks dedicated to this
purpose. Data is being despooled to 2x directly-attached SAS LTO6
drives at around 100 MB/sec each. I think I should be able to get
closer to the ~160 MB/s maximum uncompressed throughput the drives and
tape media support (ref:
http://docs.oracle.com/cd/E38452_01/en/LTO6_Vol4_E1/LTO6_Vol4_E1.pdf).
I've just done a speed test and can read from the spool array at a
sustained 300 MB/sec even while other jobs are running, so I'm sure
there's no bottleneck at the disk layer. My suspicion is that the
bottleneck is at the application layer, probably due to the way I have
Bacula configured.
Having read through Bareos' tuning paper
(http://www.bareos.org/en/Whitepapers/articles/Speed_Tuning_of_Tape_Drives.html),
I've updated the max file size from 1 GB to 50 GB, which increased the
throughput from ~75 to ~100 MB/sec. I believe I need to look at tuning
the block size to gain the last bit of improvement.
Is it still the case in Bacula that changing the Maximum Block Size
renders previously used/labelled tapes unreadable? I'm up to almost
1,000 tape media already written, so making these unusable for
restores unless I restart the SD with a changed config would be less
than ideal. I see Bareos is touting a feature to make changes to block
size at the pool level rather than the storage level and so this
problem can be avoided by moving newer backups to a different pool
while still keeping older backups readable. I haven't seen any
reference to this in the Bacula manual; is it something that's already
supported or in the plans for a future version?
For reference, this is one of the relevant drive definitions I'm
using, just in case there's something else that would help which I
might have missed:
Device {
  Name = drive-1-tapestore1
  Archive Device = /dev/rmt/1mbn
  Device Type = Tape
  Media Type = LTO6
  AutoChanger = yes
  Removable media = yes
  Random access = no
  Requires Mount = no
  Drive Index = 1
  Maximum Concurrent Jobs = 3
  Maximum Spool Size = 1024G
  Maximum File Size = 50G
  Autoselect = yes
}
Regards,
Ben Roberts