Note that posix_fadvise() only affects caching and read-ahead at the OS
level. While the use of posix_fadvise() may indeed improve I/O
performance for particular use cases, it is not parallelism and does not
cause multiple user-space threads to be executed in parallel. I believe
that Kern is referring to a multi-threaded approach in the bacula-fd,
where multiple threads execute in parallel to read and process files.
Also, I believe that bacula-fd already makes use of posix_fadvise().
I would think that a reader-writer approach would be possible. A single
writer thread would perform all I/O with the SD, while multiple reader
threads would each read and process a single file at a time. A single
management thread would manage the list of files to be backed up and
spawn reader threads to process them. This could improve FD performance,
particularly when compression and/or encryption is being used.
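To make that concrete, here is a minimal sketch with pthreads of how such
a reader/writer split could look. The bounded queue, MAX_READERS,
process_file() and the "send to SD" printf are placeholders for
illustration only, not bacula-fd code:

/* Hypothetical sketch only: several reader threads pull file names from a
 * shared list, "process" them (a stand-in for read + compress/encrypt), and
 * hand the results to a single writer thread through a small bounded queue.
 * The writer is the only thread that talks to the SD.  All names here
 * (work_t, MAX_READERS, process_file, ...) are illustrative, not bacula-fd
 * internals.  Build: cc -pthread sketch.c */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_READERS 2              /* knob to protect weak clients */
#define QUEUE_DEPTH 8

typedef struct { char *path; char *payload; } work_t;

static const char *files[] = { "a.file", "b.file", "c.file", NULL };
static int next_file;                          /* next index handed to a reader */
static work_t *queue[QUEUE_DEPTH];
static int q_head, q_tail, q_count, readers_done;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

/* Stand-in for open/read/compress/encrypt of one file. */
static work_t *process_file(const char *path)
{
    work_t *w = malloc(sizeof *w);
    w->path = strdup(path);
    w->payload = strdup("processed data");
    return w;
}

static void *reader(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        const char *path = files[next_file] ? files[next_file++] : NULL;
        pthread_mutex_unlock(&lock);
        if (!path) break;
        work_t *w = process_file(path);        /* disk and CPU work happen here, in parallel */
        pthread_mutex_lock(&lock);
        while (q_count == QUEUE_DEPTH) pthread_cond_wait(&not_full, &lock);
        queue[q_tail] = w; q_tail = (q_tail + 1) % QUEUE_DEPTH; q_count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }
    pthread_mutex_lock(&lock);
    readers_done++;
    pthread_cond_broadcast(&not_empty);
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *writer(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (q_count == 0 && readers_done < MAX_READERS)
            pthread_cond_wait(&not_empty, &lock);
        if (q_count == 0) { pthread_mutex_unlock(&lock); break; }
        work_t *w = queue[q_head]; q_head = (q_head + 1) % QUEUE_DEPTH; q_count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
        printf("send to SD: %s (%s)\n", w->path, w->payload);  /* single output stream */
        free(w->path); free(w->payload); free(w);
    }
    return NULL;
}

int main(void)
{
    pthread_t r[MAX_READERS], wt;
    pthread_create(&wt, NULL, writer, NULL);
    for (int i = 0; i < MAX_READERS; i++) pthread_create(&r[i], NULL, reader, NULL);
    for (int i = 0; i < MAX_READERS; i++) pthread_join(r[i], NULL);
    pthread_join(wt, NULL);
    return 0;
}

The single writer keeps the FD-to-SD connection one stream, exactly as it
is today; only the per-file reads and the CPU-heavy compression and
encryption fan out, and MAX_READERS is the knob that would protect weak
clients.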
I am not sure this approach is always a good thing. It depends on the
client hardware. When backing up weak clients with compression or
encryption enabled, it would bring them to their knees, although a
mechanism to limit the number of reader threads that may be spawned
would fix that. Also, with weak clients, the real problem is slow disks
on the clients, and no amount of parallelism will fix that.
On 2/19/2019 2:43 PM, jes...@krogh.cc wrote:
Thanks for your comment, Kern.
Hello,
This is a problem I have been considering for some time. The problem is
that the current architecture of Bacula just does not properly handle
multi-threading the FD. Yes, you can do it by running two Jobs, but if
you are not 100% on top of the design limitations of Bacula, it is
unlikely that restores will work. Perhaps you have some clever way
around this, so at some point in the future, I would like to look at
this with you. In the meantime, perhaps you can describe in more detail
what you propose.
Well, I actually think this is a "read problem" only .. getting data off
the devices. The need for parallel restore streams is a very different
use case to attack. The tape devices can stream an endless run of very
small files very quickly, and if you have a RAID with a battery-backed
controller, the disk system can also absorb the writes without much
parallelism.
It is on the read side that the challenge comes in .. this test was done
on a local spinning disk with xfs and ~8ms seek time. Using CephFS
(which we do in production), file access latency can be way higher on
hard-disk storage.
pseudocode:
for f in dir:
    posix_fadvise(f)

for f in dir:
    read_file(f)
Test C code attached .. not pretty, but it demonstrates the approach of
getting the OS to do the parallelism.
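A rough sketch of what such a test program could look like (the actual
attachment may differ in detail): pass any argument, e.g. "f", to enable
the fadvise pre-pass, matching the "~jk/test f" invocation in the script
below; run it with no argument to skip the pre-pass. It operates on the
*.file entries created by the script.

/* Rough sketch of the described test (the real attached program may differ):
 * with any command-line argument it first walks the directory issuing
 * posix_fadvise(POSIX_FADV_WILLNEED) on every *.file, then reads them all;
 * without an argument it skips the fadvise pass.
 * Build: cc -O2 -o test test.c */
#define _XOPEN_SOURCE 600
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int is_data_file(const char *name)
{
    const char *dot = strrchr(name, '.');
    return dot && strcmp(dot, ".file") == 0;
}

int main(int argc, char **argv)
{
    DIR *d;
    struct dirent *e;
    char buf[128 * 1024];
    (void)argv;

    if (argc > 1) {                          /* pass 1: tell the OS what we will read */
        fprintf(stderr, "Issuing fadvise\n");
        if ((d = opendir(".")) == NULL)
            return 1;
        while ((e = readdir(d)) != NULL) {
            if (!is_data_file(e->d_name))
                continue;
            int fd = open(e->d_name, O_RDONLY);
            if (fd < 0)
                continue;
            posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);  /* kick off async read-ahead */
            close(fd);
        }
        closedir(d);
    }

    if ((d = opendir(".")) == NULL)          /* pass 2: actually read the data */
        return 1;
    while ((e = readdir(d)) != NULL) {
        if (!is_data_file(e->d_name))
            continue;
        int fd = open(e->d_name, O_RDONLY);
        if (fd < 0)
            continue;
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            fwrite(buf, 1, (size_t)n, stdout);
        close(fd);
    }
    closedir(d);
    return 0;
}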
Test script:
#!/bin/bash
for bs in 4096 8192 16384 32768 65536 131072; do
    for i in $(seq 1 10000); do
        dd if=/dev/zero of=$i.file bs=$bs count=1 status=none
    done
    echo 3 > /proc/sys/vm/drop_caches
    echo "With fadvise blocksize $bs"
    time ~jk/test f > /dev/null
    echo 3 > /proc/sys/vm/drop_caches
    echo "Without fadvise blocksize $bs"
    time ~jk/test > /dev/null
done
Results (times in seconds, real / user / sys):

Blocksize   With fadvise               Without fadvise
4096         0.419 / 0.062 / 0.250      0.554 / 0.034 / 0.184
8192         0.443 / 0.068 / 0.269      0.613 / 0.044 / 0.204
16384        0.394 / 0.056 / 0.328      0.727 / 0.035 / 0.268
32768        0.596 / 0.080 / 0.395      1.210 / 0.060 / 0.398
65536        0.897 / 0.129 / 0.491      1.788 / 0.119 / 0.653
131072      19.406 / 0.206 / 0.738     35.755 / 0.269 / 1.058
Thus, depending on file size, there is approximately a 2x read-side
improvement from this naive solution.
I'm not a C coder (as the attached program demonstrates :-) - but
limiting the problem to "read-side / single job / single stream"
parallelism simplifies the solution and the changes to the codebase. It
is only the callback stuff that I don't understand. I was thinking about
just overloading breaddir to keep an internal buffer of the next X files
(1000 files < 512KB) that it would advance the fadvise call on, but that
would also advise files that would be skipped by an incremental backup.
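To make the look-ahead idea a bit more concrete, here is a rough
standalone sketch (it deliberately avoids guessing at the breaddir
callback interface; backup_list, prefetch and read_one are made-up names
for illustration): fadvise is issued for entries up to 1000 positions
ahead of the file currently being read, skipping anything 512KB or
larger.

/* Rough sketch of a look-ahead fadvise window over a list of paths.
 * backup_list(), prefetch() and read_one() are made-up names; a real FD
 * would also have to cope with files an incremental backup will skip,
 * as noted above.  Build: cc -O2 -o lookahead lookahead.c */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#define LOOKAHEAD 1000                 /* how far ahead to advise */
#define SIZE_CAP  (512 * 1024)         /* only advise small files */

/* Issue WILLNEED for one path if it is small enough to be worth it. */
static void prefetch(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0 || st.st_size >= SIZE_CAP)
        return;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return;
    posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);   /* async read-ahead */
    close(fd);
}

/* Stand-in for the normal backup read path. */
static void read_one(const char *path)
{
    char buf[64 * 1024];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return;
    while (read(fd, buf, sizeof buf) > 0)
        ;   /* a real FD would hand the data on to the SD here */
    close(fd);
}

/* Keep the prefetch cursor LOOKAHEAD entries in front of the read cursor,
 * so the OS overlaps the seeks while we are busy with the current file. */
static void backup_list(char **paths, void (*read_fn)(const char *))
{
    int ahead = 0;
    for (int i = 0; paths[i]; i++) {
        while (paths[ahead] && ahead < i + LOOKAHEAD)
            prefetch(paths[ahead++]);
        read_fn(paths[i]);
    }
}

int main(int argc, char **argv)
{
    (void)argc;
    backup_list(argv + 1, read_one);    /* e.g.: ./lookahead *.file */
    return 0;
}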
I hope the above explains both the background and the concept, and
demonstrates the benefits. So the "clever way" for restores would be to
ignore them, as restores really aren't that relevant to this problem.
That's at least what I see from real-world production backup and restore
scenarios.
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users