Note that posix_fadvise() only affects caching and read-ahead at the OS level. While posix_fadvise() may indeed improve I/O performance for particular use cases, it is not parallelism and does not cause multiple user-space threads to execute in parallel. I believe that Kern is referring to a multi-threaded approach in the bacula-fd, where multiple threads execute in parallel to read and process files.

Also, I believe that bacula-fd already does make use of posix_fadvise().

I would think that a reader-writer approach would be possible. A single writer thread would perform all I/O with the SD, while multiple reader threads would each read and process one file at a time. A single management thread would maintain the list of files to be backed up and spawn reader threads to process them. This could improve FD performance, particularly when compression and/or encryption is being used.

I am not sure this approach is always a good thing; it depends on the client hardware. When backing up weak clients that use compression or encryption, it would bring them to their knees, although a mechanism to limit the number of reader threads that may be spawned would address that. Also, with weak clients, the real problem is slow disks on the client, and no amount of parallelism will fix that.
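
To make that concrete, below is a minimal pthread sketch of such a reader/writer split. It is not Bacula code: the bounded queue, the MAX_READERS cap, and the printf() standing in for "send to the SD" are all placeholders, and error handling is omitted.

/* Sketch: N reader threads each read one file at a time and hand the
 * result to a single writer thread through a bounded queue. */
#include <pthread.h>
#include <stdio.h>

#define MAX_READERS 4          /* cap on reader threads, for weak clients */
#define QUEUE_SLOTS 16

struct item { const char *name; size_t bytes; };

static struct item queue[QUEUE_SLOTS];
static int q_head, q_tail, q_count, readers_done;
static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

static char **files;           /* file list from the "management" thread */
static int nfiles, next_file;

/* Reader: claim the next file, read it (a real FD would also compress or
 * encrypt here), then pass the result to the single writer via the queue. */
static void *reader(void *arg)
{
    (void)arg;
    char buf[65536];
    for (;;) {
        pthread_mutex_lock(&lock);
        int idx = (next_file < nfiles) ? next_file++ : -1;
        pthread_mutex_unlock(&lock);
        if (idx < 0)
            break;

        size_t total = 0, n;
        FILE *f = fopen(files[idx], "rb");
        if (f) {
            while ((n = fread(buf, 1, sizeof buf, f)) > 0)
                total += n;
            fclose(f);
        }

        pthread_mutex_lock(&lock);
        while (q_count == QUEUE_SLOTS)
            pthread_cond_wait(&not_full, &lock);
        queue[q_tail] = (struct item){ files[idx], total };
        q_tail = (q_tail + 1) % QUEUE_SLOTS;
        q_count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }
    pthread_mutex_lock(&lock);
    readers_done++;
    pthread_cond_broadcast(&not_empty);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Writer: the single thread that would talk to the SD; here it just reports. */
static void *writer(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (q_count == 0 && readers_done < MAX_READERS)
            pthread_cond_wait(&not_empty, &lock);
        if (q_count == 0) {            /* all readers done, queue drained */
            pthread_mutex_unlock(&lock);
            break;
        }
        struct item it = queue[q_head];
        q_head = (q_head + 1) % QUEUE_SLOTS;
        q_count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
        printf("sent %s (%zu bytes)\n", it.name, it.bytes);
    }
    return NULL;
}

int main(int argc, char **argv)
{
    files  = argv + 1;                 /* files to "back up", from the command line */
    nfiles = argc - 1;

    pthread_t w, r[MAX_READERS];
    pthread_create(&w, NULL, writer, NULL);
    for (int i = 0; i < MAX_READERS; i++)
        pthread_create(&r[i], NULL, reader, NULL);
    for (int i = 0; i < MAX_READERS; i++)
        pthread_join(r[i], NULL);
    pthread_join(w, NULL);
    return 0;
}

MAX_READERS is exactly the "limit the number of reader threads" knob mentioned above; on a weak client it could simply be set to 1.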


On 2/19/2019 2:43 PM, jes...@krogh.cc wrote:
Thanks for your comment, Kern.

Hello,

This is a problem I have been considering for some time.  The problem is
that the current architecture of Bacula just does not properly handle
multi-threading the FD.  Yes, you can do it by running two Jobs, but if
you are not 100% on top of the design limitations of Bacula, it is
unlikely that restores will work.  Perhaps you have some clever way
around this, so at some point in the future, I would like to look at
this with you.  In the meantime, perhaps you can describe in more detail
what you propose.
Well, I actually think this is a "read problem" only .. getting data off
the devices. The need for parallel restore streams is a very different
use case to attack. The tape devices can stream an endless series of very
small files very quickly, and if you have a RAID with a battery-backed
controller, the disk system can also absorb it without much parallelism.

It is on the read side that the challenge comes in .. this test is done on
a local spinning disk with xfs and ~8ms seek time. Using CephFS (which
we do in production), file access latency can be way higher on hard disk
storage.

pseudocode:

for f in dir:
    posix_fadvise(f, POSIX_FADV_WILLNEED)

for f in dir:
    read_file(f)


Test C code attached .. not pretty, but it demonstrates the approach of
getting the OS to do the parallelism.
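
(The attached program is not reproduced here; the following is only a
reconstruction of the approach, not the original code. With any argument
it first issues posix_fadvise(POSIX_FADV_WILLNEED) on every 1.file ..
10000.file created by the script below and then reads them all; without
an argument it only reads, so the two timings can be compared.)

#define _POSIX_C_SOURCE 200112L   /* for posix_fadvise() */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define NFILES 10000              /* matches the test script below */

int main(int argc, char **argv)
{
    char name[64];
    char buf[131072];
    int i, fd;
    ssize_t n;

    if (argc > 1) {               /* "f" argument: ask the kernel to prefetch */
        fprintf(stderr, "Issuing fadvise\n");
        for (i = 1; i <= NFILES; i++) {
            snprintf(name, sizeof name, "%d.file", i);
            fd = open(name, O_RDONLY);
            if (fd < 0)
                continue;
            posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
            close(fd);
        }
    }

    for (i = 1; i <= NFILES; i++) {   /* read pass; stdout goes to /dev/null */
        snprintf(name, sizeof name, "%d.file", i);
        fd = open(name, O_RDONLY);
        if (fd < 0)
            continue;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            fwrite(buf, 1, (size_t)n, stdout);
        close(fd);
    }
    return 0;
}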

Test script:

#!/bin/bash

for bs in 4096 8192 16384 32768 65536 131072; do
        # create 10000 files of the given block size
        for i in $(seq 1 10000); do
                dd if=/dev/zero of=$i.file bs=$bs count=1 status=none
        done
        echo 3 > /proc/sys/vm/drop_caches   # start cold: drop the page cache (needs root)
        echo "With fadvise blocksize $bs"
        time ~jk/test f > /dev/null
        echo 3 > /proc/sys/vm/drop_caches   # drop the cache again before the second run
        echo "Without fadvise blocksize $bs"
        time ~jk/test > /dev/null
done


With fadvise blocksize 4096
Issuing fadvise

real    0m0.419s
user    0m0.062s
sys     0m0.250s
Without fadvise blocksize 4096

real    0m0.554s
user    0m0.034s
sys     0m0.184s
With fadvise blocksize 8192
Issuing fadvise

real    0m0.443s
user    0m0.068s
sys     0m0.269s
Without fadvise blocksize 8192

real    0m0.613s
user    0m0.044s
sys     0m0.204s
With fadvise blocksize 16384
Issuing fadvise

real    0m0.394s
user    0m0.056s
sys     0m0.328s
Without fadvise blocksize 16384

real    0m0.727s
user    0m0.035s
sys     0m0.268s
With fadvise blocksize 32768
Issuing fadvise

real    0m0.596s
user    0m0.080s
sys     0m0.395s
Without fadvise blocksize 32768

real    0m1.210s
user    0m0.060s
sys     0m0.398s
With fadvise blocksize 65536
Issuing fadvise

real    0m0.897s
user    0m0.129s
sys     0m0.491s
Without fadvise blocksize 65536

real    0m1.788s
user    0m0.119s
sys     0m0.653s
With fadvise blocksize 131072
Issuing fadvise

real    0m19.406s
user    0m0.206s
sys     0m0.738s
Without fadvise blocksize 131072

real    0m35.755s
user    0m0.269s
sys     0m1.058s


Thus, depending on file size, there is approximately a 2x read-side
improvement from this naive solution.


I'm not a C coder (as the attached program demonstrates :-) - but
limiting the problem to "read-side / single job / single stream" parallelism
simplifies the solution and the changes to the codebase. It is only the
callback stuff that I don't understand. I was thinking about just
overloading breaddir to keep an internal buffer of the next X files
(1000 files < 512KB) that it would advance the fadvise call on, but that
would also advise files that an incremental backup would skip.
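
For illustration only, a rough sketch of that prefetch-window idea might
look like the following. The helper names are hypothetical (this is not
Bacula's actual breaddir interface), and the should_backup() stub stands
in for the incremental-skip logic mentioned above.

/* Keep posix_fadvise(POSIX_FADV_WILLNEED) issued for up to WINDOW small
 * files ahead of the one the backup is currently reading. */
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#define WINDOW      1000                /* "next X" files to advise ahead  */
#define SMALL_FILE  (512 * 1024)        /* only prefetch files under 512KB */

/* Placeholder: a real version would consult the job's incremental logic
 * so that files the backup will skip are never advised. */
static int should_backup(const char *path)
{
    (void)path;
    return 1;
}

/* Advise one file if it is a small regular file that will be backed up. */
static void prefetch_one(const char *path)
{
    struct stat st;
    int fd;

    if (stat(path, &st) != 0 || !S_ISREG(st.st_mode) || st.st_size >= SMALL_FILE)
        return;
    if (!should_backup(path))
        return;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return;
    posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
    close(fd);
}

/* Called as the backup loop advances: advise up to WINDOW files beyond the
 * one currently being read.  A real implementation would remember what has
 * already been advised instead of re-walking the window every time. */
void advise_ahead(const char **files, int nfiles, int current)
{
    int i;
    for (i = current + 1; i < nfiles && i <= current + WINDOW; i++)
        prefetch_one(files[i]);
}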

I hope the above explains both the background and the concept, and
demonstrates the benefits.

So the "clever way" for restores would be to ignore it, as it really isn't
that relevant to this problem. Thats
at least what I see from real-world production backup and restore scenarios.



_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users