On 2/20/2019 9:20 AM, Radosław Korzeniewski wrote:
Hello,
On Wed, 20 Feb 2019 at 13:29, Josh Fisher <jfis...@pvct.com> wrote:
Note that posix_fadvise() only affects caching and read-ahead at
the OS level. While the use of posix_fadvise() may indeed improve
i/o performance for particular use cases, it is not parallelism
and does not cause multiple user-space threads to be executed in
parallel. I believe that Kern is referring to a multi-threaded
approach in the bacula-fd, where multiple threads are executing in
parallel to read and process files.
Also, I believe that bacula-fd already does make use of
posix_fadvise().
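To illustrate the point about posix_fadvise() being only a hint to the OS, here is a minimal sketch in Python (chosen for brevity; this is not how bacula-fd is written, and the function name read_sequential and the chunk size are made up for the example). The call changes read-ahead and caching behaviour, but the read loop is still a single thread:

```python
import os

def read_sequential(path, chunk_size=64 * 1024):
    """Read a file front to back in ONE thread, hinting the access pattern."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # posix_fadvise is not available on every platform, so guard the call.
        if hasattr(os, "posix_fadvise"):
            # offset=0 and len=0 mean "the whole file".
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)  # stand-in for compress / encrypt / send
        if hasattr(os, "posix_fadvise"):
            # Hint that the cached pages of this file are no longer needed.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

Nothing here runs in parallel in user space; the advice only influences what the kernel does behind the read() calls.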
Yes, I mentioned it in my previous email.
I would think that a reader-writer approach would be possible. A
single writer thread would perform all i/o with the SD while
multiple reader threads would read and process single files at a
time. A single management thread would manage the list of files to
be backed up and spawn reader threads to process them. This could
improve FD performance, particularly when compression and/or
encryption is being used.
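A rough sketch of that reader-writer idea, in Python for brevity and under simplifying assumptions: files are in-memory byte strings, the "SD" is a plain list, and each reader hands over a whole compressed file at once. The names backup, sink and workers are hypothetical and not taken from the Bacula code.

```python
import queue
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor

def backup(files, sink, workers=4):
    """files: dict of name -> bytes (stand-in for files on disk).
    sink: list standing in for the connection to the SD."""
    out = queue.Queue(maxsize=8)  # bounded, so readers block if the writer lags
    done = object()               # sentinel telling the writer to stop

    def writer():
        while True:
            item = out.get()
            if item is done:
                break
            sink.append(item)     # only this thread ever touches the "SD"

    def reader(name):
        # The CPU-heavy part (compression here) runs in the reader threads.
        out.put((name, zlib.compress(files[name])))

    wt = threading.Thread(target=writer)
    wt.start()
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(reader, files))  # wait for readers, surface errors
    finally:
        out.put(done)
        wt.join()
    return sink
```

The order in which files reach the sink is not deterministic in this sketch; only the single-writer property is shown.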
This topic has many branches and levels of detail, which leads to a lot
of misunderstanding, e.g.:
- concurrent data scan (finding what to back up)
- concurrent data read at directory (or filesystem) level
- concurrent data read at file level
- concurrent data read at block level
- concurrent data processing (e.g. compression, see *1 below)
- asynchronous IO for data read (single thread)
- multiple network streams to single storage
- single network stream to multiple storages = multiple network streams
- multiple network streams to multiple storages
- support for high latency networks - single thread
- support for high latency networks - multiple threads
- automatic concurrency scaling (e.g. by the number of available CPUs or
system utilization)
- manual concurrency scaling
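As a small illustration of the last two items, a sketch of how a thread count might be chosen: a manual setting wins, otherwise the number of CPUs is used up to a cap. The function name reader_pool_size, its parameters and the cap of 16 are assumptions for the example, not existing Bacula options.

```python
import os

def reader_pool_size(manual=None, max_threads=16):
    """Return the number of worker threads to use."""
    if manual is not None:
        # Manual concurrency scaling: trust the configured value.
        if manual < 1:
            raise ValueError("thread count must be at least 1")
        return manual
    # Automatic scaling: os.cpu_count() may return None, so fall back to 1.
    cpus = os.cpu_count() or 1
    return max(1, min(cpus, max_threads))
```

Scaling by system utilization would need load measurements at run time and is not covered here.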
Yes. It is a complex topic, but it can be implemented in a modular way to
divide and conquer.
1. A management thread that:
   - Spawns a "writer" thread to handle all i/o with the SD or SDs
   - Performs the data scan (finding what to back up)
   - Spawns a pool of "reader" threads (the size of the thread pool
     limits concurrency)
   - Assigns each file to be backed up to a reader thread
2. Reader threads that:
   - Are given a file to process by the management thread
   - Read the file and perform any compression / encryption
   - Establish a connection to the writer thread
     - The writer thread assigns a queue to each reader thread
       to hold pointers to data blocks
   - Send data blocks to the writer thread by pushing pointers
     onto the queue
   - Disconnect from the writer thread and exit (return to thread pool)
3. A writer thread that:
   - Connects to the SD or SDs and handles the Bacula i/o protocol
   - Waits for connections from reader threads
   - Assigns each reader thread a new data block queue
     - This is a queue of pointers, so it can be a lock-free
       queue on many (most?) architectures
   - Sends each queue's data to the SD serially in FIFO fashion
   - Limits the size of data queues, blocking reader threads that
     hit the max queue size until space is available
     - Blocked threads are prioritized by send order position
   - Manages the send order
     - The data queues of reader threads that have completed
       are moved up in the send order
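The outline above can be sketched roughly as follows, in Python for brevity. Everything here is an assumption made for illustration: the names Writer, reader and run_backup, the in-memory files, the list used as the "SD", and the BEGIN/DATA/END records. The queues are ordinary locking queue.Queue objects rather than lock-free pointer queues, and each block is compressed on its own to keep the code short.

```python
import queue
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor

END = object()  # marks the end of one file's blocks

class Writer:
    """Owns the "SD" sink and hands one bounded queue to each reader."""
    def __init__(self, sink, max_blocks=4):
        self.sink, self.max_blocks = sink, max_blocks
        self.pending = []                  # send order, oldest first
        self.cond = threading.Condition()
        self.closed = False

    def connect(self, name):
        entry = {"name": name, "q": queue.Queue(self.max_blocks), "done": False}
        with self.cond:
            self.pending.append(entry)
            self.cond.notify()
        return entry

    def finish(self, entry):
        entry["q"].put(END)                # blocks while the queue is full
        with self.cond:
            entry["done"] = True
            self.cond.notify()

    def close(self):
        with self.cond:
            self.closed = True
            self.cond.notify()

    def run(self):
        while True:
            with self.cond:
                while not self.pending and not self.closed:
                    self.cond.wait()
                if not self.pending:
                    return                 # closed and nothing left to send
                # Completed readers move up; otherwise take the oldest queue.
                entry = next((e for e in self.pending if e["done"]),
                             self.pending[0])
                self.pending.remove(entry)
            self.sink.append(("BEGIN", entry["name"]))
            while True:                    # drain one file's queue in FIFO order
                block = entry["q"].get()
                if block is END:
                    break
                self.sink.append(("DATA", entry["name"], block))
            self.sink.append(("END", entry["name"]))

def reader(writer, name, data, block_size=4096):
    entry = writer.connect(name)
    try:
        for off in range(0, len(data), block_size):
            # put() blocks when the queue is full, which throttles this reader.
            entry["q"].put(zlib.compress(data[off:off + block_size]))
    finally:
        writer.finish(entry)

def run_backup(files, workers=3):
    """Management role: start the writer, then give each file to a reader."""
    sink = []
    writer = Writer(sink)
    wt = threading.Thread(target=writer.run)
    wt.start()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for name, data in files.items():
            pool.submit(reader, writer, name, data)
    writer.close()                         # all readers have finished by now
    wt.join()
    return sink
```

Because the writer drains one queue completely before starting the next, the records of different files never interleave in the sink, even though several readers are compressing at the same time.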
Each of these three thread types can be sequential at first to simplify
things. For example, the data scan in the management thread can use the
existing single-threaded data scan code, and the writer thread can use
the current streaming code. At a later date, the management thread can
spawn its own threads to parallelize the data scan, etc. Most
importantly, the reader thread can use the existing single-threaded code
for reading, compressing and encrypting.
Just a rough outline, but this simplified approach at least gets things
going with multiple files being read, compressed, and encrypted
concurrently. The SD will see each file's data coming sequentially as it
does now, so no changes are needed to the SD or Dir.
Cheers,
Josh Fisher
jfis...@jaybus.com
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users