On 2/20/2019 9:20 AM, Radosław Korzeniewski wrote:
Hello,

Wed, 20 Feb 2019 at 13:29 Josh Fisher <jfis...@pvct.com> wrote:

    Note that posix_fadvise() only affects caching and read-ahead at
    the OS level. While the use of posix_fadvise() may indeed improve
    i/o performance for particular use cases, it is not parallelism
    and does not cause multiple user-space threads to be executed in
    parallel. I believe that Kern is referring to a multi-threaded
    approach in the bacula-fd, where multiple threads are executing in
    parallel to read and process files.

    Also, I believe that bacula-fd already does make use of
    posix_fadvise().

Yes, I mentioned it in my previous email.
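
To make the distinction quoted above concrete, here is a minimal sketch of what such an advisory hint looks like. It uses Python's os.posix_fadvise() wrapper purely for illustration; it is not the bacula-fd code, and the function name read_with_sequential_hint is made up for this example. The hint only tells the kernel how to cache and read ahead; the read loop is still a single thread.

```python
import os


def read_with_sequential_hint(path, chunk_size=64 * 1024):
    """Read a whole file after hinting the kernel that access is sequential.

    Hypothetical helper for illustration only. The hint affects OS-level
    caching and read-ahead; it does not add any parallelism.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # os.posix_fadvise is not available on every platform, so guard it.
        if hasattr(os, "posix_fadvise"):
            # offset=0 and len=0 apply the advice to the whole file.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # empty read means end of file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```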

    I would think that a reader-writer approach would be possible. A
    single writer thread would perform all i/o with the SD while
    multiple reader threads would read and process single files at a
    time. A single management thread would manage the list of files to
    be backed up and spawn reader threads to process them. This could
    improve FD performance, particularly when compression and/or
    encryption is being used.

This topic has many branches and levels of detail, which causes a lot of misunderstanding, e.g.:
- concurrent data scan (finding what to backup)
- concurrent data read at directory (or filesystem) level
- concurrent data read at file level
- concurrent data read at block level
- concurrent data processing (e.g. compression, see *1 below)
- asynchronous IO for data read (single thread)
- multiple network streams to single storage
- single network stream to multiple storages = multiple network streams
- multiple network streams to multiple storages
- support for high latency networks - single thread
- support for high latency networks - multiple threads
- automatic concurrency scaling (e.g. by the number of available CPUs or by system utilization)
- manual concurrency scaling


Yes. It is a complex topic, but it can be implemented in a modular way to divide and conquer.

1. A management thread that:
        - Spawns a "writer" thread to handle all i/o with the SD or SDs
        - Performs data scan (finding what to backup)
        - Spawns a pool of "reader" threads (size of thread pool limits concurrency)
        - Assigns each file to be backed up to a reader thread

2. Reader threads that:
        - Are given a file to process by the management thread
        - Read the file and perform any compression / encryption
        - Establish connection to writer thread
                - writer thread assigns a queue to each reader thread to hold pointers to data blocks
        - Send data blocks to the writer thread by pushing pointers onto the queue
        - Disconnect from writer thread and exit (return to thread pool)

3. A writer thread that:
        - Connects to SD or SDs and handles Bacula i/o protocol
        - Waits for connections from reader threads
        - Assigns each reader thread a new data block queue
                - This is a queue of pointers, so can be a lock-free queue on many (most?) architectures
        - Sends each queue's data to the SD serially in FIFO fashion
        - Limits the size of data queues, blocking reader threads that hit max queue size until space is available
                - Blocked threads are prioritized by send order position
        - Manages send order
                - The data queues of reader threads that have completed are moved up in the send order
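
The outline above can be sketched roughly as follows. This is only an illustration in Python, not Bacula code, and everything named here (Writer, reader, backup, the send callback, END_OF_FILE) is hypothetical. The "files" are in-memory byte strings, the SD is simulated by a callback, and locked bounded queues of object references stand in for lock-free queues of pointers. The sketch sends queues strictly in connect order and leaves out the step that moves completed queues up in the send order.

```python
import queue
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor

END_OF_FILE = None  # sentinel a reader pushes after its last data block


class Writer:
    """Single thread that owns all i/o with the (simulated) SD."""

    def __init__(self, send, max_blocks=4):
        self._send = send                # callable(name, block); stands in for SD i/o
        self._max_blocks = max_blocks    # per-reader queue limit
        self._send_order = queue.Queue() # (name, block queue) in connect order
        self._thread = threading.Thread(target=self._run)

    def start(self):
        self._thread.start()

    def connect(self, name):
        """Called by a reader; returns a bounded queue for its data blocks."""
        blocks = queue.Queue(maxsize=self._max_blocks)
        self._send_order.put((name, blocks))
        return blocks

    def close(self):
        self._send_order.put(None)  # tell the writer no more readers will connect
        self._thread.join()

    def _run(self):
        while True:
            entry = self._send_order.get()
            if entry is None:
                return
            name, blocks = entry
            # Drain one file completely (FIFO) before starting the next,
            # so the SD sees each file's data as one sequential stream.
            while True:
                block = blocks.get()
                if block is END_OF_FILE:
                    break
                self._send(name, block)


def reader(name, data, writer, block_size=1024):
    """Process one file: compress it and push the blocks to the writer."""
    blocks = writer.connect(name)
    try:
        comp = zlib.compressobj()
        for i in range(0, len(data), block_size):
            out = comp.compress(data[i:i + block_size])
            if out:
                blocks.put(out)  # blocks here while the queue is full
        blocks.put(comp.flush())
    finally:
        blocks.put(END_OF_FILE)  # always release the writer, even on error


def backup(files, send, pool_size=3):
    """Management role: start the writer, hand each file to a pooled reader."""
    writer = Writer(send)
    writer.start()
    try:
        with ThreadPoolExecutor(max_workers=pool_size) as pool:
            futures = [pool.submit(reader, n, d, writer) for n, d in files.items()]
            for f in futures:
                f.result()  # re-raise any reader error
    finally:
        writer.close()
```

A reader only connects once it is actually running, so the queue the writer is currently draining always has a live producer. That is why readers blocking on a full queue cannot deadlock the writer in this sketch.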

Each of these three thread types can be sequential at first to simplify things. For example, the data scan in the management thread can utilize the existing single-threaded data scan code. At a later date, the management thread can spawn its own threads to parallelize the data scan, the writer thread can utilize the current streaming code, etc. Most importantly, the reader thread can utilize existing single-threaded code for reading, compressing and encrypting.

Just a rough outline, but this simplified approach at least gets things going with multiple files being read, compressed, and encrypted concurrently. The SD will see each file's data coming sequentially as it does now, so no changes are needed to the SD or Dir.

Cheers,
Josh Fisher
jfis...@jaybus.com


