by using async futures to load chunks and stream::buffer_unordered to buffer up to 16 of them, depending on write/load speed.
With this, we don't need to increase the number of threads in the runtime to trigger parallel reads and network traffic to us. This way it's only limited by CPU if decoding and/or decrypting is the bottleneck.

I measured restoring a VM backup with about 30 GiB of data on fast storage over a local network link (from a PBS VM to the host). I let it do multiple runs, but the variance was not that big, so here's some representative log output with various MAX_BUFFERED_FUTURES values:

no MAX_BUFFERED_FUTURES: duration=43.18s, speed=758.82MB/s
 4: duration=38.61s, speed=848.77MB/s
 8: duration=33.79s, speed=969.85MB/s
16: duration=31.45s, speed=1042.06MB/s

Note that increasing the number has diminishing returns after 10-12 on my system, but I guess it depends on the exact configuration. (For more than 16 I did not see any improvement, but this is probably just my setup.) I saw an increase in CPU usage (from ~75% to ~100% of one core), which is very likely due to the additional chunks being decoded.

In general I'd like to limit the buffering somehow, but I don't think there is a good automatic metric we can use, and giving the admin a knob whose actual ramifications are hard to explain is also not good, so I settled for a value that showed improvement but does not seem too high. In any case, if the target and/or source storage is too slow, there will be back/forward pressure, and this change should only matter for storage systems where IO depth plays a role.

This patch is loosely based on the patch from Adam Kalisz [0], but removes the need to increase the blocking threads and uses the underlying async implementation (which was always used under the hood anyway) for reading remote chunks.

0: https://lore.proxmox.com/pve-devel/mailman.719.1751052794.395.pve-de...@lists.proxmox.com/

Signed-off-by: Dominik Csapak <d.csa...@proxmox.com>
Based-on-patch-by: Adam Kalisz <adam.kal...@notnullmakers.com>
---
@Adam could you please test this patch too, to see if you still get the improvements you saw with your version?

Also, I sent this as an RFC to discuss how we decide how many chunks we want to buffer / how many threads we want to allocate. This is a non-trivial topic, and as I wrote above we don't have a real metric to decide upfront, but giving the admin knobs that are complicated is also not the best solution. My instinct would be to simply increase to 16 (as I have done here) and maybe expose this number in /etc/vzdump.conf or /etc/pve/datacenter.cfg.

Also, I tried to make the writes multi-threaded too, but my QEMU knowledge is not very deep for this kind of thing, and I wanted to get this version out there soon. (Increasing the write threads can still be done afterwards if this change is not enough on its own.)

I developed this patch on top of the 'stable-bookworm' branch, but it should apply cleanly on master as well.
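For anyone who wants to play with the pattern in isolation, below is a minimal, self-contained sketch (not part of the patch) of the stream::iter + buffer_unordered approach used in the diff. fetch_chunk, the sleep-based latency, CHUNK_SIZE and the counts are made up for illustration, and it assumes tokio, futures and anyhow as dependencies:

// toy demo of pre-fetching with buffer_unordered; names are invented,
// the real code uses AsyncReadChunk and the write callbacks instead
use std::time::Duration;

use anyhow::Error;
use futures::StreamExt;

const MAX_BUFFERED_FUTURES: usize = 16;
const CHUNK_SIZE: usize = 4096;

// stand-in for a remote chunk read: pretend to fetch a chunk over the
// network and return its payload
async fn fetch_chunk(pos: usize) -> Result<Vec<u8>, Error> {
    // simulate network latency that varies a bit per chunk
    tokio::time::sleep(Duration::from_millis(5 + (pos % 10) as u64)).await;
    Ok(vec![0u8; CHUNK_SIZE])
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    let chunk_count = 64;

    // one future per chunk; each carries its position so the consumer can
    // compute the write offset even though completions arrive out of order
    let read_queue = (0..chunk_count).map(|pos| async move {
        let data = fetch_chunk(pos).await?;
        Ok::<_, Error>((pos, data))
    });

    // poll up to MAX_BUFFERED_FUTURES fetches concurrently; results are
    // yielded as soon as they finish, not in submission order
    let mut stream = futures::stream::iter(read_queue).buffer_unordered(MAX_BUFFERED_FUTURES);

    let mut bytes = 0;
    while let Some(res) = stream.next().await {
        let (pos, data) = res?;
        // a real consumer would write `data` at `offset` here
        let offset = pos * CHUNK_SIZE;
        println!("got chunk {} ({} bytes) for offset {}", pos, data.len(), offset);
        bytes += data.len();
    }
    println!("fetched {} bytes total", bytes);
    Ok(())
}

The fetches complete out of order, which is why each future returns its position/offset and only the single consumer loop touches the counters and does the writes.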
 src/restore.rs | 57 +++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 42 insertions(+), 15 deletions(-)

diff --git a/src/restore.rs b/src/restore.rs
index 5a5a398..741b3e1 100644
--- a/src/restore.rs
+++ b/src/restore.rs
@@ -2,6 +2,7 @@ use std::convert::TryInto;
 use std::sync::{Arc, Mutex};
 
 use anyhow::{bail, format_err, Error};
+use futures::StreamExt;
 use once_cell::sync::OnceCell;
 use tokio::runtime::Runtime;
 
@@ -13,7 +14,7 @@ use pbs_datastore::cached_chunk_reader::CachedChunkReader;
 use pbs_datastore::data_blob::DataChunkBuilder;
 use pbs_datastore::fixed_index::FixedIndexReader;
 use pbs_datastore::index::IndexFile;
-use pbs_datastore::read_chunk::ReadChunk;
+use pbs_datastore::read_chunk::AsyncReadChunk;
 use pbs_datastore::BackupManifest;
 use pbs_key_config::load_and_decrypt_key;
 use pbs_tools::crypt_config::CryptConfig;
@@ -29,6 +30,9 @@ struct ImageAccessInfo {
     archive_size: u64,
 }
 
+// use at max 16 buffered futures to make loading of chunks more concurrent
+const MAX_BUFFERED_FUTURES: usize = 16;
+
 pub(crate) struct RestoreTask {
     setup: BackupSetup,
     runtime: Arc<Runtime>,
@@ -165,24 +169,47 @@ impl RestoreTask {
 
         let start_time = std::time::Instant::now();
 
-        for pos in 0..index.index_count() {
+        let read_queue = (0..index.index_count()).map(|pos| {
             let digest = index.index_digest(pos).unwrap();
             let offset = (pos * index.chunk_size) as u64;
-            if digest == &zero_chunk_digest {
-                let res = write_zero_callback(offset, index.chunk_size as u64);
-                if res < 0 {
-                    bail!("write_zero_callback failed ({})", res);
+            let chunk_reader = chunk_reader.clone();
+            async move {
+                let chunk = if digest == &zero_chunk_digest {
+                    None
+                } else {
+                    let raw_data = AsyncReadChunk::read_chunk(&chunk_reader, digest).await?;
+                    Some(raw_data)
+                };
+
+                Ok::<_, Error>((chunk, pos, offset))
+            }
+        });
+
+        // this buffers futures and pre-fetches some chunks for us
+        let mut stream = futures::stream::iter(read_queue).buffer_unordered(MAX_BUFFERED_FUTURES);
+
+        while let Some(res) = stream.next().await {
+            let res = res?;
+            let pos = match res {
+                (None, pos, offset) => {
+                    let res = write_zero_callback(offset, index.chunk_size as u64);
+                    if res < 0 {
+                        bail!("write_zero_callback failed ({})", res);
+                    }
+                    bytes += index.chunk_size;
+                    zeroes += index.chunk_size;
+                    pos
                 }
-                bytes += index.chunk_size;
-                zeroes += index.chunk_size;
-            } else {
-                let raw_data = ReadChunk::read_chunk(&chunk_reader, digest)?;
-                let res = write_data_callback(offset, &raw_data);
-                if res < 0 {
-                    bail!("write_data_callback failed ({})", res);
+                (Some(raw_data), pos, offset) => {
+                    let res = write_data_callback(offset, &raw_data);
+                    if res < 0 {
+                        bail!("write_data_callback failed ({})", res);
+                    }
+                    bytes += raw_data.len();
+                    pos
                 }
-                bytes += raw_data.len();
-            }
+            };
+
             if verbose {
                 let next_per = ((pos + 1) * 100) / index.index_count();
                 if per != next_per {
-- 
2.39.5

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel