Hello all,

I'm seeing some unexpectedly slow performance when testing a copy job
process and I've pretty much run out of ideas on diagnosing it.

We are currently running two Bacula storage daemons, on different VMs, and
have been attempting to use copy jobs to keep a copy of our backups offsite.

SD1 stores file volumes on an HDD-backed Ceph pool (via CephFS). We have
~130 TiB of backups (no compression) in 100 GiB volume files. Multiple jobs
were run in parallel onto the volumes, and we were getting >100 MiB/second
write throughput (mostly client limited).

SD2 is a cloud store, using the S3 driver to push volumes up to S3/Glacier
over a fast connection, with a local SSD cache.

SD1 and SD2 are on the same 10 Gbit/s switch, both have recently been
upgraded to Bacula 15.0.3, and both are on reasonably modern CPUs (AMD EPYC
9124 for SD1).

When we run a copy job we see:
 - The expected backup jobs spawn.
 - SD1 and SD2 connect to each other fine.
 - SD1 mounts a volume file and starts streaming data to SD2 with
   reasonable throughput (50 - 100 MiB/sec).
 - All seems well for a time, then throughput drops to essentially zero.
 - SD1 has a single CPU pegged at 100%, with minimal IO traffic (both
   ops and bandwidth) from the open volume file. We get spikes of good
   speed, but average throughput after leaving a job running for a week is
   <1 MiB/sec.
 - SD2 is quiet, happily handling normal backup jobs from other clients
   with normal performance.
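
To put those rates in perspective, here is a rough back-of-the-envelope
calculation. It is only a sketch: it assumes the full ~130 TiB would be
copied at a constant average rate, which is a simplification, and the
function name is just for illustration.

```python
# Rough estimate of how long copying the whole backup set would take.
# Assumption: all ~130 TiB are copied at a constant average rate.
MIB_PER_TIB = 1024 * 1024
SECONDS_PER_DAY = 86_400


def days_to_copy(total_tib: float, rate_mib_per_sec: float) -> float:
    """Return the number of days needed to copy total_tib at the given rate."""
    total_mib = total_tib * MIB_PER_TIB
    return total_mib / rate_mib_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    # 100 and 50 MiB/sec are the initial rates; 1 MiB/sec is the weekly average.
    for rate in (100, 50, 1):
        print(f"{rate:>3} MiB/sec -> {days_to_copy(130, rate):,.0f} days")
```

At 100 MiB/sec the full set would take roughly 16 days; at 1 MiB/sec it
would take over four years.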

If we start a second, parallel copy job, we get similarly good initial
throughput, then a second CPU on SD1 is pegged at 100%, but there is no
significant jump in overall performance.

There are no warnings/errors being logged and everything appears to be
"working", just glacially slow and apparently totally bottlenecked on
whatever that single CPU thread is doing with minimal reads from the
volumes.
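
For anyone wanting to look at the same thing, one way the busy thread could
be singled out on Linux is sketched below. This is an illustration only: it
assumes the standard /proc/<pid>/task/<tid>/stat layout, the helper names
are hypothetical, and the PID of the storage daemon has to be supplied by
hand.

```python
# Sketch: rank the threads of one process by accumulated CPU time,
# using Linux /proc/<pid>/task/<tid>/stat. Helper names are hypothetical.
import os
import sys


def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc stat line."""
    # The thread name sits in parentheses and may contain spaces,
    # so cut at the last ')' before splitting the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def thread_ticks(pid: int) -> dict[int, int]:
    """Map each thread id of pid to its accumulated CPU ticks."""
    result = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as fh:
                result[int(tid)] = cpu_ticks(fh.read())
        except FileNotFoundError:
            continue  # the thread exited while we were scanning
    return result


if __name__ == "__main__":
    pid = int(sys.argv[1])  # PID of the storage daemon, supplied by hand
    ranked = sorted(thread_ticks(pid).items(), key=lambda kv: kv[1], reverse=True)
    for tid, ticks in ranked[:5]:
        print(f"tid {tid}: {ticks} ticks")
```

Running it twice a few seconds apart and comparing the tick counts should
show which thread id is accumulating the CPU time.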

Any suggestions on where to look for the root cause here?

Thanks
-- 

Chris Wright

Application Software Developer

<http://www.maglabs.net>


_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users
