This may not apply, but usually when I see strange (and bad) behaviour like this, I double-check name resolution/DNS configuration on all hosts involved.
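
A minimal sketch of that kind of check (the hostname and address below are placeholders):

   getent hosts ceph-node1              # forward lookup through the local resolver
   dig +short ceph-node1.example.com    # forward lookup straight from DNS
   dig +short -x 192.0.2.11             # reverse lookup of the node's cluster IP

Forward and reverse lookups should agree on every host, and they should match the addresses the MONs and OSDs actually bind to.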

On 26/11/2024 11:19, Martin Gerhard Loschwitz wrote:
Hi Alex,

thank you for the reply. Here are all the steps we have taken over the last weeks to 
reduce complexity (we are focussing on the HDD cluster for now, where we are seeing 
the worst results relative to the hardware, but it also happens to be the simplest 
setup network-wise, despite only having a 1G link between the nodes).

* measure IOPS per physical device (the results were within expectations for HDDs; 
a sketch of such a test follows below)
* reinstall the OS, reset the BIOS, and reset the HBA configuration (or rather, switch 
the Dell PERC to HBA mode)
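
A sketch of the kind of per-device 4k sync-write test meant above (device path and runtime are placeholders; this destroys data on the device):

   fio --name=write-4k --filename=/dev/sdX --rw=randwrite --bs=4k \
       --iodepth=1 --direct=1 --sync=1 --ioengine=libaio \
       --runtime=60 --time_based --group_reporting

The --direct=1/--sync=1 combination approximates the flushing writes an OSD issues more closely than buffered writes would.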

The current setup is Ubuntu 24.04 with Linux 6.5. This yields better results than 
20.04 with a 5.x kernel and Ceph 17 (65 vs. 41 IOPS), but both values are still 
terrible.

We are also not seeing anything obvious in iostat. Latency is normal LAN latency, and 
there is no packet loss. MTU 1500 vs. MTU 9000 makes literally no difference.
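
For reference, checks of that kind could look like this (the peer hostname is a placeholder):

   ping -c 10 -M do -s 8972 ceph-node2   # 9000-byte frames, fragmentation prohibited
   ping -c 10 -M do -s 1472 ceph-node2   # the same for a 1500-byte MTU path
   iostat -x 1                           # per-device utilisation and latency during a bench run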

When we disable replication in that setup (pool size=1), we get about 90 IOPS from the 
same pool. There is no special network configuration in place. I am attaching a dump 
of historic OSD ops from an example OSD in the cluster for further reference; maybe 
somebody sees something obvious in there.
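
For anyone who wants to reproduce this, the size=1 test and the ops dump presumably map to commands like these (pool name and OSD id are placeholders):

   ceph osd pool set bench-pool size 1 --yes-i-really-mean-it   # recent releases also need mon_allow_pool_size_one=true
   rados bench -p bench-pool 60 write -b 4096 -t 1
   ceph daemon osd.12 dump_historic_ops > osd.12-ops.json       # run on the node hosting that OSD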

Best regards
Martin




On 26.11.2024 at 03:43, Alex Gorbachev <a...@iss-integration.com> wrote:

Hi Martin,

This is a bit of a generic recommendation, but I would go down the path of reducing 
complexity, i.e. first test the drive locally on the OSD node and see if there is 
anything going on with e.g. drive firmware, cables, the HBA, or power.
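
A couple of quick local checks along those lines (the device name is a placeholder):

   smartctl -a /dev/sdX                        # firmware version, SMART health, error counters
   dmesg -T | grep -iE 'sd[a-z]|reset|error'   # link resets or I/O errors from the HBA/driver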

Then run fio from another host, which would also incorporate the network.
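
As a sketch of that second step, fio can drive an RBD image directly from a client host, provided fio was built with rbd support (pool and image names are placeholders):

   rbd create --size 10G bench-pool/bench-image
   fio --name=net-4k --ioengine=rbd --clientname=admin --pool=bench-pool \
       --rbdname=bench-image --rw=randwrite --bs=4k --iodepth=1 \
       --direct=1 --runtime=60 --time_based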

If those look fine, I would do something crazy with Ceph, such as a huge number of PGs 
or a failure domain of OSD, and just deploy a handful of OSDs to see if you can bring 
the problem out into the open. I would use a default setup, with no tweaks to the 
scheduler etc. Hopefully you'll get some error messages in the logs - Ceph logs, 
syslog, dmesg. Maybe at that point it will become more obvious, or at least some 
messages will come through that make sense (to you or someone else on the list).
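
A sketch of such a stripped-down test setup (names and PG count are placeholders; the MONs may refuse very high PG counts unless mon_max_pg_per_osd is raised):

   ceph osd crush rule create-replicated by-osd default osd    # failure domain = OSD
   ceph osd pool create stress-pool 1024 1024 replicated by-osd
   ceph osd pool set stress-pool pg_autoscale_mode off         # keep the autoscaler from shrinking it again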

In other words, it seems you have to break this a bit more to get proper diagnostics. 
I know you guys have played with Ceph before and can do the math of what the IOPS 
values should be - three clusters all seeing the same problem most likely indicates a 
non-default configuration value that is not correct.
--
Alex Gorbachev
ISS



On Mon, Nov 25, 2024 at 9:34 PM Martin Gerhard Loschwitz <martin.loschw...@true-west.com> wrote:
Folks,

I am getting somewhat desperate debugging multiple setups within the same environment. 
Three clusters, two SSD-only and one HDD-only, and what they all have in common is 
abysmal 4k IOPS performance when measuring with "rados bench". Abysmal means: in an 
all-SSD cluster I get roughly 400 IOPS across more than 250 devices. I know SAS SSDs 
are not ideal, but that looks a bit on the low side to me.

In the second cluster, also all-SSD, I get roughly 120 4k IOPS, and the HDD-only 
cluster delivers 60 4k IOPS. Granted, both of the latter have substantially fewer 
devices. But even with 20 HDDs, 68 4k IOPS seems like a very bad value to me.

I have tried to rule out everything I can think of: BIOS misconfiguration, HBA 
problems, networking trouble (I am seeing comparably bad values with a size=1 pool), 
and so on. But to no avail. Has anybody dealt with something similar on Dell hardware, 
or in general? What could cause such extremely bad benchmark results?

I measure with rados bench at queue depth 1 and 4k block size. "ceph tell osd bench" 
with 4k blocks yields 30k+ IOPS for every single device in the big cluster, yet all 
that leads to is 400 IOPS in total when writing to it, even with no replication in 
place? That looks a bit off, doesn't it? Any help will be greatly appreciated; even a 
pointer in the right direction would be held in high esteem right now. Thank you very 
much in advance!
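
For reference, those two measurements presumably correspond to commands along these lines (pool name and OSD id are placeholders):

   rados bench -p bench-pool 60 write -b 4096 -t 1 --no-cleanup   # qd=1, 4k writes
   rados -p bench-pool cleanup                                    # remove the bench objects afterwards
   ceph tell osd.0 bench 12288000 4096                            # per-OSD bench: total bytes, bytes per write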

Best regards
Martin

--
ing. Sergio Rabellino

Università degli Studi di Torino
Dipartimento di Informatica
Research Technician
Tel +39-0116706701 Fax +39-011751603
C.so Svizzera, 185 - 10149 - Torino

<http://www.di.unito.it>

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
