Hi,

I'm running an FIO benchmark to test my simple cluster (3 OSDs, 128 PGs, Nautilus 
v14.2.10), and after a certain load of clients performing random read operations, 
the OSDs show very different op latencies. In extreme cases there is one OSD that 
performs much worse than the others, despite receiving a similar number of 
operations.
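
In case anyone wants to reproduce the comparison, here is a rough sketch of how 
the per-OSD op latency counters can be read from the admin sockets (it assumes 
osd.0-2 run on the node where the script is executed; adjust the IDs):

#!/usr/bin/env python3
# Rough comparison of average op latency across local OSDs, read from
# each daemon's admin socket ("ceph daemon osd.N perf dump").
# Assumption: osd.0-2 run on the host where this script is executed.
import json
import subprocess

OSD_IDS = [0, 1, 2]  # adjust to the OSDs hosted on this node

for osd_id in OSD_IDS:
    out = subprocess.check_output(
        ["ceph", "daemon", "osd.%d" % osd_id, "perf", "dump"])
    lat = json.loads(out)["osd"]["op_latency"]
    count = lat["avgcount"]
    avg_ms = (lat["sum"] / count * 1000.0) if count else 0.0
    print("osd.%d: %d ops, avg op latency %.1f ms" % (osd_id, count, avg_ms))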

Looking more closely at the distribution of operations, I can see that they are 
well distributed among the OSDs and the PGs, but in the OSD with poor performance 
there is one internal queue (OSD shard) that dispatches requests very slowly. In 
my case, for example, there is an OSD shard whose average wait time per operation 
was 120 ms, and another shard that served only slightly more requests with an 
average wait time of 1.5 s. The behavior of this one queue ends up affecting the 
performance of Ceph as a whole. The osd op queue implementation in use is wpq, 
and during the run a specific attribute of this queue (probably total_priority) 
remains unchanged for a long time. The strange behavior also shows up with the 
other implementations (prio, mclock).
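
For reference, the per-op queue wait can be approximated from the op tracker, 
e.g. the gap between the "queued_for_pg" and "reached_pg" events in 
dump_historic_ops. A minimal sketch (event names and JSON layout are the ones I 
see on Nautilus; the timestamp parsing is an assumption, and for requeued ops 
only the last event pair is kept):

#!/usr/bin/env python3
# Rough estimate of per-op queueing delay from the op tracker: the time
# between the "queued_for_pg" and "reached_pg" events reported by
# "ceph daemon osd.N dump_historic_ops".
# Assumptions: osd.0's admin socket is local; the timestamp format is
# the one Nautilus prints (newer releases use ISO 8601 with 'T').
import json
import subprocess
from datetime import datetime

def parse_ts(ts):
    ts = ts.replace("T", " ").split("+")[0]  # tolerate both formats
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")

out = subprocess.check_output(["ceph", "daemon", "osd.0", "dump_historic_ops"])
for op in json.loads(out)["ops"]:
    # if an op was requeued, only the last queued/reached pair is kept
    events = {e["event"]: e["time"] for e in op["type_data"]["events"]}
    if "queued_for_pg" in events and "reached_pg" in events:
        wait = parse_ts(events["reached_pg"]) - parse_ts(events["queued_for_pg"])
        print("%-60.60s queue wait %.3f s"
              % (op["description"], wait.total_seconds()))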

I've also tried the Mimic release and a different PG distribution, and the 
behavior is always the same, although it can show up on a different OSD or in a 
different shard. By default the OSD has 5 shards. Increasing the number of shards 
considerably improves the performance of that OSD, but I would like to understand 
what is happening with this specific queue in the default configuration.
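
For anyone who wants to check the same settings, a rough sketch of dumping the 
shard-related options each OSD is actually running with (option names are the 
Nautilus ones; as far as I can tell osd_op_num_shards is only applied at OSD 
start-up, so changing it needs a daemon restart):

#!/usr/bin/env python3
# Print the shard-related options each local OSD is running with, via
# "ceph daemon osd.N config get <option>".
# Assumption: option names as in Nautilus; osd.0-2 are local daemons.
import subprocess

OPTIONS = [
    "osd_op_num_shards", "osd_op_num_shards_hdd", "osd_op_num_shards_ssd",
    "osd_op_num_threads_per_shard", "osd_op_queue", "osd_op_queue_cut_off",
]

for osd_id in [0, 1, 2]:  # adjust to the OSDs hosted on this node
    for opt in OPTIONS:
        out = subprocess.check_output(
            ["ceph", "daemon", "osd.%d" % osd_id, "config", "get", opt])
        print("osd.%d %s = %s" % (osd_id, opt, out.decode().strip()))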

Does anyone have any idea what might be happening?

Thanks, Mafra.