Hey all,
We’ve been running some benchmarks against Ceph, which we deployed in 
Kubernetes using the Rook operator. Everything seemed to scale linearly up to 
a point, after which a single OSD receives a much higher CPU load than the 
other OSDs (nearly 100% saturation). After some investigation we noticed a 
ton of pubsub traffic in the strace output from the RGW pods, like so:

[pid 22561] sendmsg(77, {msg_name(0)=NULL, 
msg_iov(3)=[{"\21\2)\0\0\0\10\0:\1\0\0\10\0\0\0\0\0\10\0\0\0\0\0\0\20\0\0-\321\211K"...,
 73}, {"\200\0\0\0pubsub.user.ceph-user-wwITOk"..., 314}, 
{"\0\303\34[\360\314\233\2138\377\377\377\377\377\377\377\377", 17}], 
msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL|MSG_MORE <unfinished …>
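
For reference, we captured that by attaching strace to the radosgw process 
inside one of the RGW pods, roughly like this (the pod name is a placeholder, 
we're assuming the default rook-ceph namespace, and strace/pidof need to be 
available in the container):

# trace sendmsg calls made by radosgw and filter for the pubsub objects
kubectl -n rook-ceph exec -it <rgw-pod-name> -- \
  sh -c 'strace -f -e trace=sendmsg -p "$(pidof radosgw)" 2>&1 | grep pubsub'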

I’ve checked the other OSDs, and only this single OSD receives these 
messages; I suspect it’s creating a bottleneck. Does anyone have an idea why 
these are being generated, or how to stop them? The pubsub sync module 
doesn’t appear to be enabled, and our benchmark only does simple 
GETs/PUTs/DELETEs.
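
In case it helps anyone reproduce, something along these lines shows which 
pools the hot OSD is primary for and whether a pubsub zone is configured 
(osd.N is a placeholder for the saturated OSD):

# list the PGs whose primary is the hot OSD, to see which pools they map to
ceph pg ls-by-osd osd.N
# map the pool ids from the PG listing back to pool names
ceph osd pool ls detail
# as far as I understand, a pubsub zone would show up here with tier_type "pubsub"
radosgw-admin zonegroup get | grep tier_type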

We’re running Ceph 14.2.5 (Nautilus).

Thank you!
