Hi guys,

I am load testing a couple of clusters, one with local SSD disks and another
with Ceph.

Both clusters have the same amount of CPU/RAM and are configured the same
way. I'm sending the same number of messages and producing with linger.ms=0
and acks=all.

Besides seeing higher latencies on Ceph for the most part compared to local
disk, there is something I don't understand.

On the local disk cluster, messages per second matches the number of produce
requests per second exactly, but on the Ceph cluster messages per second does
not match the total produce requests per second.

The only difference I can find is that the producer purgatory in the Ceph
Kafka cluster has more requests queued up than in the local disk cluster.

Also, RemoteTime-ms for produce requests is high, which could explain why
there are more requests sitting in the purgatory.

To me, this means the produce requests are waiting for acknowledgements from
all the in-sync replicas, since acks is set to all. But I don't understand
why the local disk cluster's purgatory queue is so much lower.

I didn't think disk was involved in this part. Could network saturation,
since Ceph is network storage, be interfering with the wait for acks? Is
there a way to tune the producer purgatory? I did change
num.replica.fetchers, but that only lowered the fetch purgatory.
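In case it matters, this is the only replication-related change I made on the
brokers, in server.properties (the value is just what I tried, not a
recommendation):

    # default is 1; adds more replica fetcher threads per source broker
    num.replica.fetchers=4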
