Hello folks,
These days I am working on a use case in which I have many
subscriptions on the same topic (~100).
If you get into a situation with a big backlog, the consumers are no
longer able to leverage the broker cache, and the broker starts to
read from BookKeeper.

In 2.11 we have a first mechanism of a catch-up reads cache for
backlogged consumers (when you have a few backlogged cursors, we start
putting the entries that they read into the cache).

That is not enough to deal with my case, because if many consumers
(one per partition) connect all together, for instance after a major
problem in the application, the cache is empty and the broker starts
to overwhelm the bookies with reads.

As a short term solution I have created this patch
https://github.com/apache/pulsar/pull/17241

The patch basically saves reads from BK by preventing concurrent reads
for the same entries (more precisely, "ranges of entries").
If the broker starts a read for a range and another read for the same
range arrives before the first one completes, we skip sending the new
read and reuse the result of the in-flight BK request.
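To illustrate the idea (not the actual code in the PR), here is a minimal sketch of deduplicating in-flight reads; the class name `ReadDeduplicator` and its shape are my own, only the technique matches what is described above:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: collapse concurrent reads for the same entry range
// into a single backend (BK) read, sharing the result with all callers.
public class ReadDeduplicator<K, V> {
    private final Map<K, CompletableFuture<V>> pending = new ConcurrentHashMap<>();

    public CompletableFuture<V> read(K range, Function<K, CompletableFuture<V>> reader) {
        CompletableFuture<V> promise = new CompletableFuture<>();
        // Register our promise only if no read for this range is in flight.
        CompletableFuture<V> existing = pending.putIfAbsent(range, promise);
        if (existing != null) {
            // A read for the same range is already running: piggyback on it.
            return existing;
        }
        // We won the race: actually issue the read, then unregister and
        // propagate the result (or failure) to everyone waiting on the promise.
        reader.apply(range).whenComplete((value, err) -> {
            pending.remove(range, promise);
            if (err != null) {
                promise.completeExceptionally(err);
            } else {
                promise.complete(value);
            }
        });
        return promise;
    }
}
```

With this pattern, N concurrent callers asking for the same range cost a single backend read; the map entry is removed as soon as the read completes, so later reads for the same range hit the backend again.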

This simple patch is really effective: in some tests it dramatically
reduces the concurrent reads to BK (from 100K reads/s to 250 reads/s).
You may say that it is unlikely for the broker to send concurrent
requests for exactly the same range of entries, but empirically I have
observed that it happens quite often, at least in the scenario I
described above.

My proposal is to commit my patch as a short-term solution.
I have already talked with PengHui, and a better solution would be to
explicitly introduce a "catch-up reads cache", but that is a mid-term
effort.

WDYT ?


Enrico
