+1 for having a short-term solution first. Just one point: please make sure the change will not introduce too much heap memory overhead.
Thanks,
Penghui

On Wed, Aug 24, 2022 at 11:54 PM Enrico Olivelli <eolive...@gmail.com> wrote:
> Hello folks,
> These days I am working on a use case in which I have many
> subscriptions on the same topic (~100).
> If you get into a situation with a big backlog, the consumers are no
> longer able to leverage the broker cache, and the broker starts to read
> from BookKeeper.
>
> In 2.11 we have a first mechanism of a catch-up reads cache for
> backlogged consumers (when you have a few backlogged cursors we start
> putting the entries that you read into the cache).
>
> That is not enough to deal with my case, because if you have many
> consumers (one per partition) that connect all together, for instance
> after a major problem in the application, the cache is empty and the
> broker starts to overwhelm the bookies with reads.
>
> As a short-term solution I have created this patch:
> https://github.com/apache/pulsar/pull/17241
>
> The patch basically saves reads from BK by preventing concurrent reads
> for the same entries (actually "ranges of entries").
> If the broker starts a read for a range and another read for the same
> range arrives before the first one completes, we skip sending the new
> read and reuse the result of the in-flight BK request.
>
> This simple patch is really effective: in some tests it drastically
> reduces the concurrent reads to BK (from 100K reads/s to 250 reads/s).
> You may say that it is unlikely that the broker sends concurrent
> requests for exactly the same range of entries, but empirically I have
> observed that it happens quite often, at least in the scenario I
> described above.
>
> My proposal is to commit my patch as a short-term solution.
> I have already talked with PengHui, and a better solution would be to
> explicitly introduce a "catch-up reads cache", but that is a mid-term
> implementation.
>
> WDYT?
>
>
> Enrico
>
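
For readers following the thread, below is a minimal Java sketch of the deduplication idea Enrico describes above (later reads for the same entry range attach to the future of the in-flight BookKeeper read instead of issuing a new one). This is not the actual code of PR 17241; names like PendingReadKey, DedupReadHandle and readFromBookKeeper are hypothetical, and a real implementation would also need to hand each caller its own copy of the entries (reference counting), which the sketch glosses over.

    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;

    class DedupReadHandle {

        // Hypothetical key identifying a range of entries on a ledger.
        record PendingReadKey(long ledgerId, long firstEntry, long lastEntry) {}

        record Entry(long entryId, byte[] data) {}

        // In-flight reads, keyed by the exact range being read.
        private final ConcurrentHashMap<PendingReadKey, CompletableFuture<List<Entry>>> pendingReads =
                new ConcurrentHashMap<>();

        CompletableFuture<List<Entry>> read(long ledgerId, long firstEntry, long lastEntry) {
            PendingReadKey key = new PendingReadKey(ledgerId, firstEntry, lastEntry);
            // computeIfAbsent ensures only the first caller issues the BK read;
            // concurrent callers for the same range reuse the same future.
            return pendingReads.computeIfAbsent(key, k -> {
                CompletableFuture<List<Entry>> f = readFromBookKeeper(ledgerId, firstEntry, lastEntry);
                // Remove the entry once the read completes, so later reads for
                // the same range go to BookKeeper again.
                f.whenComplete((entries, error) -> pendingReads.remove(k, f));
                return f;
            });
        }

        // Placeholder for the real async BookKeeper read.
        CompletableFuture<List<Entry>> readFromBookKeeper(long ledgerId, long first, long last) {
            return CompletableFuture.completedFuture(List.of());
        }
    }

The key design point is that the dedup map only holds futures for reads that are currently in flight, so it adds very little state on top of the existing read path (which is also why the heap-overhead concern above should be easy to satisfy).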