Hi all,

I'd like to get some more eyes on a long-standing performance issue with large fan-outs (a large number of consumers on a single topic). The broker cache does not work as expected because of invalid changes introduced in version 2.8.2 by PR https://github.com/apache/pulsar/pull/12045.

I reported the broker cache issue over 2 months ago in https://github.com/apache/pulsar/issues/16054. Please take a look.

The workaround seems to be to use the feature introduced in PR https://github.com/apache/pulsar/pull/14985, "Add a cache eviction policy:Evicting cache data by the slowest markDeletedPosition", by setting cacheEvictionByMarkDeletedPosition=true. One notable detail is that the cacheEvictionByMarkDeletedPosition entry is not included in the conf/broker.conf or conf/standalone.conf files. Therefore, in the Pulsar Helm chart it is necessary to use "PULSAR_PREFIX_cacheEvictionByMarkDeletedPosition: true" to activate the setting.

However, the problem with the cacheEvictionByMarkDeletedPosition solution is that the cache can fill up with irrelevant entries when there are many topics and consumers on a single broker, since cache expiration is based on markDeletedPosition and not on the read position of the active consumer that is farthest behind. My assumption is also that there would have been no need to add https://github.com/apache/pulsar/pull/14985 at all if the broker cache had worked as expected.

PIP-174: Provide new implementation for broker dispatch cache (https://github.com/apache/pulsar/issues/15954 / https://github.com/apache/pulsar/pull/15955) will provide a new broker cache implementation where this issue hopefully goes away, but we will still need to fix this in the current maintenance versions.

I'm looking forward to feedback from the original contributors of the broker cache and the related changes so that we can finally get this issue resolved.

Thanks,
Lari