Pulsar Community Meeting minutes 2023/11/23

Notice: Draft minutes pending review. Please suggest any corrections or additions by replying to this email thread.

- Attendees:
  - Girish Sharma
  - YuWei Sung
  - Apurva T
  - Asaf Mesika
  - Lari Hotari
  - Chris Bono
- Agenda
  - PIP-310 and rate limiting improvements
    - Pulsar Rate Limiter requirements by Girish - https://docs.google.com/document/d/1-y5nBaC9QuAUHKUGMVVe4By-SmMZIL4w09U1byJBbMc
    - Lari to present a summary of views on PIP-310. This is documented in the blog post “Apache Pulsar service level objectives and rate limiting” <https://codingthestreams.com/pulsar/2023/11/22/pulsar-slos-and-rate-limiting.html>. Please read the blog post before the meeting as preparation.
- Meeting Minutes:
  - Girish presented the background and the problem with the current rate limiters by going over the Pulsar Rate Limiter document <https://docs.google.com/document/d/1-y5nBaC9QuAUHKUGMVVe4By-SmMZIL4w09U1byJBbMc>. The conclusion is that there’s a need to support bursting while keeping the allowed bursting on a single broker under the limit of what the broker can handle.
  - Related to how the combined bursting of all topics in a broker could be kept under the limits of a broker, Lari added that in Confluent Kora, there's a concept called dynamic quota management that is described in the Kora paper section 5.2.2 <http://vldb.org/pvldb/vol16/p3822-povzner.pdf#page=11>: "Kora addresses this issue by using a dynamic quota mechanism that adjusts bandwidth distribution based on a tenant’s bandwidth consumption."
    - While bursting, the remaining available capacity on the broker could be proportionally split based on the configured topic rates.
    - Girish added that in their case, the topics that should be prioritized in bursting aren’t the ones with the highest throughput.
    - In the future, topics would need SLA/SLO (Service Level Objective) metadata that would help Pulsar make proper prioritization decisions in these types of scenarios.
  - Girish continued explaining the details of rate limiting bursting requirements by going over the document.
There are very valuable findings and observations that will be very helpful in improving the Pulsar rate limiting solution. Girish has taken an approach in the document where it goes beyond PIP-310 to explain the requirements from his organization’s perspective.
  - After going over Girish's Pulsar Rate Limiter document, there was a discussion about the next steps for moving forward.
    - There was a consensus that the default (“polling”) rate limiter option in Pulsar is unusable in practice and this needs to be addressed in the Pulsar core (see Girish’s analysis in the document section “4.1 Existing pulsar rate limiter” <https://docs.google.com/document/d/1-y5nBaC9QuAUHKUGMVVe4By-SmMZIL4w09U1byJBbMc/edit#heading=h.nx692qsf70id>).
    - The group discussed the next steps in order to make progress. There are two separate areas of work: addressing the issues with the Pulsar default rate limiters, and addressing the requirements that Girish brought up in his presentation of the Pulsar Rate Limiter document <https://docs.google.com/document/d/1-y5nBaC9QuAUHKUGMVVe4By-SmMZIL4w09U1byJBbMc>.
    - Lari presented his view on addressing the issues in the Pulsar default rate limiters, based on his blog post “Apache Pulsar service level objectives and rate limiting”, section “Problems to address as the next step” <https://codingthestreams.com/pulsar/2023/11/22/pulsar-slos-and-rate-limiting.html#problems-to-address-as-the-next-step>.
      - The first goal is to reach feature parity with the current rate limiters in Pulsar without introducing breaking changes.
      - Instead of adding more feature flags that clutter the code base and add complexity, this would be handled as a refactoring where the existing internal solution in the Pulsar code base is replaced with a new solution that addresses the problems explained in the blog post.
      - The replacement solution for the refactoring has already been sufficiently validated (explained in the blog post <https://codingthestreams.com/pulsar/2023/11/22/pulsar-slos-and-rate-limiting.html#problems-to-address-as-the-next-step>) so that there’s confidence to move forward.
    - There was a question whether this change could be implemented with a feature flag instead of handling it as a refactoring where the old code gets deleted.
      - Lari thinks that this would be a bad idea in this case since it would increase complexity in the code base and make the code base even harder to maintain in the future. He would rather solve this with a single, minimal refactoring PR that reaches feature parity with the existing solution.
      - There was a discussion that the PR would be hard to review because it could be a large change, since the current rate limiting touches many parts of the code base.
    - It was then discussed whether a PIP should be made first, before making further changes in this direction.
      - There was a discussion about the PIP process. Lari said that the process could be adjusted when needed. In this case, Lari is planning to proceed by first creating a PR in draft mode before writing a PIP. Lari’s opinion is that PIPs could also be created in a different order when it makes sense. In Apache projects, the dev mailing list (here, the Pulsar dev mailing list) is the place where decisions are eventually made. There was a long discussion about the tradeoffs of PIPs and the process. (I’m sorry that I couldn’t capture that in the meeting notes. Someone also mentioned that Lari’s blog post is already almost a PIP.)
      - Lari explained that creating the draft PR would also show the extent of the required changes. Analyzing the required changes without making actual changes is not practical in this case.
  - **Conclusion 1**: Lari will attempt to create a PR for the Pulsar rate limiting refactoring changes in draft mode, and then proceed to create a PIP that covers the refactoring. The main reason a PIP is needed for this change is that it is a large code change touching multiple components, as required by the PIP process guidelines (PIP process <https://github.com/apache/pulsar/blob/master/pip/README.md>).
  - **Conclusion 2**: For Girish’s requirements for rate limiting, it was agreed that Girish would start a “parent PIP” which focuses on describing the Pulsar rate limiter requirements (outcomes) and the problem instead of the solution. Child PIPs could follow.

The next meeting will be held on December 7th, 2023. Everyone is welcome to join. Here is the Pulsar Community Meeting calendar, which includes the Zoom link: https://github.com/apache/pulsar/wiki/Community-Meetings. Please add your agenda proposals to the meeting minutes document. You can find the link to this document on the community meetings page.
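
Illustration for the dynamic quota discussion in the minutes: the idea that, while bursting, the remaining broker capacity could be split proportionally to the configured topic rates can be sketched roughly as below. This is only a minimal Python sketch for discussion, not Pulsar or Kora code. The function name `burst_allowances`, the single-pass split (leftover capacity is not redistributed), and the numbers in the usage example are all assumptions made for the illustration.

```python
def burst_allowances(broker_capacity, configured_rates, demands):
    """Hypothetical sketch: split a broker's spare capacity among bursting
    topics in proportion to their configured rates.

    broker_capacity  -- total throughput the broker can handle (any unit)
    configured_rates -- dict of topic name -> configured rate limit
    demands          -- dict of topic name -> throughput the topic wants now
    """
    # A topic that stays under its configured rate only consumes its demand.
    baseline = {
        topic: min(demands.get(topic, 0.0), rate)
        for topic, rate in configured_rates.items()
    }
    # Whatever the baseline usage leaves free is available for bursting.
    spare = max(0.0, broker_capacity - sum(baseline.values()))

    # Topics that want more than their configured rate are "bursting".
    bursting = [
        topic for topic, rate in configured_rates.items()
        if demands.get(topic, 0.0) > rate
    ]
    total_weight = sum(configured_rates[topic] for topic in bursting)

    allowances = dict(baseline)
    if total_weight > 0:
        for topic in bursting:
            rate = configured_rates[topic]
            # Proportional share of the spare capacity, weighted by rate.
            share = spare * rate / total_weight
            # Never grant more than the topic actually asks for.
            allowances[topic] = rate + min(share, demands[topic] - rate)
    return allowances


# Usage example with made-up numbers (think MB/s):
result = burst_allowances(
    broker_capacity=100.0,
    configured_rates={"a": 10.0, "b": 30.0, "c": 20.0},
    demands={"a": 40.0, "b": 60.0, "c": 5.0},
)
print(result)  # {'a': 23.75, 'b': 60.0, 'c': 5.0}
```

In this example the combined allowance (88.75) stays under the broker capacity of 100. As Girish pointed out in the meeting, weighting by configured rate favors high-throughput topics, which may not be the ones that should be prioritized. With SLA/SLO metadata available, the weights in this sketch could come from topic priority instead of the configured rate.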