Hi Pulsar Community,

Here are the meeting notes from today's community meeting. Thanks to all who participated!
Disclaimer: If something is misattributed or misrepresented, please send a correction to this list.

Source google doc: https://docs.google.com/document/d/19dXkVXeU2q_nHmkG8zURjKnYlvD96TbKf5KjYyASsOE

Thanks,
Michael

2022/06/23, (8:30 AM PST)

- Attendees:
  - Matteo Merli
  - Christophe Bornet
  - Ayman Khalil
  - Andrey Yegorov
  - Heesung Sohn
  - Michael Marshall
  - Rajan Dhabalia

- Discussions/PIPs/PRs (generally discussed in the order they appear)

  - Michael: PIP 172 https://github.com/apache/pulsar/issues/15859. Looking to get feedback on this PIP so that we can raise awareness of the feature.
    Matteo: sees two modes for auto failover. The first is a human-controlled mode that switches cluster URLs for the clients. This is not ideal, since it is human driven and requires an extra service; an advantage is that you don't get spurious failovers where clients move over for certain failures. In the design of this PIP, there is a risk of different views of which cluster is healthy and which is not.
    Matteo: is okay with automatic failover, but it is very crude. You could add more comprehensive checks, such as not being able to connect or not being able to produce messages for some length of time. We can expand the logic in the auto failover case by adding checks for the number of errors over a window, with configurable thresholds; a rough sketch of such a check is included after these notes. (There was additional discussion about the broker health check and what it does.)

  - Rajan: https://github.com/apache/pulsar/pull/15223 - PIP for syncing Pulsar policies across multiple clouds. There is no way to synchronize the metadata right now. This helps when there isn't a global zookeeper, and it allows geo-replicated clusters to have their own metadata stores. Rajan continued to describe the PIP; see https://github.com/apache/pulsar/issues/13728 for more details.
    Matteo: your use case may be multi-cloud, but the actual feature is more general: it is having global configuration without a global zookeeper. One issue is the double write: what if there is an error in publishing the write to zookeeper or to bookkeeper? Which broker does the publishing, and how do we make sure we don't skip any updates?
    Rajan: if it fails to publish the message, the solution is eventually consistent. There is a publish-snapshot mechanism that would ensure dropped messages eventually get sent as snapshots. There is a producer to send updates to the other clusters and then a consumer to get the update. Whichever consumer wins will get the update.
    Matteo: you can have a race where two consumers believe they have the message. The only way to do this is with the exclusive producer, because it adds producer fencing.
    Rajan: correct, but the event is idempotent, so the consumer can handle duplicates. Another issue to cover is how to handle many clusters and the relationships for which clusters replicate which policies. This proposal allows for using a local "global zookeeper" for each region.
    Matteo: the snapshot is going to be complicated, because if you take the snapshot and apply it before other updates, you'll get conflicts and could lose data. You are treating the replication channel and the writing store as two separate things, and reconciling them afterwards will be very hard. If you switch the logic and first write to the topic as your write-ahead log, then that is your store. This could be a wrapper on the metadata store: before writing to zookeeper, publish on the topic; once it is persisted, it can get applied to zookeeper. This handles crashes and restarts, which also gives you a clear replay model. It doesn't account for inconsistency between clusters. (A rough sketch of such a wrapper is included after these notes.)
    Rajan: you're suggesting we handle failure more gracefully?
    Matteo: yes, use a WAL to handle failure.
    Rajan: if the topic is down, you cannot write then.
    Matteo: correct.
    Rajan: there is still a need for a synchronizer, though.
    Matteo: you can just enable compaction on the topic. That is your snapshot.
    Rajan: compaction comes with its own cost, though.
    Matteo: the cost of compaction is the cost of the snapshot. Compaction can run in the broker or can be run manually (an example of triggering compaction via the admin API is sketched after these notes).
    Rajan: when running a system at large scale, compaction has its own issues. If we have any issues with storage, you lose data. I have lost a ledger, but not a zookeeper snapshot.
    Matteo: if you are taking a snapshot and publishing it on a topic, aren't you still relying on a ledger? (Some back and forth about requirements and compaction.)
    Matteo: compaction is run at very large scale in tens of clusters that I know of. I agree that there were many issues in compaction, but most of them should be solved. Your durability guarantees are tied to ledger replication.
    Rajan: correct. Let me think about this. My concerns with compaction are scalability on the server side and durability.
    Matteo: compaction should be fine because there is a limited number of keys. Topic policies are already replicated across clusters and are compacted.
    Rajan: we tried to use it, but it has been a while since we tried that. Compaction is something we'd like to avoid. I'll try to update the PIP.

  - Rajan: https://github.com/apache/pulsar/pull/13080 - requesting review. The uniform load shedder strategy needs to consider namespace isolation policies.

  - Rajan: https://github.com/apache/pulsar/pull/12235 - requesting review. The PR is blocked.
    Matteo: what is the use case?
    Rajan: basically, when dealing with a legacy system and doing a migration, we want to be able to do a custom migration. We are already running this change to help handle certain kinds of migrations. There are multiple use cases, like blue cluster/green cluster migration.
    Matteo: for migration, I have been discussing with Prashant, but a PIP never came from it. The idea was to do blue/green with topic termination, so that all of the clients migrate automatically: bring up the new cluster, then terminate the topic on the old cluster, marking it as migrated to the new service URL and maybe a new topic name. The producer moves immediately, and the consumer moves once the end of the topic is reached.
    Rajan: is there topic routing?
    Matteo: it becomes a topic redirect to move a producer/consumer from one cluster to a new one.
    Rajan: I'd like to start a discussion on that. If you don't have a PIP, we'd like to help contribute that feature.
    Matteo: it hasn't been implemented yet. It could be helpful for topic renaming, moving clusters, and blue/green scenarios.
    Rajan: good to have it. I have seen the discussion on the mailing list, but it didn't have the approach you mention.
    Matteo: I thought it was handled that way, but I'd have to double check.
    Rajan: just wanted to raise the visibility on this PR.
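On the PIP 172 discussion, here is a rough, hypothetical sketch of the "number of errors over a window, with a configurable threshold" check Matteo suggested for the automatic failover case. None of these class or method names come from the PIP or from the Pulsar client API; they only illustrate the idea, and a real integration would presumably sit behind the client's existing failover/service-URL-selection logic.

    import java.time.Duration;
    import java.util.ArrayDeque;
    import java.util.Deque;

    // Hypothetical illustration of a configurable "errors over a window" failover check.
    public class ErrorWindowFailoverCheck {
        private final Deque<Long> errorTimestamps = new ArrayDeque<>();
        private final Duration window;        // e.g. Duration.ofMinutes(1)
        private final int maxErrorsInWindow;  // configurable threshold

        public ErrorWindowFailoverCheck(Duration window, int maxErrorsInWindow) {
            this.window = window;
            this.maxErrorsInWindow = maxErrorsInWindow;
        }

        // Called whenever a connect or produce attempt against the current cluster fails.
        public synchronized void recordError() {
            errorTimestamps.addLast(System.currentTimeMillis());
        }

        // True once enough failures have accumulated inside the window to justify
        // switching the client to the other cluster URL.
        public synchronized boolean shouldFailover() {
            long cutoff = System.currentTimeMillis() - window.toMillis();
            while (!errorTimestamps.isEmpty() && errorTimestamps.peekFirst() < cutoff) {
                errorTimestamps.removeFirst();  // drop errors that fell outside the window
            }
            return errorTimestamps.size() >= maxErrorsInWindow;
        }
    }

The point is only that a failover decision driven by a threshold of recent failures is less crude than reacting to a single failed probe, while still being fully automatic.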
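On the policy-sync discussion, the following is a minimal sketch of the "topic as write-ahead log" wrapper Matteo described: publish the policy update to a topic first, and apply it to the local metadata store only after the publish has been persisted. The producer calls are standard Pulsar client API; WalBackedPolicyStore, LocalMetadataStore, and the topic/path names are hypothetical stand-ins, not anything that exists in the PIP or in Pulsar today.

    import org.apache.pulsar.client.api.Producer;
    import org.apache.pulsar.client.api.ProducerAccessMode;
    import org.apache.pulsar.client.api.PulsarClient;

    // Sketch of a metadata-store wrapper that treats a Pulsar topic as its WAL.
    public class WalBackedPolicyStore implements AutoCloseable {

        // Hypothetical local store abstraction (e.g. backed by the cluster's own zookeeper).
        public interface LocalMetadataStore {
            void put(String path, byte[] value) throws Exception;
        }

        private final Producer<byte[]> wal;
        private final LocalMetadataStore localStore;

        public WalBackedPolicyStore(PulsarClient client, String walTopic,
                                    LocalMetadataStore localStore) throws Exception {
            // Exclusive access mode gives producer fencing, so only one writer can
            // append policy updates at a time (the race Matteo called out).
            this.wal = client.newProducer()
                    .topic(walTopic)
                    .accessMode(ProducerAccessMode.Exclusive)
                    .create();
            this.localStore = localStore;
        }

        // Publish to the WAL topic first; apply locally only after it is persisted.
        public void updatePolicy(String path, byte[] serializedPolicy) throws Exception {
            wal.newMessage()
                    .key(path)               // keyed so topic compaction can act as the snapshot
                    .value(serializedPolicy)
                    .send();                 // blocks until the update is persisted
            // Only now is the update applied to the local store. If the process crashes
            // before this point, replaying the topic restores the missed update, which is
            // the clear replay model mentioned above.
            localStore.put(path, serializedPolicy);
        }

        @Override
        public void close() throws Exception {
            wal.close();
        }
    }

Remote clusters would run a consumer on the same topic and apply the entries to their own local stores; since the events are keyed by policy path and idempotent, replays and duplicates are harmless.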
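On the "compaction is your snapshot" point, this is a small sketch of triggering compaction manually (or letting the broker run it automatically past a size threshold) with the Java admin client. The topic, namespace, and threshold values are placeholders, and the admin method signatures should be double-checked against the admin API docs for the Pulsar version in use.

    import org.apache.pulsar.client.admin.PulsarAdmin;

    public class CompactPolicyTopic {
        public static void main(String[] args) throws Exception {
            try (PulsarAdmin admin = PulsarAdmin.builder()
                    .serviceHttpUrl("http://localhost:8080")  // placeholder admin URL
                    .build()) {
                String topic = "persistent://my-tenant/policy-sync/policies";  // placeholder topic

                // Manually trigger a compaction run on the WAL topic.
                admin.topics().triggerCompaction(topic);

                // Or let brokers compact automatically once the uncompacted backlog for the
                // namespace passes a size threshold (in bytes).
                admin.namespaces().setCompactionThreshold("my-tenant/policy-sync", 100 * 1024 * 1024);
            }
        }
    }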