Please find my response inline.

On Mon, Jan 31, 2022 at 9:17 PM Michael Marshall <mmarsh...@apache.org> wrote:
> I think this is a very appropriate direction to take Pulsar's
> geo-replication. Your proposal is essentially to make the
> inter-cluster configuration event driven. This increases fault
> tolerance and better decouples clusters.
>
> Thank you for your detailed proposal. After reading through it, I have
> some questions :)
>
> 1. What do you think about using protobuf to define the event
> protocol? I know we already have a topic policy event stream
> defined with Java POJOs, but since this feature is specifically
> designed for egressing cloud providers, ensuring compact data transfer
> would keep egress costs down. Additionally, protobuf can help make it
> clear that the schema is strict, should evolve thoughtfully, and
> should be designed to work between clusters of different versions.
>
>> I don't see a need for protobuf in this particular use case, for two
>> reasons: (a) policy changes don't generate much traffic (perhaps
>> around 1 rps), and (b) they don't need performance optimization.
>> It's similar to storing policies as text instead of protobuf: given the
>> limited number of update operations and the relatively low complexity,
>> it doesn't noticeably affect footprint size or performance. I agree that
>> protobuf could be another option, but in this case it's not needed.
>> Also, POJOs can support schemas and versioning as well.
>
> 2. In your view, which tenant/namespace will host
> `metadataSyncEventTopic`? Will there be several of these topics or is
> it just hosted in a system tenant/namespace? This question gets back
> to my questions about system topics on this mailing list last week [0]. I
> view this topic as a system topic, so we'd need to make sure that it
> has the right authorization rules and that it won't be affected by calls
> like "clearNamespaceBacklog".
>
>> It doesn't matter whether it's a system topic or not, because it's
>> configurable: the system admin can decide and configure it according
>> to the required persistence policy.
>> I would keep this topic separate from the system topics, because it
>> serves a specific purpose with its own schema, replication policy,
>> and retention policy.
>
> 3. Which broker will host the metadata update publisher? I assume we
> want the producer to be collocated with the bundle that hosts the
> event topic. How will this be coordinated?
>
>> This is already explained in the PIP, in the section "Event publisher
>> and handler". Every isolated cluster deployed on a separate cloud
>> platform will have a source region and will be part of the replicated
>> clusters for the event topic. The source region will have a broker
>> that creates a failover consumer on that topic, and the broker with
>> the active consumer will watch for metadata changes and publish them
>> to the event topic.
>
> 4. Why isn't a topic a `ResourceType`? Is this because the topic level
> policies already have this feature? If so, is there a way to integrate
> this feature with the existing topic policy feature?
>
>> Yes. ResourceType can be extended to topics as well.
>
> 5. By decentralizing the metadata store, it looks like there is a
> chance for conflicts due to concurrent updates. How do we handle those
> conflicts?
>
>> The PIP briefly covers this, but I will update it with more
>> explanation. MetadataChangeEvent contains the source cluster and the
>> update time. The Tenant/Namespace resources will also contain a
>> lastUpdatedTime, which will help destination clusters handle
>> stale/duplicate events and race conditions. In addition, snapshot-sync,
>> an additional task, helps all clusters eventually sync with each other.
>
> I'll also note that I previously proposed a system event topic here
> [1] and it was proposed again here [2]. Those features were for
> different use cases, but ultimately looked very similar. In my view, a
> stream of system events is a very natural feature to expect in a
> streaming technology.
> I wonder if there is a way to generalize this
> feature to fulfill local cluster consumers and geo-replication
> consumers. Even if this PIP only implements the geo-replication
> portion of the feature, it'd be good to design it in an extensible fashion.
>
>> I think answer (2) addresses this concern as well.
>
> Thanks,
> Michael
>
> [0] https://lists.apache.org/thread/pj4n4wzm3do8nkc52l7g7obh0sktzm17
> [1] https://lists.apache.org/thread/h4cbvwjdomktsq2jo66x5qpvhdrqk871
> [2] https://lists.apache.org/thread/0xkg0gpsobp0dbgb6tp9xq097lpm65bx
>
> On Sun, Jan 30, 2022 at 10:33 PM Rajan Dhabalia <rdhaba...@apache.org> wrote:
> >
> > Hi,
> >
> > I would like to start a discussion about PIP-136: Sync Pulsar policies
> > across multiple clouds.
> >
> > PIP documentation: https://github.com/apache/pulsar/issues/13728
> >
> > *Motivation*
> > Apache Pulsar is a cloud-native, distributed messaging framework which
> > natively provides geo-replication. Many organizations deploy Pulsar
> > instances on-prem and on multiple different cloud providers, and at the
> > same time they would like to enable replication between multiple clusters
> > deployed in different cloud providers. Pulsar already provides various
> > proxy options (Pulsar proxy / enterprise proxy solutions based on SNI) to
> > fulfill security requirements when brokers are deployed in different
> > security zones connected with each other. However, sometimes it's not
> > possible to share the metadata store (global ZooKeeper) between Pulsar
> > clusters deployed on separate cloud provider platforms, and synchronizing
> > configuration metadata (policies) can be a critical path to share
> > tenant/namespace/topic policies between clusters and administer Pulsar
> > policies uniformly across all clusters. Therefore, we need a mechanism to
> > sync configuration metadata between clusters deployed on different cloud
> > platforms.
> >
> > *Sync Pulsar policies across multiple clouds*
> > https://github.com/apache/pulsar/issues/13728
> > Prototype: https://github.com/rdhabalia/pulsar/commit/e59803b942918076ce6376b50b35ca827a49bcf6
> >
> > Thanks,
> > Rajan
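To make the conflict handling described in answer 5 more concrete, here is a minimal Java sketch of last-writer-wins handling on a destination cluster. It is an illustration under stated assumptions, not the PIP's implementation: only "the event carries the source cluster and update time" comes from the discussion above, while the field names, the `ResourceType` values (TOPIC is included only as the extension mentioned in answer 4), the hypothetical `MetadataEventHandler` class with its `apply`, `applySnapshot`, and `policiesOf` methods, and the rule of breaking timestamp ties by source-cluster name are all assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical resource types. TENANT and NAMESPACE come from the discussion;
// TOPIC is shown only as the possible extension mentioned in answer 4.
enum ResourceType { TENANT, NAMESPACE, TOPIC }

// Hypothetical event shape: the event carries the source cluster and the
// update time; the remaining field names are assumptions for this sketch.
final class MetadataChangeEvent {
    final ResourceType type;
    final String resourceName;   // e.g. "my-tenant" or "my-tenant/my-ns"
    final String sourceCluster;  // cluster where the change was made
    final long updatedTime;      // update time at the source, epoch millis
    final String policies;       // serialized policies (e.g. JSON)

    MetadataChangeEvent(ResourceType type, String resourceName, String sourceCluster,
                        long updatedTime, String policies) {
        this.type = type;
        this.resourceName = resourceName;
        this.sourceCluster = sourceCluster;
        this.updatedTime = updatedTime;
        this.policies = policies;
    }
}

// Hypothetical handler on a destination cluster. It remembers the last applied
// event per resource (standing in for stored policies plus lastUpdatedTime)
// so that stale and duplicate events can be dropped.
final class MetadataEventHandler {
    private final Map<String, MetadataChangeEvent> applied = new HashMap<>();

    private static String key(ResourceType type, String name) {
        return type + "/" + name;
    }

    // Last-writer-wins: a later timestamp wins; equal timestamps are broken by
    // comparing cluster names so that every cluster picks the same winner.
    static boolean isNewer(MetadataChangeEvent candidate, MetadataChangeEvent current) {
        if (candidate.updatedTime != current.updatedTime) {
            return candidate.updatedTime > current.updatedTime;
        }
        return candidate.sourceCluster.compareTo(current.sourceCluster) > 0;
    }

    // Returns true if the event was applied, false if it was stale or a duplicate.
    boolean apply(MetadataChangeEvent event) {
        String k = key(event.type, event.resourceName);
        MetadataChangeEvent current = applied.get(k);
        if (current != null && !isNewer(event, current)) {
            return false;
        }
        applied.put(k, event);
        return true;
    }

    // Snapshot-sync reuses the same rule, so replaying a full snapshot from
    // another cluster only fills in updates this cluster missed.
    int applySnapshot(List<MetadataChangeEvent> snapshot) {
        int count = 0;
        for (MetadataChangeEvent event : snapshot) {
            if (apply(event)) {
                count++;
            }
        }
        return count;
    }

    // Returns the currently applied policies, or null if the resource is unknown.
    String policiesOf(ResourceType type, String name) {
        MetadataChangeEvent current = applied.get(key(type, name));
        return current == null ? null : current.policies;
    }
}
```

Breaking ties by cluster name means every cluster picks the same winner for the same pair of racing events, so all clusters converge. The trade-off is that wall-clock timestamps from different clusters can skew, so the winning update is not necessarily the one issued last in real time; replaying snapshots with the same rule only guarantees that clusters which missed events end up agreeing.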