Thanks for starting this proposal Asaf. I'm trying to think more about this part today.
Currently, the public API protocol is defined in PulsarApi.proto [1] And we have internal proto files such as MLDataFormats.proto [2], SchemaRegistryFormat.proto [3], ResourceUsage.proto [4]. It looks like the geo-replication is a relatively special one, the data replication depends on not only the MessageMetadata but also the topic lookup, create producer, close producer, etc. I think this should be the challenge to have a separate proto for geo-replication. Except for geo-replication, all the commands defined in PulsarApi.proto are public APIs/fields. The geo-replication is based on the Pulsar producer, which can reuse all the producer's ability to implement the geo-replication. But the replication needs some extra information. [1] https://github.com/apache/pulsar/blob/master/pulsar-common/src/main/proto/PulsarApi.proto [2] https://github.com/apache/pulsar/blob/master/managed-ledger/src/main/proto/MLDataFormats.proto [3] https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/proto/SchemaRegistryFormat.proto [4] https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/proto/ResourceUsage.proto On Wed, Jul 20, 2022 at 11:47 PM Asaf Mesika <asaf.mes...@gmail.com> wrote: > Hi, > > We started discussing in PIP-180, which Penghui recommended I move to a > dedicated thread. > > Pulsar has a public API in its binary protocol, which the clients use to > communicate with it. Nonetheless, it is its public API to the server. > > I believe the public API should not be changed for internal communication > purposes. PIP-180 gives a really good example: We would like to introduce a > new feature called Shadow Topic and would like to replicate messages from > the source topic to the Shadow topic. It just so happens to be that the > replication mechanism uses the Broker public API to send messages to a > broker. The design would like to expand on that by adding a field to this > public API, to serve that specific feature needs (the field is not generic, > it's specifically named shadow_message_id). > > I believe someone who tries to reason about Pulsar, and its architecture, > by looking at its public API should not have any fields which will never be > relevant to the reader. It makes it hard to reason and understand the > public API. > > The second problem is clients: Every such field will eventually trickle > down to the clients, which will need to ignore that field. In my opinion, > it makes it harder for the client's maintainers. Especially when the > community goal is to expand and have many languages clients maintained by > the community > > The public API today already contains many fields which are only for > internal use. Here are a few that I found (please correct me if I'm wrong > here): > > // Property set on replicated message, > // includes the source cluster name > optional string replicated_from = 5; > > // Override namespace's replication > repeated string replicate_to = 7; > > // Identify whether a message is a "marker" message used for > // internal metadata instead of application published data. > // Markers will generally not be propagated back to clients > optional int32 marker_type = 20; > > > I would like to discuss that with you, get your feedback and whether you > think it's correct to accept a decision to avoid changing the public API. > > One alternative I was thinking about (I'm still fairly new, so I don't have > all the experience and context here) is creating an internal non-public > API, which will be used for internal communication: different proto, > different port. > > Thanks for your time, > > Asaf >