Thanks for starting this proposal Asaf.

I'm trying to think more about this part today.

Currently, the public API protocol is defined in PulsarApi.proto [1]
And we have internal proto files such as MLDataFormats.proto [2],
SchemaRegistryFormat.proto [3], ResourceUsage.proto [4].

It looks like the geo-replication is a relatively special one, the data
replication depends on not only the MessageMetadata but also
the topic lookup, create producer, close producer, etc. I think this should
be the challenge to have a separate proto for geo-replication.
Except for geo-replication, all the commands defined in PulsarApi.proto are
public APIs/fields.

The geo-replication is based on the Pulsar producer, which can reuse all
the producer's ability to implement the geo-replication.
But the replication needs some extra information.

[1]
https://github.com/apache/pulsar/blob/master/pulsar-common/src/main/proto/PulsarApi.proto
[2]
https://github.com/apache/pulsar/blob/master/managed-ledger/src/main/proto/MLDataFormats.proto
[3]
https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/proto/SchemaRegistryFormat.proto
[4]
https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/proto/ResourceUsage.proto

On Wed, Jul 20, 2022 at 11:47 PM Asaf Mesika <asaf.mes...@gmail.com> wrote:

> Hi,
>
> We started discussing in PIP-180, which Penghui recommended I move to a
> dedicated thread.
>
> Pulsar has a public API in its binary protocol, which the clients use to
> communicate with it. Nonetheless, it is its public API to the server.
>
> I believe the public API should not be changed for internal communication
> purposes. PIP-180 gives a really good example: We would like to introduce a
> new feature called Shadow Topic and would like to replicate messages from
> the source topic to the Shadow topic. It just so happens to be that the
> replication mechanism uses the Broker public API to send messages to a
> broker. The design would like to expand on that by adding a field to this
> public API, to serve that specific feature needs (the field is not generic,
> it's specifically named shadow_message_id).
>
> I believe someone who tries to reason about Pulsar, and its architecture,
> by looking at its public API should not have any fields which will never be
> relevant to the reader.  It makes it hard to reason and understand the
> public API.
>
> The second problem is clients: Every such field will eventually trickle
> down to the clients, which will need to ignore that field. In my opinion,
> it makes it harder for the client's maintainers. Especially when the
> community goal is to expand and have many languages clients maintained by
> the community
>
> The public API today already contains many fields which are only for
> internal use. Here are a few that I found (please correct me if I'm wrong
> here):
>
> // Property set on replicated message,
> // includes the source cluster name
> optional string replicated_from = 5;
>
> // Override namespace's replication
> repeated string replicate_to    = 7;
>
> // Identify whether a message is a "marker" message used for
> // internal metadata instead of application published data.
> // Markers will generally not be propagated back to clients
> optional int32 marker_type = 20;
>
>
> I would like to discuss that with you, get your feedback and whether you
> think it's correct to accept a decision to avoid changing the public API.
>
> One alternative I was thinking about (I'm still fairly new, so I don't have
> all the experience and context here) is creating an internal non-public
> API, which will be used for internal communication: different proto,
> different port.
>
> Thanks for your time,
>
> Asaf
>

Reply via email to