Hi,

We started discussing in PIP-180, which Penghui recommended I move to a
dedicated thread.

Pulsar has a public API in its binary protocol, which the clients use to
communicate with it. Nonetheless, it is its public API to the server.

I believe the public API should not be changed for internal communication
purposes. PIP-180 gives a really good example: We would like to introduce a
new feature called Shadow Topic and would like to replicate messages from
the source topic to the Shadow topic. It just so happens to be that the
replication mechanism uses the Broker public API to send messages to a
broker. The design would like to expand on that by adding a field to this
public API, to serve that specific feature needs (the field is not generic,
it's specifically named shadow_message_id).

I believe someone who tries to reason about Pulsar, and its architecture,
by looking at its public API should not have any fields which will never be
relevant to the reader.  It makes it hard to reason and understand the
public API.

The second problem is clients: Every such field will eventually trickle
down to the clients, which will need to ignore that field. In my opinion,
it makes it harder for the client's maintainers. Especially when the
community goal is to expand and have many languages clients maintained by
the community

The public API today already contains many fields which are only for
internal use. Here are a few that I found (please correct me if I'm wrong
here):

// Property set on replicated message,
// includes the source cluster name
optional string replicated_from = 5;

// Override namespace's replication
repeated string replicate_to    = 7;

// Identify whether a message is a "marker" message used for
// internal metadata instead of application published data.
// Markers will generally not be propagated back to clients
optional int32 marker_type = 20;


I would like to discuss that with you, get your feedback and whether you
think it's correct to accept a decision to avoid changing the public API.

One alternative I was thinking about (I'm still fairly new, so I don't have
all the experience and context here) is creating an internal non-public
API, which will be used for internal communication: different proto,
different port.

Thanks for your time,

Asaf

Reply via email to