> As Penghui suggested, this field name is changed to `message_id` for
> potential generic usage. :)

That's the thing - it's not really for potential generic use - it's more
for potential *internal* generic usage, which is publicly exposed. When
an outside visitor looks at the API and asks themselves, "why should I
provide a message ID for a message I'm publishing? Isn't the ID something
the broker creates for itself?", this creates confusion, which IMO leads
to less adoption and makes it harder to contribute. I'm quite new to
Pulsar, and I feel there is confusion in quite a few parts of the system.
My suggestion is raised here to try to avoid that confusion.

> > The second problem is clients: Every such field will eventually
> > trickle down to the clients, which will need to ignore that field.
> > In my opinion, it makes it harder for the client's maintainers.
> > Especially when the community goal is to expand and have clients in
> > many languages maintained by the community.
>
> Our current client implementation is quite complex already. Compared
> with this, ignoring a few fields does not seem to be significantly
> hard, as long as we document it well, right?

Having internal fields makes the client even more complex. It's not just
about ignoring fields, it's about having more and more of them. What I
suggest is separating them into an internal API and an internal client
for those internal use cases. I'm not only referring to PIP-180, but to
any future PIP.

> > I believe someone who tries to reason about Pulsar, and its
> > architecture, by looking at its public API should not have any
> > fields which will never be relevant to the reader. It makes it hard
> > to reason and understand the public API.
>
> This design principle of keeping the public API clean is clear and
> easy to understand and I totally support this. But in the case of
> PIP-180 or geo-replication, the replicator can be considered as a
> special producer client, and it just inherited the basic semantic of a
> normal producer and extended its abilities to support some special
> internal usage.
>
> Of course we can use a different protocol and a different port for
> strictly inter-broker communications in theory. But the side effect of
> this would be more code, more machine resource usage, harder
> maintenance, and a longer time to make the feature steady, compared
> with just extending the abilities of the producer client.
>
> If it comes to a case where inter-broker communication is needed and
> it is not the case of a producer or consumer, I think we should
> definitely consider introducing the dedicated port and protocols.

Again, my suggestion mainly applies to the future - to make a conscious
decision to avoid overloading more internal use cases onto the public
API. PIP-180 is currently a good case study to explore that suggestion
(well, the ship has sailed, but it still is a good example).

I reiterate what I said before: you can say your sentence for any new
internal feature: "the X can be considered a special producer client,
and it just inherited the basic semantic of a normal producer and
extended its abilities to support some special internal usage". Replace
X with any feature, and the public API keeps expanding with internal
fields the normal user should never know about. That undermines the
whole notion of encapsulation and simplicity.

I would also like others to chime in on this and share their thoughts as
well.

> On 2022/07/20 15:47:16 Asaf Mesika wrote:
> > Hi,
> >
> > We started discussing in PIP-180, which Penghui recommended I move
> > to a dedicated thread.
> >
> > Pulsar has a public API in its binary protocol, which the clients
> > use to communicate with it. Nonetheless, it is the server's public
> > API.
> >
> > I believe the public API should not be changed for internal
> > communication purposes. PIP-180 gives a really good example: We
> > would like to introduce a new feature called Shadow Topic and would
> > like to replicate messages from the source topic to the Shadow
> > topic. It just so happens that the replication mechanism uses the
> > Broker public API to send messages to a broker. The design would
> > like to expand on that by adding a field to this public API, to
> > serve that specific feature's needs (the field is not generic, it's
> > specifically named shadow_message_id).
> >
> > I believe someone who tries to reason about Pulsar, and its
> > architecture, by looking at its public API should not have any
> > fields which will never be relevant to the reader. It makes it hard
> > to reason and understand the public API.
> >
> > The second problem is clients: Every such field will eventually
> > trickle down to the clients, which will need to ignore that field.
> > In my opinion, it makes it harder for the client's maintainers.
> > Especially when the community goal is to expand and have clients in
> > many languages maintained by the community.
> >
> > The public API today already contains many fields which are only for
> > internal use. Here are a few that I found (please correct me if I'm
> > wrong here):
> >
> >   // Property set on replicated message,
> >   // includes the source cluster name
> >   optional string replicated_from = 5;
> >
> >   // Override namespace's replication
> >   repeated string replicate_to = 7;
> >
> >   // Identify whether a message is a "marker" message used for
> >   // internal metadata instead of application published data.
> >   // Markers will generally not be propagated back to clients
> >   optional int32 marker_type = 20;
> >
> > I would like to discuss that with you and get your feedback on
> > whether you think it's right to make a decision to avoid changing
> > the public API.
> >
> > One alternative I was thinking about (I'm still fairly new, so I
> > don't have all the experience and context here) is creating an
> > internal non-public API, which will be used for internal
> > communication: different proto, different port.
> >
> > Thanks for your time,
> >
> > Asaf
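
To make the "different proto" alternative quoted above a bit more
concrete, here is a rough sketch of how the internal-only fields could
live in a separate definition. Everything in it is hypothetical: the
package and both message names are made up for illustration and do not
exist in Pulsar, the field numbers are arbitrary, and the three fields
in the public message are just placeholders for whatever stays public.
Only `replicated_from`, `replicate_to` and `marker_type` are taken from
the fields listed in the thread.

```proto
// Hypothetical sketch only: these messages are NOT part of Pulsar.
syntax = "proto2";

package pulsar.internal.sketch;

// What an application-facing client would see: no broker-only fields.
// The three fields below are illustrative placeholders.
message PublicMessageMetadata {
  required string producer_name = 1;
  required uint64 sequence_id   = 2;
  required uint64 publish_time  = 3;
}

// What brokers would exchange with each other over a separate,
// non-public port. It wraps the public metadata and adds the fields
// that only internal components (e.g. the replicator) care about.
message InternalMessageMetadata {
  required PublicMessageMetadata public_metadata = 1;

  // Source cluster name, set on replicated messages
  optional string replicated_from = 2;

  // Override for the namespace's replication clusters
  repeated string replicate_to = 3;

  // Marks internal "marker" messages that carry no application data
  optional int32 marker_type = 4;
}
```

The point of the wrapper is only that a client generated from the public
message never sees the internal fields at all, so there is nothing for
it to ignore. Whether that is worth the extra port and code is exactly
the trade-off being debated here.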