Hi Asaf, I really don't know what's your concern but it seems you don't have much understanding about Pulsar client/server protocol or you really would like to block the PIP. I tried to answer your concerns but let me try again to add more context about the implementation if that something can help you: this PIP makes change only in protobuf of message-id which is in implementation named as MessageIdData and it uses to serialize and deserialize messageId for the users. and this PIP is adding a new field to support messageId deserialization for partition-topic or multi-consumer topics. Now, does it impact wire protocol and will the client start sending this newly added field topic-name to broker? then answer is no because while sending ack command to broker client creates messageID where it doesn't set this field [1] and this new field only used during message serialization/deserialization when client app calls toByteArray()/fromByteArray() methods. so, this should not add any n/w overhead for the payload when client sends ack command to broker. [1] https://github.com/apache/pulsar/blob/master/pulsar-common/src/main/java/org/apache/pulsar/common/protocol/Commands.java#L1018
I am not sure if that helps you to answer the question or I should try to talk about Pulsar client-server protocol implementation here but we can help you in slack#dev channel if you have more implementation questions. Thanks, Rajan On Tue, Jun 20, 2023 at 11:31 AM Asaf Mesika <asaf.mes...@gmail.com> wrote: > On Tue, Jun 20, 2023 at 9:39 AM Rajan Dhabalia <rdhaba...@apache.org> > wrote: > > > > So you say in that sentence that you will add the topic name into > > MessageIdData. MessageIdData is defined in PulsarApi.proto and is > > transferred over the wire, so how can you add the topic to this class > > without changing the wire protocol? > > Yes, the client creates a separate MessageId while creating a serialized > > payload for acking where it doesn't set or send topicname and it won't > > change the payload. > > > > > But it contradicts what you wrote in the design doc. I'm sorry, but I don't > get it. > Can you please help me understand this by elaborating so anyone, including > me, can fully understand it? > Preferably all your answers should be injected into the document, of > course. > > Thanks! > > Asaf > > > > > Thanks, > > Rajan > > > > On Mon, Jun 19, 2023 at 5:45 AM Asaf Mesika <asaf.mes...@gmail.com> > wrote: > > > > > First, let me add some data that should be added to the Background > > section > > > of the PIP since I had to reverse engineer the code to understand that, > > > which is the opposite of the goal of a design document. > > > > > > ---- > > > Pulsar Broker has a binary protocol, which allows the client to consume > > > messages, acknowledge them, and much more. The protocol comprises > > Commands > > > containing the data needed to apply that Command on the broker side. > Many > > > commands allow a consumer (client) to acknowledge messages, among them: > > > CommandSendReceipt, CommandSend, CommandAck, and more. All those > commands > > > use the message type MessageIdData to specify the details of the > message > > to > > > acknowledge. > > > > > > Here's what this data structure looks like: > > > message MessageIdData { > > > required uint64 ledgerId = 1; > > > required uint64 entryId = 2; > > > optional int32 partition = 3 [default = -1]; > > > optional int32 batch_index = 4 [default = -1]; > > > repeated int64 ack_set = 5; > > > optional int32 batch_size = 6; > > > > > > // For the chunk message id, we need to specify the first chunk message > > id. > > > optional MessageIdData first_chunk_message_id = 7; > > > } > > > > > > The key fields are the ledgerID at which the message is contained and > > > entryId, which indicates the offset inside the ledger (message number). > > > > > > The client uses a class named MessageIdData which is the auto-generated > > > code representing the message MessageIdData. > > > --------- > > > > > > Now, in the design, you wrote: > > > > > > > Thefore, we need to add topic-name into MessageIdData and allow > > > > multi-topic/partitioned topic to deserialize message correctly so, > API > > > like > > > > acknowledge can perform as expected. > > > > > > > > > So you say in that sentence that you will add the topic name into > > > MessageIdData. > > > MessageIdData is defined in PulsarApi.proto and is transferred over the > > > wire, so how can you add the topic to this class without changing the > > wire > > > protocol? > > > > > > > > > > > > > > > > > > On Fri, Jun 16, 2023 at 10:47 PM Rajan Dhabalia <rdhaba...@apache.org> > > > wrote: > > > > > > > Yes, the topic name will not be transferred and it's not part of the > > wire > > > > protocol. Message uses MessageID protobuf data-structure to serialize > > and > > > > deserialize MessageId and it doesn't change any behavior nor will > > > transfer > > > > any additional fields to the broker. and I would not like to > introduce > > > any > > > > additional data-structure as that will create data copy, field > > > > inconsistencies, and more garbage due to more object allocation and > > > that's > > > > something we would like to avoid. > > > > > > > > Thanks, > > > > Rajan > > > > > > > > On Mon, May 15, 2023 at 6:18 AM PengHui Li <peng...@apache.org> > wrote: > > > > > > > > > I think the topic name will not be transmitted to the broker. > > > > > The client side used the class generated by the protobuf message. > > > > > Or, we can create another class to avoid coupling issues, but it > > > > > will introduce more changes and copy data from one structure > > > > > to another. For the long-term, I think it should be a good way if > > > > > we don't have blockers with this solution. Because I don't think > > > > > there is a higher priority in the long run than keeping the > protocol > > > > clear. > > > > > > > > > > If the above options are not feasible. At least, we should clarify > > > > > it in the proposal and add comments in the proto file to avoid > > > > > other clients transmitting the topic name to the broker. > > > > > > > > > > Thanks, > > > > > Penghui > > > > > > > > > > On Fri, May 12, 2023 at 5:59 PM Asaf Mesika <asaf.mes...@gmail.com > > > > > > wrote: > > > > > > > > > > > I don't get it - you say msgId is a data structure contained > within > > > > > > MessageId implementation, right? I presume msgId is the data > > > structure > > > > > the > > > > > > client transmit to the server, so that means you are transmitting > > > topic > > > > > to > > > > > > the server? > > > > > > > > > > > > > > > > > > On Fri, May 12, 2023 at 7:45 AM Rajan Dhabalia < > > rdhaba...@apache.org > > > > > > > > > > wrote: > > > > > > > > > > > > > Thank you for sharing your knowledge about the PIP which should > > be > > > > > > created > > > > > > > before PR and I think everyone in the community knows about it. > > but > > > > you > > > > > > can > > > > > > > check the PR for context which was blocked for sometime and we > > > > decided > > > > > to > > > > > > > create PIP with proto changes. > > > > > > > > > > > > > > This PIP/PR tries to fix the issue where partitioned topic > fails > > > > while > > > > > > > acking deserialized messageId. topic name will be part of > > MsgIdData > > > > > which > > > > > > > is the data-structure used by messageID to store msgID context > > > along > > > > > with > > > > > > > partition, batching, and other metadata. topic name will be > > > attached > > > > > only > > > > > > > when the user tries to serialize and deserialize the messageId > > > which > > > > > will > > > > > > > be purely client side implementation and in other cases it will > > not > > > > be > > > > > > > transmitted to server. Also, partitioned topic's abstract > concept > > > for > > > > > > user > > > > > > > and messageID must be also remain abstract for users and users > > must > > > > not > > > > > > > know about different implementation of messageId and our goal > is > > to > > > > > > > maintain that abstraction without telling user about > > MessageIdImpl > > > or > > > > > > > TopicMessageIdImpl. > > > > > > > > > > > > > > Thanks, > > > > > > > Rajan > > > > > > > > > > > > > > > > > > > > > On Thu, May 11, 2023 at 7:29 AM Asaf Mesika < > > asaf.mes...@gmail.com > > > > > > > > > > wrote: > > > > > > > > > > > > > > > Hi Rajan, > > > > > > > > > > > > > > > > A few comments on the PIP as I couldn't understand it fully > as > > > some > > > > > > > pieces > > > > > > > > of information is missing. > > > > > > > > > > > > > > > > First, I would like to remind about the rules, that exists in > > the > > > > > > > beginning > > > > > > > > of the PIP template: > > > > > > > > > > > > > > > > <!-- > > > > > > > > RULES > > > > > > > > * Never place a link to an external site like Google Doc. The > > > > > proposal > > > > > > > > should be in this issue entirely. > > > > > > > > * Use a spelling and grammar checker tools if available for > you > > > > > (there > > > > > > > are > > > > > > > > plenty of free ones) > > > > > > > > > > > > > > > > PROPOSAL HEALTH CHECK > > > > > > > > I can read the design document and understand the problem > > > statement > > > > > and > > > > > > > > what you plan to change *without* resorting to a couple of > > hours > > > of > > > > > > code > > > > > > > > reading just to start having a high level understanding of > the > > > > > change. > > > > > > > > --> > > > > > > > > > > > > > > > > > > > > > > > > In this specific case > > > > > > > > 1. I would include explanation and detail the data structures > > > > fields > > > > > of > > > > > > > > objects you mentioned, such as: MessageIdImpl and > > MessageIdData. > > > > > > > > 2. I would not put a PR as the design section, so I need to > > read > > > > code > > > > > > to > > > > > > > > understand what the exact solution details. > > > > > > > > > > > > > > > > You wrote: > > > > > > > > > > > > > > > > > Pulsar api provides MessageId > > > > > > > > > < > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/master/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/MessageId.java > > > > > > > > > > > > > > > > interface > > > > > > > > > which is generally used by producer and consumer > applications > > > to > > > > > > manage > > > > > > > > > topic offset. > > > > > > > > > > > > > > > > > > > > > > > > I think it's used to allow consumers to acknowledge (can be > per > > > > > > message) > > > > > > > so > > > > > > > > offset if wrong terminology here. > > > > > > > > For producers, not sure exactly its usage. Maybe if they need > > to > > > > > refer > > > > > > to > > > > > > > > this message later when reading by Reader interface. > > > > > > > > I would correct this section. > > > > > > > > > > > > > > > > However, right now Pulsar doesn't support correct > > deserialization > > > > of > > > > > > > > > multi-topic or partitioned-topic because of that > > 1acknowledge` > > > > API > > > > > > call > > > > > > > > > fails for those topics with below error > > > > > > > > > > > > > > > > > > > > > > > > You're saying that the acknowledgement API method signature > > > > receives > > > > > > > > MessageId, but do not receive TopicMessageId? > > > > > > > > > > > > > > > > I have a few questions on that: > > > > > > > > > > > > > > > > 1. The acknowledgement API is part of Pulsar binary protocol. > > Is > > > > your > > > > > > > plan > > > > > > > > to alter that protocol so it will also include the topic > field > > as > > > > > part > > > > > > of > > > > > > > > the message ID? > > > > > > > > > > > > > > > > 2. I think your PIP needs to explain the following items > which > > > are > > > > > > > missing > > > > > > > > as context: > > > > > > > > - There are two implementation for MessageId interface, one > for > > > > topic > > > > > > and > > > > > > > > one for partitioned topic. > > > > > > > > - The problem is that the serialization/desrialization method > > is > > > > used > > > > > > > > mainly for translating the ID into the binary protocol, which > > > only > > > > > > > requires > > > > > > > > the ID (ledger ID, entry ID). > > > > > > > > - The reason for that is that once you created a consumer, it > > > has a > > > > > > topic > > > > > > > > attached to it. Transferring the topic for the ack is > > redundant. > > > > > > > > > > > > > > > > All of this needs to be in the background. > > > > > > > > > > > > > > > > I have several ideas on solving that, which IMO should mainly > > be > > > in > > > > > the > > > > > > > > client level, but I must get answers to the questions above > > > before > > > > I > > > > > > can > > > > > > > > continue. > > > > > > > > > > > > > > > > Last note > > > > > > > > You have basically placed a link to a pull request as the > > design > > > > > > solution > > > > > > > > (high-level/detailed design). > > > > > > > > The whole idea of the design is that you describe the > solution > > > > > without > > > > > > > > resorting to code. > > > > > > > > IMO you should amend the design, state the goal shortly, and > > > have a > > > > > > high > > > > > > > > level design section which contains 1-2 short paragraphs > > > describing > > > > > > > exactly > > > > > > > > your solution. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > Asaf > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, May 9, 2023 at 3:24 PM Yunze Xu <x...@apache.org> > > wrote: > > > > > > > > > > > > > > > > > I'm talking about whether to add a new separate API. I'm > > > > concerned > > > > > > > > > about whether existing applications would be affected, no > > > matter > > > > if > > > > > > > > > the existing implementation has the limitation. If yes, we > > > should > > > > > > > > > document it in the PIP so that users can know that. > > > > > > > > > > > > > > > > > > > it's a new optional field which would not break the > > > > compatibility > > > > > > > > > > > > > > > > > > And yes, I just confirmed it with simple demos in my local > > env. > > > > So > > > > > > I'm > > > > > > > > > +1 to this proposal. > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > Yunze > > > > > > > > > > > > > > > > > > On Tue, May 9, 2023 at 3:05 PM Rajan Dhabalia < > > > > > rdhaba...@apache.org> > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > Weill there are multiple things: it's a new optional > field > > > > which > > > > > > > would > > > > > > > > > not > > > > > > > > > > break the compatibility , also current messaegId > > > serialization > > > > > and > > > > > > > > > > deserialization anyway only impact multi-topic consumer > > which > > > > is > > > > > > > > already > > > > > > > > > > broken or has the limitation and, adding a new separate > API > > > for > > > > > > > > > partitioned > > > > > > > > > > topic is not only not acceptable but creates too much > > > confusion > > > > > for > > > > > > > > users > > > > > > > > > > to use separate ack APIs for non-partition and partition > > > topics > > > > > and > > > > > > > > that > > > > > > > > > > also breaks partitioned topic abstraction which we would > > like > > > > to > > > > > > > avoid. > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > Rajan > > > > > > > > > > > > > > > > > > > > On Mon, May 8, 2023 at 11:27 PM Yunze Xu <x...@apache.org > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > It seems that `TopicMessageIdImpl#toByteArray` now > could > > > > > > serialize > > > > > > > > the > > > > > > > > > > > optional topic field to the bytes. I didn't test it now > > > but I > > > > > > have > > > > > > > a > > > > > > > > > > > concern about whether it would bring a breaking change. > > > > > > > > > > > > > > > > > > > > > > Assuming there are two applications (let's say A and B) > > > based > > > > > on > > > > > > an > > > > > > > > > > > older Pulsar client, A writes serialized bytes into a > > > file, B > > > > > > reads > > > > > > > > > > > bytes from the file and parses it to a MessageId. If A > > > > upgraded > > > > > > its > > > > > > > > > > > Pulsar client to the latest while B did not, what would > > > > happen? > > > > > > > Could > > > > > > > > > > > B still get the correct MessageId or the bytes would > not > > be > > > > > able > > > > > > to > > > > > > > > > > > parsed? > > > > > > > > > > > > > > > > > > > > > > P.S. it's better to add the API changes and potential > > > > breaking > > > > > > > > changes > > > > > > > > > > > in the proposal. > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > Yunze > > > > > > > > > > > > > > > > > > > > > > On Tue, May 9, 2023 at 1:59 PM Rajan Dhabalia < > > > > > > > rdhaba...@apache.org> > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > Pulsar api provides MessageId interface which is > > > generally > > > > > used > > > > > > > by > > > > > > > > > > > producer > > > > > > > > > > > > and consumer applications to manage topic offset. > > > > Sometimes, > > > > > > > these > > > > > > > > > > > > applications would like to serialize and deserialize > > > > > > messageIds, > > > > > > > > > > > > specifically consumer app which would like to persist > > > > > messageId > > > > > > > and > > > > > > > > > ack > > > > > > > > > > > > with those messageIds by deserializing them. However, > > > right > > > > > now > > > > > > > > > Pulsar > > > > > > > > > > > > doesn't support correct deserialization of > multi-topic > > or > > > > > > > > > > > partitioned-topic > > > > > > > > > > > > because of that 1acknowledge` API call fails for > those > > > > topics > > > > > > > with > > > > > > > > > below > > > > > > > > > > > > error: > > > > > > > > > > > > "Only TopicMessageId is allowed to acknowledge for a > > > > > > multi-topics > > > > > > > > > > > consumer" > > > > > > > > > > > > > > > > > > > > > > > > MessageIdImpl stores id metadata into MessageIdData > > which > > > > > > doesn't > > > > > > > > > contain > > > > > > > > > > > > context about topic name to find out which topic > > belongs > > > to > > > > > > this > > > > > > > > > > > MessageID. > > > > > > > > > > > > Therefore, we need to add topic-name into > MessageIdData > > > and > > > > > > allow > > > > > > > > > > > > multi-topic/partitioned topics to deserialize > messages > > > > > > correctly > > > > > > > > so, > > > > > > > > > > > > consumer app can perform as expected. > > > > > > > > > > > > > > > > > > > > > > > > Please visit PIP for any suggestions: > > > > > > > > > > > > https://github.com/apache/pulsar/issues/20221 > > > > > > > > > > > > > > > > > > > > > > > > This PIP is created with PR: > > > > > > > > > https://github.com/apache/pulsar/pull/19944 > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Rajan > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >