Hi Asaf,

I really don't know what's your concern but it seems you don't have much
understanding about Pulsar client/server protocol or you really would like
to block the PIP. I tried to answer your concerns but let me try again to
add more context about the implementation if that something can help you:
this PIP makes change only in protobuf of message-id which is in
implementation named as MessageIdData and it uses to serialize and
deserialize messageId for the users. and this PIP is adding a new field to
support messageId deserialization for partition-topic or multi-consumer
topics.
Now, does it impact wire protocol and will the client start sending this
newly added field topic-name to broker? then answer is no because while
sending ack command to broker client creates messageID where it doesn't set
this field [1] and this new field only used during message
serialization/deserialization when client app calls
toByteArray()/fromByteArray() methods. so, this should not add any n/w
overhead for the payload when client sends ack command to broker.
[1]
https://github.com/apache/pulsar/blob/master/pulsar-common/src/main/java/org/apache/pulsar/common/protocol/Commands.java#L1018

I am not sure if that helps you to answer the question or I should try to
talk about Pulsar client-server protocol implementation here but we can
help you in slack#dev channel if you have more implementation questions.

Thanks,
Rajan



On Tue, Jun 20, 2023 at 11:31 AM Asaf Mesika <asaf.mes...@gmail.com> wrote:

> On Tue, Jun 20, 2023 at 9:39 AM Rajan Dhabalia <rdhaba...@apache.org>
> wrote:
>
> > > So you say in that sentence that you will add the topic name into
> > MessageIdData. MessageIdData is defined in PulsarApi.proto and is
> > transferred over the wire, so how can you add the topic to this class
> > without changing the wire protocol?
> > Yes, the client creates a separate MessageId while creating a serialized
> > payload for acking where it doesn't set or send topicname and it won't
> > change the payload.
> >
> >
> But it contradicts what you wrote in the design doc. I'm sorry, but I don't
> get it.
> Can you please help me understand this by elaborating so anyone, including
> me, can fully understand it?
> Preferably all your answers should be injected into the document, of
> course.
>
> Thanks!
>
> Asaf
>
>
>
> > Thanks,
> > Rajan
> >
> > On Mon, Jun 19, 2023 at 5:45 AM Asaf Mesika <asaf.mes...@gmail.com>
> wrote:
> >
> > > First, let me add some data that should be added to the Background
> > section
> > > of the PIP since I had to reverse engineer the code to understand that,
> > > which is the opposite of the goal of a design document.
> > >
> > > ----
> > > Pulsar Broker has a binary protocol, which allows the client to consume
> > > messages, acknowledge them, and much more. The protocol comprises
> > Commands
> > > containing the data needed to apply that Command on the broker side.
> Many
> > > commands allow a consumer (client) to acknowledge messages, among them:
> > > CommandSendReceipt, CommandSend, CommandAck, and more. All those
> commands
> > > use the message type MessageIdData to specify the details of the
> message
> > to
> > > acknowledge.
> > >
> > > Here's what this data structure looks like:
> > > message MessageIdData {
> > > required uint64 ledgerId = 1;
> > > required uint64 entryId = 2;
> > > optional int32 partition = 3 [default = -1];
> > > optional int32 batch_index = 4 [default = -1];
> > > repeated int64 ack_set = 5;
> > > optional int32 batch_size = 6;
> > >
> > > // For the chunk message id, we need to specify the first chunk message
> > id.
> > > optional MessageIdData first_chunk_message_id = 7;
> > > }
> > >
> > > The key fields are the ledgerID at which the message is contained and
> > > entryId, which indicates the offset inside the ledger (message number).
> > >
> > > The client uses a class named MessageIdData which is the auto-generated
> > > code representing the message MessageIdData.
> > > ---------
> > >
> > > Now, in the design, you wrote:
> > >
> > > > Thefore, we need to add topic-name into MessageIdData and allow
> > > > multi-topic/partitioned topic to deserialize message correctly so,
> API
> > > like
> > > > acknowledge can perform as expected.
> > >
> > >
> > > So you say in that sentence that you will add the topic name into
> > > MessageIdData.
> > > MessageIdData is defined in PulsarApi.proto and is transferred over the
> > > wire, so how can you add the topic to this class without changing the
> > wire
> > > protocol?
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Jun 16, 2023 at 10:47 PM Rajan Dhabalia <rdhaba...@apache.org>
> > > wrote:
> > >
> > > > Yes, the topic name will not be transferred and it's not part of the
> > wire
> > > > protocol. Message uses MessageID protobuf data-structure to serialize
> > and
> > > > deserialize MessageId and it doesn't change any behavior nor will
> > > transfer
> > > > any additional fields to the broker. and I would not like to
> introduce
> > > any
> > > > additional data-structure as that will create data copy, field
> > > > inconsistencies, and more garbage due to more object allocation and
> > > that's
> > > > something we would like to avoid.
> > > >
> > > > Thanks,
> > > > Rajan
> > > >
> > > > On Mon, May 15, 2023 at 6:18 AM PengHui Li <peng...@apache.org>
> wrote:
> > > >
> > > > > I think the topic name will not be transmitted to the broker.
> > > > > The client side used the class generated by the protobuf message.
> > > > > Or, we can create another class to avoid coupling issues, but it
> > > > > will introduce more changes and copy data from one structure
> > > > > to another. For the long-term, I think it should be a good way if
> > > > > we don't have blockers with this solution. Because I don't think
> > > > > there is a higher priority in the long run than keeping the
> protocol
> > > > clear.
> > > > >
> > > > > If the above options are not feasible. At least, we should clarify
> > > > > it in the proposal and add comments in the proto file to avoid
> > > > > other clients transmitting the topic name to the broker.
> > > > >
> > > > > Thanks,
> > > > > Penghui
> > > > >
> > > > > On Fri, May 12, 2023 at 5:59 PM Asaf Mesika <asaf.mes...@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > I don't get it - you say msgId is a data structure contained
> within
> > > > > > MessageId implementation, right? I presume msgId is the data
> > > structure
> > > > > the
> > > > > > client transmit to the server, so that means you are transmitting
> > > topic
> > > > > to
> > > > > > the server?
> > > > > >
> > > > > >
> > > > > > On Fri, May 12, 2023 at 7:45 AM Rajan Dhabalia <
> > rdhaba...@apache.org
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Thank you for sharing your knowledge about the PIP which should
> > be
> > > > > > created
> > > > > > > before PR and I think everyone in the community knows about it.
> > but
> > > > you
> > > > > > can
> > > > > > > check the PR for context which was blocked for sometime and we
> > > > decided
> > > > > to
> > > > > > > create PIP with proto changes.
> > > > > > >
> > > > > > > This PIP/PR tries to fix the issue where partitioned topic
> fails
> > > > while
> > > > > > > acking deserialized messageId. topic name will be part of
> > MsgIdData
> > > > > which
> > > > > > > is the data-structure used by messageID to store msgID context
> > > along
> > > > > with
> > > > > > > partition, batching, and other metadata. topic name will be
> > > attached
> > > > > only
> > > > > > > when the user tries to serialize and deserialize the messageId
> > > which
> > > > > will
> > > > > > > be purely client side implementation and in other cases it will
> > not
> > > > be
> > > > > > > transmitted to server. Also, partitioned topic's abstract
> concept
> > > for
> > > > > > user
> > > > > > > and messageID must be also remain abstract for users and users
> > must
> > > > not
> > > > > > > know about different implementation of messageId and our goal
> is
> > to
> > > > > > > maintain that abstraction without telling user about
> > MessageIdImpl
> > > or
> > > > > > > TopicMessageIdImpl.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Rajan
> > > > > > >
> > > > > > >
> > > > > > > On Thu, May 11, 2023 at 7:29 AM Asaf Mesika <
> > asaf.mes...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Rajan,
> > > > > > > >
> > > > > > > > A few comments on the PIP as I couldn't understand it fully
> as
> > > some
> > > > > > > pieces
> > > > > > > > of information is missing.
> > > > > > > >
> > > > > > > > First, I would like to remind about the rules, that exists in
> > the
> > > > > > > beginning
> > > > > > > > of the PIP template:
> > > > > > > >
> > > > > > > > <!--
> > > > > > > > RULES
> > > > > > > > * Never place a link to an external site like Google Doc. The
> > > > > proposal
> > > > > > > > should be in this issue entirely.
> > > > > > > > * Use a spelling and grammar checker tools if available for
> you
> > > > > (there
> > > > > > > are
> > > > > > > > plenty of free ones)
> > > > > > > >
> > > > > > > > PROPOSAL HEALTH CHECK
> > > > > > > > I can read the design document and understand the problem
> > > statement
> > > > > and
> > > > > > > > what you plan to change *without* resorting to a couple of
> > hours
> > > of
> > > > > > code
> > > > > > > > reading just to start having a high level understanding of
> the
> > > > > change.
> > > > > > > > -->
> > > > > > > >
> > > > > > > >
> > > > > > > > In this specific case
> > > > > > > > 1. I would include explanation and detail the data structures
> > > > fields
> > > > > of
> > > > > > > > objects you mentioned, such as: MessageIdImpl and
> > MessageIdData.
> > > > > > > > 2. I would not put a PR as the design section, so I need to
> > read
> > > > code
> > > > > > to
> > > > > > > > understand what the exact solution details.
> > > > > > > >
> > > > > > > > You wrote:
> > > > > > > >
> > > > > > > > > Pulsar api provides MessageId
> > > > > > > > > <
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/master/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/MessageId.java
> > > > > > > >
> > > > > > > > interface
> > > > > > > > > which is generally used by producer and consumer
> applications
> > > to
> > > > > > manage
> > > > > > > > > topic offset.
> > > > > > > >
> > > > > > > >
> > > > > > > > I think it's used to allow consumers to acknowledge (can be
> per
> > > > > > message)
> > > > > > > so
> > > > > > > > offset if wrong terminology here.
> > > > > > > > For producers, not sure exactly its usage. Maybe if they need
> > to
> > > > > refer
> > > > > > to
> > > > > > > > this message later when reading by Reader interface.
> > > > > > > > I would correct this section.
> > > > > > > >
> > > > > > > > However, right now Pulsar doesn't support correct
> > deserialization
> > > > of
> > > > > > > > > multi-topic or partitioned-topic because of that
> > 1acknowledge`
> > > > API
> > > > > > call
> > > > > > > > > fails for those topics with below error
> > > > > > > >
> > > > > > > >
> > > > > > > > You're saying that the acknowledgement API method signature
> > > > receives
> > > > > > > > MessageId, but do not receive TopicMessageId?
> > > > > > > >
> > > > > > > > I have a few questions on that:
> > > > > > > >
> > > > > > > > 1. The acknowledgement API is part of Pulsar binary protocol.
> > Is
> > > > your
> > > > > > > plan
> > > > > > > > to alter that protocol so it will also include the topic
> field
> > as
> > > > > part
> > > > > > of
> > > > > > > > the message ID?
> > > > > > > >
> > > > > > > > 2. I think your PIP needs to explain the following items
> which
> > > are
> > > > > > > missing
> > > > > > > > as context:
> > > > > > > > - There are two implementation for MessageId interface, one
> for
> > > > topic
> > > > > > and
> > > > > > > > one for partitioned topic.
> > > > > > > > - The problem is that the serialization/desrialization method
> > is
> > > > used
> > > > > > > > mainly for translating the ID into the binary protocol, which
> > > only
> > > > > > > requires
> > > > > > > > the ID (ledger ID, entry ID).
> > > > > > > > - The reason for that is that once you created a consumer, it
> > > has a
> > > > > > topic
> > > > > > > > attached to it. Transferring the topic for the ack is
> > redundant.
> > > > > > > >
> > > > > > > > All of this needs to be in the background.
> > > > > > > >
> > > > > > > > I have several ideas on solving that, which IMO should mainly
> > be
> > > in
> > > > > the
> > > > > > > > client level, but I must get answers to the questions above
> > > before
> > > > I
> > > > > > can
> > > > > > > > continue.
> > > > > > > >
> > > > > > > > Last note
> > > > > > > > You have basically placed a link to a pull request as the
> > design
> > > > > > solution
> > > > > > > > (high-level/detailed design).
> > > > > > > > The whole idea of the design is that you describe the
> solution
> > > > > without
> > > > > > > > resorting to code.
> > > > > > > > IMO you should amend the design, state the goal shortly, and
> > > have a
> > > > > > high
> > > > > > > > level design section which contains 1-2 short paragraphs
> > > describing
> > > > > > > exactly
> > > > > > > > your solution.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Asaf
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, May 9, 2023 at 3:24 PM Yunze Xu <x...@apache.org>
> > wrote:
> > > > > > > >
> > > > > > > > > I'm talking about whether to add a new separate API. I'm
> > > > concerned
> > > > > > > > > about whether existing applications would be affected, no
> > > matter
> > > > if
> > > > > > > > > the existing implementation has the limitation. If yes, we
> > > should
> > > > > > > > > document it in the PIP so that users can know that.
> > > > > > > > >
> > > > > > > > > > it's a new optional field which would not break the
> > > > compatibility
> > > > > > > > >
> > > > > > > > > And yes, I just confirmed it with simple demos in my local
> > env.
> > > > So
> > > > > > I'm
> > > > > > > > > +1 to this proposal.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Yunze
> > > > > > > > >
> > > > > > > > > On Tue, May 9, 2023 at 3:05 PM Rajan Dhabalia <
> > > > > rdhaba...@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Weill there are multiple things: it's a new optional
> field
> > > > which
> > > > > > > would
> > > > > > > > > not
> > > > > > > > > > break the compatibility , also current messaegId
> > > serialization
> > > > > and
> > > > > > > > > > deserialization anyway only impact multi-topic consumer
> > which
> > > > is
> > > > > > > > already
> > > > > > > > > > broken or has the limitation and, adding a new separate
> API
> > > for
> > > > > > > > > partitioned
> > > > > > > > > > topic is not only not acceptable but creates too much
> > > confusion
> > > > > for
> > > > > > > > users
> > > > > > > > > > to use separate ack APIs for non-partition and partition
> > > topics
> > > > > and
> > > > > > > > that
> > > > > > > > > > also breaks partitioned topic abstraction which we would
> > like
> > > > to
> > > > > > > avoid.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Rajan
> > > > > > > > > >
> > > > > > > > > > On Mon, May 8, 2023 at 11:27 PM Yunze Xu <x...@apache.org
> >
> > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > It seems that `TopicMessageIdImpl#toByteArray` now
> could
> > > > > > serialize
> > > > > > > > the
> > > > > > > > > > > optional topic field to the bytes. I didn't test it now
> > > but I
> > > > > > have
> > > > > > > a
> > > > > > > > > > > concern about whether it would bring a breaking change.
> > > > > > > > > > >
> > > > > > > > > > > Assuming there are two applications (let's say A and B)
> > > based
> > > > > on
> > > > > > an
> > > > > > > > > > > older Pulsar client, A writes serialized bytes into a
> > > file, B
> > > > > > reads
> > > > > > > > > > > bytes from the file and parses it to a MessageId. If A
> > > > upgraded
> > > > > > its
> > > > > > > > > > > Pulsar client to the latest while B did not, what would
> > > > happen?
> > > > > > > Could
> > > > > > > > > > > B still get the correct MessageId or the bytes would
> not
> > be
> > > > > able
> > > > > > to
> > > > > > > > > > > parsed?
> > > > > > > > > > >
> > > > > > > > > > > P.S. it's better to add the API changes and potential
> > > > breaking
> > > > > > > > changes
> > > > > > > > > > > in the proposal.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Yunze
> > > > > > > > > > >
> > > > > > > > > > > On Tue, May 9, 2023 at 1:59 PM Rajan Dhabalia <
> > > > > > > rdhaba...@apache.org>
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi,
> > > > > > > > > > > >
> > > > > > > > > > > > Pulsar api provides MessageId interface which is
> > > generally
> > > > > used
> > > > > > > by
> > > > > > > > > > > producer
> > > > > > > > > > > > and consumer applications to manage topic offset.
> > > > Sometimes,
> > > > > > > these
> > > > > > > > > > > > applications would like to serialize and deserialize
> > > > > > messageIds,
> > > > > > > > > > > > specifically consumer app which would like to persist
> > > > > messageId
> > > > > > > and
> > > > > > > > > ack
> > > > > > > > > > > > with those messageIds by deserializing them. However,
> > > right
> > > > > now
> > > > > > > > > Pulsar
> > > > > > > > > > > > doesn't support correct deserialization of
> multi-topic
> > or
> > > > > > > > > > > partitioned-topic
> > > > > > > > > > > > because of that 1acknowledge` API call fails for
> those
> > > > topics
> > > > > > > with
> > > > > > > > > below
> > > > > > > > > > > > error:
> > > > > > > > > > > > "Only TopicMessageId is allowed to acknowledge for a
> > > > > > multi-topics
> > > > > > > > > > > consumer"
> > > > > > > > > > > >
> > > > > > > > > > > > MessageIdImpl stores id metadata into MessageIdData
> > which
> > > > > > doesn't
> > > > > > > > > contain
> > > > > > > > > > > > context about topic name to find out which topic
> > belongs
> > > to
> > > > > > this
> > > > > > > > > > > MessageID.
> > > > > > > > > > > > Therefore, we need to add topic-name into
> MessageIdData
> > > and
> > > > > > allow
> > > > > > > > > > > > multi-topic/partitioned topics to deserialize
> messages
> > > > > > correctly
> > > > > > > > so,
> > > > > > > > > > > > consumer app can perform as expected.
> > > > > > > > > > > >
> > > > > > > > > > > > Please visit PIP for any suggestions:
> > > > > > > > > > > > https://github.com/apache/pulsar/issues/20221
> > > > > > > > > > > >
> > > > > > > > > > > > This PIP is created with PR:
> > > > > > > > > https://github.com/apache/pulsar/pull/19944
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Rajan
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to