Re: [DISCUSSION] Redesign the MessageId interface

Joe F Tue, 08 Nov 2022 07:16:25 -0800

>Maybe this design is to hidden some details, but if
users don't know the details like ledger id and entry id, how could
you know what does "0:0:-1:0" mean?


 Abstractions exist for a reason. Ledgerid and entryid are  implementation
details, and an application should not be interpreting that at all.
-j


On Tue, Nov 8, 2022 at 3:43 AM Yunze Xu <[email protected]>
wrote:

> I didn't look into these two methods at the moment. But I think it's
> possible to
> retain only the `fromByteArray`.
>
> Thanks,
> Yunze
>
> On Tue, Nov 8, 2022 at 7:02 PM Enrico Olivelli <[email protected]>
> wrote:
> >
> > Il giorno mar 8 nov 2022 alle ore 11:52 Yunze Xu
> > <[email protected]> ha scritto:
> > >
> > > Hi Enrico,
> > >
> > > > We also need a way to represent this as a String or a byte[]
> > >
> > > We already have the `toByteArray` method, right?
> >
> > Yes, correct. So we are fine. I forgot about it and I answered too
> quickly.
> >
> > I am not sure if this can be in the scope of this initiative, but we
> > should somehow get rid of
> > stuff like "fromByteArrayWithTopic" vs "fromByteArray".
> >
> > Thanks
> > Enrico
> >
> > >
> > > Thanks,
> > > Yunze
> > >
> > > On Tue, Nov 8, 2022 at 6:43 PM Enrico Olivelli <[email protected]>
> wrote:
> > > >
> > > > Il giorno mar 8 nov 2022 alle ore 11:25 Yunze Xu
> > > > <[email protected]> ha scritto:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > Currently we have the following 5 implementations of MessageId:
> > > > >
> > > > > - MessageIdImpl: (ledger id, entry id, partition index)
> > > > >   - BatchMessageIdImpl: adds (batch index, batch size, acker),
> where
> > > > >     acker is a wrapper of a BitSet.
> > > > >   - ChunkMessageIdImpl: adds another MessageIdImpl that represents
> > > > >     the first MessageIdImpl of a BitSet.
> > > > >   - MultiMessageIdImpl: adds a map that maps the topic name to the
> > > > >     MessageId.
> > > > > - TopicMessageIdImpl: adds the topic name and the partition name
> > > > >
> > > > > These implementations are such a mess. For example, when users get
> a
> > > > > MessageId from `Producer#send`:
> > > > >
> > > > > ```java
> > > > > var id = producer.send("msg");
> > > > > ```
> > > > >
> > > > > There is no getter to get some specific fields like ledger id. You
> can
> > > > > only see a representation from `toString` method and got some
> output
> > > > > like "0:0:-1:0". Maybe this design is to hidden some details, but
> if
> > > > > users don't know the details like ledger id and entry id, how could
> > > > > you know what does "0:0:-1:0" mean? What if `MessageId#toString`'s
> > > > > implementation changed? Should it be treated as a breaking change?
> > > > >
> > > > > The original definition of the underlying MessageIdData is much
> more
> > > > > clear:
> > > > >
> > > > > ```proto
> > > > > message MessageIdData {
> > > > >     required uint64 ledgerId = 1;
> > > > >     required uint64 entryId  = 2;
> > > > >     optional int32 partition = 3 [default = -1];
> > > > >     optional int32 batch_index = 4 [default = -1];
> > > > >     repeated int64 ack_set = 5;
> > > > >     optional int32 batch_size = 6;
> > > > >
> > > > >     // For the chunk message id, we need to specify the first
> chunk message id.
> > > > >     optional MessageIdData first_chunk_message_id = 7;
> > > > > }
> > > > > ```
> > > > >
> > > > > IMO, MessageId should be a wrapper of MessageIdData. It's more
> natural
> > > > > to have an interface like:
> > > > >
> > > > > ```java
> > > > > interface MessageId {
> > > > >     long ledgerId();
> > > > >     long entryId();
> > > > >     Optional<Integer> partition();
> > > > >     Optional<Integer> batchIndex();
> > > > >     // ...
> > > > > ```
> > > >
> > > > This is very good for client applications.
> > > > We also need a way to represent this as a String or a byte[], this
> way
> > > > client applications have a standard way to store
> > > > message offsets into an external system (for instance when you want
> to
> > > > user the Reader API and keep track of the position by yourself)
> > > >
> > > > Enrico
> > > >
> > > > >
> > > > > Additionally, there are many places that use only the triple of
> > > > > (ledger id, entry id, batch index) as the key to represent the
> position.
> > > > > Currently, they are done by adding a conversion from
> > > > > BatchMessageIdImpl to MessageIdImpl. However, it's more intuitive
> to
> > > > > write something like:
> > > > >
> > > > > ```java
> > > > > class MessageIdPosition implements Comparable<MessageIdPosition> {
> > > > >     private final MessageId messageId;
> > > > >     // TODO: compare only the triple (ledger, entry, batch)
> > > > > ```
> > > > >
> > > > > Therefore, I'm going to write a proposal to redesign the MessageId
> > > > > interface only by adding some getters. Regarding the 5 existing
> > > > > implementations, I think we can drop them because they are a part
> > > > > of `pulsar-client`, not `pulsar-client-api`.
> > > > >
> > > > > Please feel free to share your points.
> > > > >
> > > > > Thanks,
> > > > > Yunze
>

Re: [DISCUSSION] Redesign the MessageId interface

Reply via email to