Re: [DISCUSSION] Redesign the MessageId interface

Haiting Jiang Tue, 08 Nov 2022 23:02:10 -0800

Overall, this makes sense to me.
The current status of MessageId is a bit messy, especially for client
developers and senior users who are interested in the implementation
details.
But please make sure we have to make it compatible with previous
implementations, like the `toString` method, I bet someone has already
done the parsing and got the ledgerId and entryId from it.



Thanks,
Haiting

On Tue, Nov 8, 2022 at 6:25 PM Yunze Xu <y...@streamnative.io.invalid> wrote:
>
> Hi all,
>
> Currently we have the following 5 implementations of MessageId:
>
> - MessageIdImpl: (ledger id, entry id, partition index)
>   - BatchMessageIdImpl: adds (batch index, batch size, acker), where
>     acker is a wrapper of a BitSet.
>   - ChunkMessageIdImpl: adds another MessageIdImpl that represents
>     the first MessageIdImpl of a BitSet.
>   - MultiMessageIdImpl: adds a map that maps the topic name to the
>     MessageId.
> - TopicMessageIdImpl: adds the topic name and the partition name
>
> These implementations are such a mess. For example, when users get a
> MessageId from `Producer#send`:
>
> ```java
> var id = producer.send("msg");
> ```
>
> There is no getter to get some specific fields like ledger id. You can
> only see a representation from `toString` method and got some output
> like "0:0:-1:0". Maybe this design is to hidden some details, but if
> users don't know the details like ledger id and entry id, how could
> you know what does "0:0:-1:0" mean? What if `MessageId#toString`'s
> implementation changed? Should it be treated as a breaking change?
>
> The original definition of the underlying MessageIdData is much more
> clear:
>
> ```proto
> message MessageIdData {
>     required uint64 ledgerId = 1;
>     required uint64 entryId  = 2;
>     optional int32 partition = 3 [default = -1];
>     optional int32 batch_index = 4 [default = -1];
>     repeated int64 ack_set = 5;
>     optional int32 batch_size = 6;
>
>     // For the chunk message id, we need to specify the first chunk message 
> id.
>     optional MessageIdData first_chunk_message_id = 7;
> }
> ```
>
> IMO, MessageId should be a wrapper of MessageIdData. It's more natural
> to have an interface like:
>
> ```java
> interface MessageId {
>     long ledgerId();
>     long entryId();
>     Optional<Integer> partition();
>     Optional<Integer> batchIndex();
>     // ...
> ```
>
> Additionally, there are many places that use only the triple of
> (ledger id, entry id, batch index) as the key to represent the position.
> Currently, they are done by adding a conversion from
> BatchMessageIdImpl to MessageIdImpl. However, it's more intuitive to
> write something like:
>
> ```java
> class MessageIdPosition implements Comparable<MessageIdPosition> {
>     private final MessageId messageId;
>     // TODO: compare only the triple (ledger, entry, batch)
> ```
>
> Therefore, I'm going to write a proposal to redesign the MessageId
> interface only by adding some getters. Regarding the 5 existing
> implementations, I think we can drop them because they are a part
> of `pulsar-client`, not `pulsar-client-api`.
>
> Please feel free to share your points.
>
> Thanks,
> Yunze

Re: [DISCUSSION] Redesign the MessageId interface

Reply via email to