Il giorno mar 8 nov 2022 alle ore 11:52 Yunze Xu <y...@streamnative.io.invalid> ha scritto: > > Hi Enrico, > > > We also need a way to represent this as a String or a byte[] > > We already have the `toByteArray` method, right?
Yes, correct. So we are fine. I forgot about it and I answered too quickly. I am not sure if this can be in the scope of this initiative, but we should somehow get rid of stuff like "fromByteArrayWithTopic" vs "fromByteArray". Thanks Enrico > > Thanks, > Yunze > > On Tue, Nov 8, 2022 at 6:43 PM Enrico Olivelli <eolive...@gmail.com> wrote: > > > > Il giorno mar 8 nov 2022 alle ore 11:25 Yunze Xu > > <y...@streamnative.io.invalid> ha scritto: > > > > > > Hi all, > > > > > > Currently we have the following 5 implementations of MessageId: > > > > > > - MessageIdImpl: (ledger id, entry id, partition index) > > > - BatchMessageIdImpl: adds (batch index, batch size, acker), where > > > acker is a wrapper of a BitSet. > > > - ChunkMessageIdImpl: adds another MessageIdImpl that represents > > > the first MessageIdImpl of a BitSet. > > > - MultiMessageIdImpl: adds a map that maps the topic name to the > > > MessageId. > > > - TopicMessageIdImpl: adds the topic name and the partition name > > > > > > These implementations are such a mess. For example, when users get a > > > MessageId from `Producer#send`: > > > > > > ```java > > > var id = producer.send("msg"); > > > ``` > > > > > > There is no getter to get some specific fields like ledger id. You can > > > only see a representation from `toString` method and got some output > > > like "0:0:-1:0". Maybe this design is to hidden some details, but if > > > users don't know the details like ledger id and entry id, how could > > > you know what does "0:0:-1:0" mean? What if `MessageId#toString`'s > > > implementation changed? Should it be treated as a breaking change? > > > > > > The original definition of the underlying MessageIdData is much more > > > clear: > > > > > > ```proto > > > message MessageIdData { > > > required uint64 ledgerId = 1; > > > required uint64 entryId = 2; > > > optional int32 partition = 3 [default = -1]; > > > optional int32 batch_index = 4 [default = -1]; > > > repeated int64 ack_set = 5; > > > optional int32 batch_size = 6; > > > > > > // For the chunk message id, we need to specify the first chunk > > > message id. > > > optional MessageIdData first_chunk_message_id = 7; > > > } > > > ``` > > > > > > IMO, MessageId should be a wrapper of MessageIdData. It's more natural > > > to have an interface like: > > > > > > ```java > > > interface MessageId { > > > long ledgerId(); > > > long entryId(); > > > Optional<Integer> partition(); > > > Optional<Integer> batchIndex(); > > > // ... > > > ``` > > > > This is very good for client applications. > > We also need a way to represent this as a String or a byte[], this way > > client applications have a standard way to store > > message offsets into an external system (for instance when you want to > > user the Reader API and keep track of the position by yourself) > > > > Enrico > > > > > > > > Additionally, there are many places that use only the triple of > > > (ledger id, entry id, batch index) as the key to represent the position. > > > Currently, they are done by adding a conversion from > > > BatchMessageIdImpl to MessageIdImpl. However, it's more intuitive to > > > write something like: > > > > > > ```java > > > class MessageIdPosition implements Comparable<MessageIdPosition> { > > > private final MessageId messageId; > > > // TODO: compare only the triple (ledger, entry, batch) > > > ``` > > > > > > Therefore, I'm going to write a proposal to redesign the MessageId > > > interface only by adding some getters. Regarding the 5 existing > > > implementations, I think we can drop them because they are a part > > > of `pulsar-client`, not `pulsar-client-api`. > > > > > > Please feel free to share your points. > > > > > > Thanks, > > > Yunze