Hi Dave,

Thanks for your review! Perhaps it's because I wrote overly detailed steps, but the key points are:
1. Deserialize MessageMetadata once the broker receives a message
2. Pass the MessageMetadata to `PublishContext`
3. After the add-entry operation finishes, get `publishTimestamp` from `PublishContext#messageMetadata` and update `beginPublishTimestamp` and `endPublishTimestamp` of the `Ledger`
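To make the flow concrete, here is a minimal, self-contained sketch of the three steps above. All class and method names here (`MessageMetadata`, `PublishContext`, `onAddEntryComplete`) are simplified stand-ins for illustration, not the actual Pulsar broker APIs:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class PublishTimestampSketch {
    // Step 1: stand-in for the metadata deserialized once on message receipt
    static class MessageMetadata {
        final long publishTime;
        MessageMetadata(long publishTime) { this.publishTime = publishTime; }
    }

    // Step 2: stand-in for PublishContext carrying the deserialized metadata
    static class PublishContext {
        final MessageMetadata messageMetadata;
        PublishContext(MessageMetadata m) { this.messageMetadata = m; }
    }

    // ledgerId -> [beginPublishTimestamp, endPublishTimestamp]
    static final Map<Long, long[]> publishTimestamps = new ConcurrentSkipListMap<>();

    // Step 3: after the add-entry completes, fold the publish timestamp
    // into the ledger's begin/end range (compute() keeps the update atomic)
    static void onAddEntryComplete(long ledgerId, PublishContext ctx) {
        long ts = ctx.messageMetadata.publishTime;
        publishTimestamps.compute(ledgerId, (k, v) -> {
            if (v == null) return new long[]{ts, ts};
            v[0] = Math.min(v[0], ts);
            v[1] = Math.max(v[1], ts);
            return v;
        });
    }

    public static void main(String[] args) {
        onAddEntryComplete(1L, new PublishContext(new MessageMetadata(100L)));
        onAddEntryComplete(1L, new PublishContext(new MessageMetadata(50L)));
        onAddEntryComplete(1L, new PublishContext(new MessageMetadata(200L)));
        long[] range = publishTimestamps.get(1L);
        System.out.println(range[0] + "," + range[1]); // prints 50,200
    }
}
```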
Since we might already deserialize MessageMetadata during message publishing on the broker side (`PersistentTopic#isExceedMaximumDeliveryDelay`, `MessageDeduplication`), deserializing MessageMetadata once when the broker receives a message helps reduce the number of MessageMetadata deserializations in some cases.

About maintaining these new ledger fields, it is just like:

```
public class ManagedLedgerImpl {
    // New field: a map recording the begin/end publish timestamp of each ledger
    private final NavigableMap<Long, MutablePair</* begin publish timestamp */ Long, /* end publish timestamp */ Long>> publishTimestamps =
            new ConcurrentSkipListMap<>();

    // New method: update the begin/end publish timestamp of the ledger
    // after the entry is added to the ledger.
    protected void updatePublishTimestamp(long ledgerId, long publishTimestamp) {
        MutablePair<Long, Long> pair = publishTimestamps.computeIfAbsent(
                ledgerId, k -> new MutablePair<>(Long.MAX_VALUE, Long.MIN_VALUE));
        pair.setLeft(Math.min(pair.getLeft(), publishTimestamp));
        pair.setRight(Math.max(pair.getRight(), publishTimestamp));
    }
}
```

I just use a map to maintain it; when closing the ledger, `beginPublishTimestamp` and `endPublishTimestamp` are written to `LedgerInfo`. Besides that, no additional overhead is introduced. So, if you are asking about "the time spent", I would say: *nearly* zero.

Thanks,
Tao Jiuming

Dave Fisher <w...@apache.org> wrote on Tue, Mar 12, 2024 at 10:50:

> What can you say about the time spent to maintain these new ledger fields?
> I think you are asking to modify the main message logic which is highly
> optimized, but I'm not sure. Have you tried your code on your own
> hardware? Do you have performance comparisons of the normal flow?
>
>
> > On Mar 11, 2024, at 7:41 PM, 太上玄元道君 <dao...@apache.org> wrote:
> >
> > bump
> >
> > 太上玄元道君 <dao...@apache.org> wrote on Mon, Mar 11, 2024 at 17:55:
> >
> >> bump
> >>
> >> 太上玄元道君 <dao...@apache.org> wrote on Sun, Mar 10, 2024 at 06:41:
> >>
> >>> Hi Pulsar community,
> >>>
> >>> A new PIP is opened; this thread is to discuss PIP-345: Optimize
> >>> finding message by timestamp.
> >>>
> >>> Motivation:
> >>> Finding a message by timestamp is widely used in Pulsar:
> >>> * It is used by the `pulsar-admin` tool to get the message id by
> >>> timestamp, expire messages by timestamp, and reset the cursor.
> >>> * It is used by the `pulsar-client` to reset the subscription to a
> >>> specific timestamp.
> >>> * It is also used by the `expiry-monitor` to find the messages that
> >>> have expired.
> >>> Even though the current implementation is correct and uses binary
> >>> search to speed things up, it's still not efficient *enough*.
> >>> The current implementation scans all the ledgers to find the message
> >>> by timestamp.
> >>> This is a performance bottleneck, especially for large topics with
> >>> many messages.
> >>> Say there is a topic which has 1M entries; through binary search,
> >>> it will take about 20 iterations to find the message.
> >>> In some extreme cases, it may lead to a timeout, and the client will
> >>> not be able to seek by timestamp.
> >>>
> >>> PIP: https://github.com/apache/pulsar/pull/22234
> >>>
> >>> Your feedback is very important to us; please take a moment to review
> >>> the proposal and provide your thoughts.
> >>>
> >>> Thanks,
> >>> Tao Jiuming
> >>>
> >>
> >