Hi Jiuming,

First, I think the idea you proposed is great.
It seems the field `beginPublishTimestamp` is also not needed, since there is
an existing field `ledgerInfo.timestamp` to use:

- The current ledger's timestamp can be used as `beginPublishTimestamp`
- The next ledger's timestamp can be used as `endPublishTimestamp`

Thanks
Yubiao Feng

On Fri, Mar 15, 2024 at 1:57 PM 太上玄元道君 <dao...@apache.org> wrote:

> Hi, Girish,
>
> Thanks for your feedback!
>
> In general, it's a very good suggestion: we could use a single
> `beginPublishTimestamp` to achieve our goal,
> but the actual problem is a bit more complex.
>
> Actually, the naming of `beginPublishTimestamp` and `endPublishTimestamp`
> is slightly off;
> they should be `minPublishTimestamp` and `maxPublishTimestamp`.
>
> In some cases, the next ledger's `minPublishTimestamp` may be less than its
> previous ledger's `maxPublishTimestamp`,
> so we have to maintain both `minPublishTimestamp` and
> `maxPublishTimestamp`.
>
> Say there are 2 producers publishing to the topic. Producer1 sends
> *message1* to the topic, and the broker receives
> *message1* immediately and persists it to the ledger. Producer2 sends
> *message2* to the broker *before* *message1*,
> but for some reason the broker receives *message2* only after a while.
> At the same time, a ledger switch happens: the previous ledger's
> `maxPublishTimestamp` is *message1*'s publishTimestamp,
> and the current ledger's `minPublishTimestamp` is *message2*'s
> publishTimestamp,
> so the current ledger's `minPublishTimestamp` is less than the previous
> ledger's `maxPublishTimestamp`, right?
>
> If we only have a single field `minPublishTimestamp`, it carries a
> hidden meaning: the next ledger's `minPublishTimestamp`
> is its previous ledger's `maxPublishTimestamp`, which is incorrect.
> So we want to introduce `minPublishTimestamp` and `maxPublishTimestamp` to
> make it clear.
>
> Thanks,
> Tao Jiuming
>
> Girish Sharma <scrapmachi...@gmail.com> wrote on Fri, Mar 15, 2024 at 12:14:
>
> > One suggestion: I think you can make do with storing just the begin
> > timestamp.
> > Any search utilising these values will work the same way with just one of
> > those timestamps compared to both begin and end.
> >
> > Any particular reason you need both the timestamps?
> >
> > Regards
> >
> > On Fri, Mar 15, 2024, 9:39 AM 太上玄元道君 <dao...@apache.org> wrote:
> >
> > > bump
> > >
> > > 太上玄元道君 <dao...@apache.org> wrote on Sun, Mar 10, 2024 at 06:41:
> > >
> > > > Hi Pulsar community,
> > > >
> > > > A new PIP has been opened; this thread is to discuss PIP-345: Optimize
> > > > finding message by timestamp.
> > > >
> > > > Motivation:
> > > > Finding a message by timestamp is widely used in Pulsar:
> > > > * It is used by the `pulsar-admin` tool to get the message id by
> > > > timestamp, expire messages by timestamp, and reset the cursor.
> > > > * It is used by the `pulsar-client` to reset the subscription to a
> > > > specific timestamp.
> > > > * It is also used by the `expiry-monitor` to find the messages that
> > > > have expired.
> > > > Even though the current implementation is correct and uses binary
> > > > search to speed things up, it is still not efficient *enough*.
> > > > The current implementation scans all the ledgers to find the message
> > > > by timestamp.
> > > > This is a performance bottleneck, especially for large topics with
> > > > many messages.
> > > > Say there is a topic which has 1m entries: with binary search,
> > > > it will take about 20 iterations to find the message.
> > > > In some extreme cases, it may lead to a timeout, and the client will
> > > > not be able to seek by timestamp.
> > > >
> > > > PIP: https://github.com/apache/pulsar/pull/22234
> > > >
> > > > Your feedback is very important to us; please take a moment to review
> > > > the proposal and provide your thoughts.
> > > >
> > > > Thanks,
> > > > Tao Jiuming
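
For readers following the thread, the two-producer scenario described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Pulsar code: `LedgerStats`, `candidates_min_max`, `candidates_min_only`, and all timestamp values are hypothetical, and the "min only" variant assumes each ledger's upper bound is inferred from the next ledger's minimum.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LedgerStats:
    """Hypothetical per-ledger metadata (not the real Pulsar LedgerInfo)."""
    ledger_id: int
    min_publish_ts: int  # stands in for the proposed minPublishTimestamp
    max_publish_ts: int  # stands in for the proposed maxPublishTimestamp


def candidates_min_max(ledgers: List[LedgerStats], ts: int) -> List[int]:
    """Return ids of ledgers whose stored [min, max] range covers ts."""
    return [l.ledger_id for l in ledgers
            if l.min_publish_ts <= ts <= l.max_publish_ts]


def candidates_min_only(ledgers: List[LedgerStats], ts: int) -> List[int]:
    """Same lookup, but each ledger's upper bound is inferred from the
    next ledger's min (the 'hidden meaning' of keeping a single field)."""
    result = []
    for i, l in enumerate(ledgers):
        if i + 1 < len(ledgers):
            upper = ledgers[i + 1].min_publish_ts
        else:
            upper = float("inf")  # last ledger has no successor
        if l.min_publish_ts <= ts <= upper:
            result.append(l.ledger_id)
    return result


# message1 (publish time 205) is the last entry of ledger 1.
# message2 (publish time 200) was sent earlier but arrived late,
# so it became the first entry of ledger 2.
ledgers = [
    LedgerStats(ledger_id=1, min_publish_ts=100, max_publish_ts=205),
    LedgerStats(ledger_id=2, min_publish_ts=200, max_publish_ts=300),
]

with_both = candidates_min_max(ledgers, 205)
with_min_only = candidates_min_only(ledgers, 205)
print(with_both, with_min_only)
```

With both fields, a lookup for publish time 205 considers ledger 1 (which actually holds message1) as well as ledger 2. With the inferred upper bound, ledger 1 is skipped, which is the kind of mismatch the reply is concerned about.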