Hi Yubiao, Thanks for your review!
1. As I mentioned in the *Alternatives* part of the PIP `LedgerInfo#timestamp` is broker's timestamp, finding message by timestamp is client's timestamp(publishTimestamp). We have to consider about client's clock doesn't sync to broker's clock, and we are finding a message by publishTimestamp, but pre-filter by broker's timestamp, it doesn't make sense. 2. Why do we need both `minPublishTimestamp` and `maxPublishTimestamp` As the email I replied to Girish, the next ledger's `minPublishTimestamp` might be less than it's previous ledger's `maxPublishTimestamp`, so we can't use next ledger's `minPublishTimestamp` as it's previous ledger's `maxPublishTimestamp`, I want to make it clear. Thanks, Tao Jiuming Yubiao Feng <yubiao.f...@streamnative.io.invalid> 于2024年3月16日周六 20:29写道: > Hi Jiuming > > Firstly, I think the idea you provided is great. > > It seems the field `beginPublishTimestamp` is also not needed, and there is > an existing field `ledgerInfo.timestamp` to use, > - The current ledger's timestamp can be used as `beginPublishTimestamp` > - The next ledger's timestamp can be used as `endPublishTimestamp` > > Thanks > Yubiao Feng > > > On Fri, Mar 15, 2024 at 1:57 PM 太上玄元道君 <dao...@apache.org> wrote: > > > Hi, Girish, > > > > Thanks for your feedback! > > > > In general, it's a very good suggestion, we can just use one single > > `beginPublishTimestamp` to achieve our goal, > > but the actual problem will be a bit more complex. > > > > Actually, the naming of `beginPublishTimestamp` and `endPublishTimestamp` > > has a little problem, > > it should be `minPublishTimestamp` and `maxPublishTimestamp`. > > > > In some cases, next ledger's `minPublishTimestamp` may less than it's > > previous ledger's `maxPublishTimestamp`, > > so we have to maintain both `minPublishTimestamp` and > > `maxPublishTimestamp`. > > > > Say, there are 2 producers publishing to the topic, Producer1 send > > *message1* to the topic, broker received > > *message1* immediately and persist it to the ledger. Producer2 send > > *message2* to the broker *before* *message1*, > > but for some reason, broker received *message2* after a while. > > At the same time, Ledger switching happens, the previous ledger's > > `maxPublishTimestamp` is *message1*'s publishTimestamp, > > and the current ledger's `minPublishTimestamp` is *message2*'s > > publishTimestamp, > > so the current ledger's `minPublishTimestamp` is less than the previous > > ledger's `maxPublishTimestamp`, right? > > > > If we just have a single field `minPublishTimestamp`, it will have a > > hidden meaning: the next ledger's `minPublishTimestamp` > > is it's previous ledger's `maxPublishTimestamp`, it's incorrect. > > So we want to introduce `minPublishTimestamp` and `maxPublishTimestamp` > to > > make it clear. > > > > Thanks, > > Tao Jiuming > > > > Girish Sharma <scrapmachi...@gmail.com> 于2024年3月15日周五 12:14写道: > > > > > One suggestion, I think you can make do with storing just begin > > timestamp. > > > Any search utilising these values will work the same way with just one > of > > > those timestamps compared to both begin and end. > > > > > > Any particular reason you need both the timestamps? > > > > > > Regards > > > > > > On Fri, Mar 15, 2024, 9:39 AM 太上玄元道君 <dao...@apache.org> wrote: > > > > > > > bump > > > > > > > > 太上玄元道君 <dao...@apache.org>于2024年3月10日 周日06:41写道: > > > > > > > > > Hi Pulsar community, > > > > > > > > > > A new PIP is opened, this thread is to discuss PIP-345: Optimize > > > finding > > > > > message by timestamp. > > > > > > > > > > Motivation: > > > > > Finding message by timestamp is widely used in Pulsar: > > > > > * It is used by the `pulsar-admin` tool to get the message id by > > > > > timestamp, expire messages by timestamp, and reset cursor. > > > > > * It is used by the `pulsar-client` to reset the subscription to a > > > > > specific timestamp. > > > > > * And also used by the `expiry-monitor` to find the messages that > are > > > > > expired. > > > > > Even though the current implementation is correct, and using binary > > > > search > > > > > to speed-up, but it's still not efficient *enough*. > > > > > The current implementation is to scan all the ledgers to find the > > > message > > > > > by timestamp. > > > > > This is a performance bottleneck, especially for large topics with > > many > > > > > messages. > > > > > Say, if there is a topic which has 1m entries, through the binary > > > search, > > > > > it will take 20 iterations to find the message. > > > > > In some extreme cases, it may lead to a timeout, and the client > will > > > not > > > > > be able to seeking by timestamp. > > > > > > > > > > PIP: https://github.com/apache/pulsar/pull/22234 > > > > > > > > > > Your feedback is very important to us, please take a moment to > review > > > the > > > > > proposal and provide your thoughts. > > > > > > > > > > Thanks, > > > > > Tao Jiuming > > > > > > > > > > > > > > >