Hi Jiuming

Firstly, I think the idea you provided is great.

It seems the field `beginPublishTimestamp` is also not needed, and there is
an existing field `ledgerInfo.timestamp` to use,
- The current ledger's timestamp can be used as `beginPublishTimestamp`
- The next ledger's timestamp can be used as `endPublishTimestamp`

Thanks
Yubiao Feng


On Fri, Mar 15, 2024 at 1:57 PM 太上玄元道君 <dao...@apache.org> wrote:

> Hi, Girish,
>
> Thanks for your feedback!
>
> In general, it's a very good suggestion, we can just use one single
> `beginPublishTimestamp` to achieve our goal,
> but the actual problem will be a bit more complex.
>
> Actually, the naming of `beginPublishTimestamp` and `endPublishTimestamp`
> has a little problem,
> it should be `minPublishTimestamp` and `maxPublishTimestamp`.
>
> In some cases, next ledger's `minPublishTimestamp` may less than it's
> previous ledger's `maxPublishTimestamp`,
> so we have to maintain both `minPublishTimestamp` and
> `maxPublishTimestamp`.
>
> Say, there are 2 producers publishing to the topic, Producer1 send
> *message1* to the topic, broker received
> *message1* immediately and persist it to the ledger. Producer2 send
> *message2* to the broker *before* *message1*,
> but for some reason, broker received *message2* after a while.
> At the same time, Ledger switching happens, the previous ledger's
> `maxPublishTimestamp` is *message1*'s publishTimestamp,
> and the current ledger's `minPublishTimestamp` is *message2*'s
> publishTimestamp,
> so the current ledger's `minPublishTimestamp` is less than the previous
> ledger's `maxPublishTimestamp`, right?
>
> If we just have a single field  `minPublishTimestamp`, it will have a
> hidden meaning: the next ledger's `minPublishTimestamp`
> is it's previous ledger's `maxPublishTimestamp`, it's incorrect.
> So we want to introduce `minPublishTimestamp` and `maxPublishTimestamp` to
> make it clear.
>
> Thanks,
> Tao Jiuming
>
> Girish Sharma <scrapmachi...@gmail.com> 于2024年3月15日周五 12:14写道:
>
> > One suggestion, I think you can make do with storing just begin
> timestamp.
> > Any search utilising these values will work the same way with just one of
> > those timestamps compared to both begin and end.
> >
> > Any particular reason you need both the timestamps?
> >
> > Regards
> >
> > On Fri, Mar 15, 2024, 9:39 AM 太上玄元道君 <dao...@apache.org> wrote:
> >
> > > bump
> > >
> > > 太上玄元道君 <dao...@apache.org>于2024年3月10日 周日06:41写道:
> > >
> > > > Hi Pulsar community,
> > > >
> > > > A new PIP is opened, this thread is to discuss PIP-345: Optimize
> > finding
> > > > message by timestamp.
> > > >
> > > > Motivation:
> > > > Finding message by timestamp is widely used in Pulsar:
> > > > * It is used by the `pulsar-admin` tool to get the message id by
> > > > timestamp, expire messages by timestamp, and reset cursor.
> > > > * It is used by the `pulsar-client` to reset the subscription to a
> > > > specific timestamp.
> > > > * And also used by the `expiry-monitor` to find the messages that are
> > > > expired.
> > > > Even though the current implementation is correct, and using binary
> > > search
> > > > to speed-up, but it's still not efficient *enough*.
> > > > The current implementation is to scan all the ledgers to find the
> > message
> > > > by timestamp.
> > > > This is a performance bottleneck, especially for large topics with
> many
> > > > messages.
> > > > Say, if there is a topic which has 1m entries, through the binary
> > search,
> > > > it will take 20 iterations to find the message.
> > > > In some extreme cases, it may lead to a timeout, and the client will
> > not
> > > > be able to seeking by timestamp.
> > > >
> > > > PIP: https://github.com/apache/pulsar/pull/22234
> > > >
> > > > Your feedback is very important to us, please take a moment to review
> > the
> > > > proposal and provide your thoughts.
> > > >
> > > > Thanks,
> > > > Tao Jiuming
> > > >
> > >
> >
>

Reply via email to