Re: [DISCUSS] PIP-345: Optimize finding message by timestamp

Dave Fisher Mon, 11 Mar 2024 19:49:33 -0700

What can you say about the time spent maintaining these new ledger fields? I
think you are asking to modify the main message logic, which is highly
optimized, but I’m not sure. Have you tried your code on your own hardware? Do
you have performance comparisons against the normal flow?


> On Mar 11, 2024, at 7:41 PM, 太上玄元道君 <[email protected]> wrote:
> 
> bump
> 
> On Mon, Mar 11, 2024 at 17:55, 太上玄元道君 <[email protected]> wrote:
> 
>> bump
>> 
>> On Sun, Mar 10, 2024 at 06:41, 太上玄元道君 <[email protected]> wrote:
>> 
>>> Hi Pulsar community,
>>> 
>>> A new PIP has been opened. This thread is to discuss PIP-345: Optimize
>>> finding message by timestamp.
>>> 
>>> Motivation:
>>> Finding a message by timestamp is widely used in Pulsar:
>>> * It is used by the `pulsar-admin` tool to get the message id by
>>> timestamp, expire messages by timestamp, and reset cursor.
>>> * It is used by the `pulsar-client` to reset the subscription to a
>>> specific timestamp.
>>> * It is also used by the `expiry-monitor` to find expired messages.
>>> The current implementation is correct and uses binary search to speed up
>>> the lookup, but it is still not efficient *enough*.
>>> The current implementation scans all the ledgers to find the message by
>>> timestamp.
>>> This is a performance bottleneck, especially for large topics with many
>>> messages.
>>> Say there is a topic with 1m entries: with binary search, it takes
>>> about 20 iterations to find the message.
>>> In some extreme cases, this may lead to a timeout, and the client will
>>> not be able to seek by timestamp.
>>> 
>>> PIP: https://github.com/apache/pulsar/pull/22234
>>> 
>>> Your feedback is very important to us. Please take a moment to review the
>>> proposal and provide your thoughts.
>>> 
>>> Thanks,
>>> Tao Jiuming
>>> 
>>
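
For reference, the "20 iterations" figure in the quoted proposal matches the
usual bound for binary search over 1m sorted items, since log2(1,000,000) is
just under 20. The sketch below is only an illustration of that arithmetic,
not Pulsar's actual code: the in-memory `timestamps` list and the function
name `find_first_at_or_after` are hypothetical, and the assumption that each
probe corresponds to one entry read on the broker is mine, not something the
thread states.

```python
import math


def find_first_at_or_after(timestamps, target):
    """Binary search over sorted publish timestamps.

    Returns (index, probes), where index is the first position whose
    timestamp is >= target and probes counts how many items were inspected.
    """
    lo, hi = 0, len(timestamps)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1  # assumption: one probe stands in for one entry read
        if timestamps[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo, probes


# Hypothetical topic: 1m entries, one published per millisecond.
timestamps = list(range(1_000_000))
index, probes = find_first_at_or_after(timestamps, 123_456)

# Upper bound on probes for n items: floor(log2(n)) + 1.
bound = math.floor(math.log2(len(timestamps))) + 1
print(index, probes, bound)
```

The probe count stays logarithmic, so the cost the proposal is concerned
with would come from what each probe costs, not from the number of probes.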
