Hi Pulsar community,

A new PIP has been opened. This thread is to discuss PIP-345: Optimize
finding message by timestamp.

Motivation:
Finding a message by timestamp is widely used in Pulsar:
* It is used by the `pulsar-admin` tool to get the message id by
timestamp, expire messages by timestamp, and reset cursor.
* It is used by the `pulsar-client` to reset the subscription to a specific
timestamp.
* It is also used by the `expiry-monitor` to find the messages that have
expired.
The current implementation is correct and uses binary search to speed up
the lookup, but it is still not efficient *enough*.
It scans all the ledgers to find the message by timestamp.
This is a performance bottleneck, especially for large topics with many
messages.
For example, for a topic with 1 million entries, the binary search takes
about 20 iterations (log2 of 1,000,000 is roughly 19.9) to find the message.
In some extreme cases, this may lead to a timeout, and the client will not
be able to seek by timestamp.
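
To make the iteration count above concrete, here is a minimal sketch of a
binary search over publish timestamps that counts its probes. It is only an
illustration, not the broker's code: the function name
`find_first_at_or_after` and the in-memory list are hypothetical, and it
assumes entries are sorted by publish time. In a real topic each probe would
correspond to reading one entry, which is where the cost comes from.

```python
import math


def find_first_at_or_after(publish_times, target):
    """Return (index, probes) for the first timestamp >= target.

    publish_times is assumed to be sorted in ascending order.
    probes counts how many positions were inspected; in a real topic
    each inspection would be an entry read (an assumption of this sketch).
    """
    lo, hi = 0, len(publish_times)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if publish_times[mid] < target:
            lo = mid + 1   # target is to the right of mid
        else:
            hi = mid       # mid may be the answer, keep it in range
    return lo, probes


if __name__ == "__main__":
    # Hypothetical topic with 1 million entries, one per millisecond.
    times = list(range(1_000_000))
    index, probes = find_first_at_or_after(times, 123_456)
    print(index, probes, math.ceil(math.log2(len(times))))
```

The probe count grows with log2 of the number of entries, so it stays near
20 for 1 million entries no matter which timestamp is requested.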

PIP: https://github.com/apache/pulsar/pull/22234

Your feedback is very important to us. Please take a moment to review the
proposal and share your thoughts.

Thanks,
Tao Jiuming
