Hi Pulsar community,

A new PIP has been opened. This thread is to discuss PIP-345: Optimize finding message by timestamp.
Motivation:

Finding a message by timestamp is widely used in Pulsar:

* It is used by the `pulsar-admin` tool to get the message id by timestamp, expire messages by timestamp, and reset the cursor.
* It is used by the `pulsar-client` to reset the subscription to a specific timestamp.
* It is also used by the `expiry-monitor` to find the messages that have expired.

Even though the current implementation is correct and uses binary search to speed things up, it is still not efficient *enough*. The current implementation scans all the ledgers to find the message by timestamp. This is a performance bottleneck, especially for large topics with many messages. For example, for a topic with 1 million entries, the binary search takes about 20 iterations (log2 of 1,000,000 is roughly 20) to find the message. In some extreme cases this may lead to a timeout, and the client will not be able to seek by timestamp.

PIP: https://github.com/apache/pulsar/pull/22234

Your feedback is very important to us. Please take a moment to review the proposal and share your thoughts.

Thanks,
Tao Jiuming