Hi Jiuming,

I'm sorry for not getting back to you sooner.
First, I support the motivation to optimize this case, because it could be a significant blocker for users who want infinite data retention, which is a big differentiator from Apache Kafka. I have really seen cases with high publish throughput, where a single ledger held 1M entries and 100M new entries were published to a topic.

Then I tried to check the details of the existing implementation. I think the tricky part is that publish time is not a concept of the ManagedLedger. I saw that the changes you proposed would add publish time to the ManagedLedger module, which doesn't look good to me, because it couples a Pulsar concept with the ManagedLedger concept. Essentially, the publish time could be a secondary index of the ManagedLedger.

My opinion is to have a general ManagedLedgerIndex abstraction, so that the Pulsar broker can create any index it wants. Since the broker creates the index, the broker can control the index's behavior. Then, the ManagedLedger can provide an API to search for an entry with a ManagedLedgerIndex. With this option, we don't need to add the publish time concept to the ManagedLedger directly. In this case, if the broker tries to search for an entry with a predicate and an index, the managed ledger will search the index first. Of course, if the relevant entry cannot be found in the index, it just falls back to the "optimized full scan". (A rough, purely illustrative sketch of such an abstraction is appended below the quoted thread.)

Regards,
Penghui

On Mon, Mar 25, 2024 at 11:51 AM 太上玄元道君 <dao...@apache.org> wrote:

> bump
>
> On Wed, Mar 20, 2024 at 16:23, 太上玄元道君 <dao...@apache.org> wrote:
>
> > bump
> >
> > On Tue, Mar 19, 2024 at 19:35, 太上玄元道君 <dao...@apache.org> wrote:
> >
> >> Hi Pulsar community,
> >>
> >> This thread is to start a vote for PIP-345: Optimize finding message by
> >> timestamp
> >>
> >> PIP: https://github.com/apache/pulsar/pull/22234
> >> Discuss thread:
> >> https://lists.apache.org/thread/5owc9os6wmy52zxbv07qo2jrfjm17hd2
> >>
> >> Thanks,
> >> Tao Jiuming
> >
>
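
---

To make the idea above a bit more concrete, here is a very rough sketch of what such an abstraction could look like. None of these names (ManagedLedgerIndex, findEntry, floor, and so on) come from PIP-345 or the existing ManagedLedger API; they are only assumptions of mine, meant as a starting point for discussion rather than a proposed implementation:

// Hypothetical sketch only: these types are NOT part of the current
// ManagedLedger API; all names are assumptions used for illustration.

import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;

/** Position of an entry in a managed ledger (ledgerId, entryId). */
record Position(long ledgerId, long entryId) {}

/** Minimal view of a stored entry, for the purpose of this sketch. */
interface Entry {
    byte[] getData();
    Position getPosition();
}

/**
 * A secondary index created and maintained by the broker. The ManagedLedger
 * does not know what the key means; for PIP-345 the broker would use the
 * message publish time as the key.
 */
interface ManagedLedgerIndex<K extends Comparable<K>> {
    /** Record that the entry at the given position carries the given key. */
    void append(K key, Position position);

    /** Find the latest indexed position whose key is <= the given key. */
    Optional<Position> floor(K key);
}

/** The lookup API the ManagedLedger could expose to the broker. */
interface ManagedLedgerLookup {
    /**
     * Search for the first entry matching the predicate, starting from the
     * position suggested by the index; if the index cannot narrow the search
     * (or the entry is not found near that position), fall back to the
     * "optimized full scan".
     */
    <K extends Comparable<K>> CompletableFuture<Optional<Entry>> findEntry(
            ManagedLedgerIndex<K> index,
            K key,
            Predicate<Entry> predicate);
}

With something along these lines, the publish-time semantics stay entirely on the broker side: the broker extracts the publish time when it writes an entry and feeds it to its own index, while the ManagedLedger only consumes the index through a generic lookup API and never learns about Pulsar-level concepts.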