Hi team,

Could we use a space-efficient probabilistic data structure, such as a Bloom filter, to store the acked message IDs, if occasionally redelivering already-acked messages is acceptable?
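To make the question concrete, here is a minimal sketch of a Bloom filter holding acked message IDs. It is an illustration only, not Pulsar code: the `BloomFilter` class, the `message_key` helper, the "ledgerId:entryId" key encoding, and the sizing (2^20 bits, 7 hash functions) are all assumptions made for this example. One property worth keeping in mind is that a Bloom filter never returns a false negative but can return a false positive, so an ID that was never added may occasionally be reported as present; how that maps onto redelivery versus skipping depends on which set of IDs is stored.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch (hypothetical, not part of Pulsar)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        if num_bits <= 0 or num_hashes <= 0:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive all bit positions from two 64-bit
        # halves of a single SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # forced odd, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # True means "possibly added"; False means "definitely not added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def message_key(ledger_id: int, entry_id: int) -> str:
    # Assumed encoding of a message ID, chosen only for this sketch.
    return f"{ledger_id}:{entry_id}"


acked = BloomFilter(num_bits=1 << 20, num_hashes=7)
for entry_id in range(0, 10_000, 2):  # pretend every even entry was acked
    acked.add(message_key(42, entry_id))
```

With this sizing the filter occupies 128 KiB no matter how many IDs are added; the cost of adding more IDs is a higher false-positive rate rather than more storage.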

On Tue, Sep 24, 2024 at 5:28 PM Andrey Yegorov <andrey.yego...@datastax.com.invalid> wrote:
>
> Penghui,
>
> Thank you for the thorough and detailed response.
>
> I have added the feature toggle per Lari's comment on the implementation PR.
>
> Regarding the compatibility, if the user has 10MB cursor data somehow (in
> our testing large serialized position info resulted in a single large
> entry > 1MB that was rejected by BK), then upgraded and rolled back, the
> data will be read the same way as it was previously because the footer
> with the chunking info will not be present. This is described in the read
> path.
> If the user upgrades, enables the feature and creates the cursor
> with chunked PositionInfo the older version won't be able to read the data
> after rollback. This is why the feature toggle is added.
>
> I agree that the vast majority of users won't have to deal
> with managedLedgerMaxUnackedRangesToPersist in the range of tens of
> millions and above, but there are edge cases when this is needed.
>
>
> On Tue, Sep 24, 2024 at 3:17 PM PengHui Li <peng...@apache.org> wrote:
>
> > Thanks for driving the proposal.
> >
> > I would like to share the related context that happened many years ago
> >
> > - https://lists.apache.org/thread/y0r9kk0968ydpxtf16x6ql3x6kwy7dc1
> > - https://lists.apache.org/thread/hfv18cg0yckt5cqd0fc66rp7tth036kf
> >
> > We have two major approaches:
> >
> > 1. Minimize the persistent size of cursor data:
> > • Example: PR:9292 and cursor data compression, possibly with a compressed
> > bitset implementation (RoaringBitmap).
> >
> > 2. Split the ack cursor data into multiple chunks:
> > • Example: PIP-81, PIP-381.
> >
> > LinLin and I previously worked on PIP-81. Personally, I am not a big fan of
> > this solution.
> > While working on PIP-81 and cursor data compression, we found that
> > compression works well in most cases,
> > even when there are millions or tens of millions of ack ranges. I recall we
> > shared data on this before, though I can’t seem to find it now.
> >
> > From a user perspective, most users are satisfied with the current
> > solution, and only a few need compression enabled.
> > The simplicity of the solution is vital for community users, which was the
> > main reason we gave up on PIP-81 earlier.
> > Pulsar is already complex, so having a pluggable solution for the long term
> > would be more beneficial.
> > This way, most users get a clear, simple version, while others needing
> > enhanced solutions can create their plugins, managing the complexity
> > themselves.
> >
> > I’m not going to block this proposal, but a few points need clarification:
> >
> > • Feature Toggle: Add a flag that allows users to enable this feature
> > (keeping it disabled by default until there is higher demand).
> > Managed ledger and cursor complexities are well-known, so a smooth opt-in
> > process is crucial for users to adopt new features gradually.
> >
> > • Compatibility Concerns: Since the persistent data structure will change,
> > we need to address rollback scenarios.
> > For instance, if a user has 10MB of cursor data, upgrades to a new version
> > with the PIP changes, and then needs to roll back to the older version,
> > will that user lose their 10MB cursor data? What steps are required for a
> > rollback to ensure data consistency?
> >
> > Regards,
> > Penghui
> >
> > On Tue, Sep 24, 2024 at 1:42 AM Lari Hotari <lhot...@apache.org> wrote:
> >
> > > On Tue, 24 Sept 2024 at 05:01, Rajan Dhabalia <rdhaba...@apache.org> wrote:
> > > > However, there are multiple other PRs related to key-shared sub, stats,
> > > > cursor performance, and other PRs are still blocked by others and people
> > > > just block it because they think they don't have this usecase. It's so
> > > > unfortunate that people easily merge implementations which only handle
> > > > small-scale usecases but the usecases for which Pulsar was built are
> > > > being blocked or take a long time to merge. It's just that I don't have
> > > > that energy to keep following up for useful and important changes for
> > > > Pulsar. And this is one of these examples as well. I have also started
> > > > discussion about improving the PIP process because it has become painful
> > > > in many cases.
> > >
> > > It's not that individuals want to block changes for no reason. It
> > > seems that the main reason for blocking changes is the fear of
> > > regressions. Some areas of the Pulsar codebase aren't well covered in
> > > our test suites. For example, we don't have performance tests as part
> > > of the Apache Pulsar repositories. We have a lot of tests, but most of
> > > them are written in a way that tests the code as the author expects it
> > > to work. There are very few tests that evaluate features from the
> > > end-user API perspective or as system tests.
> > >
> > > Writing new tests is slow, and the developer experience is poor with
> > > the current test infrastructure. Adding more tests to the main build
> > > would slow down Pulsar CI even more. This isn't a new problem; it's
> > > been around for many years. I'd love to see more proposals and active
> > > contributions to improve the "safety nets" of Apache Pulsar so that we
> > > wouldn't fear change. I'm not saying that this is only a testing
> > > problem. Testability impacts architecture too. Balancing all different
> > > aspects of the system isn't easy, and it requires effort and
> > > dedication. We don't currently have enough contributors who are
> > > investing their time in enabling others to contribute effectively. I
> > > hope that we can improve together and address the problems we have
> > > that cause the fear of change. When that is addressed, there would be
> > > more confidence in accepting new PIPs and changes even when the
> > > reviewer doesn't have the use case or when they aren't familiar with
> > > the problem that the PIP is targeting to solve.
> > >
> > > -Lari
> > >
> >
>
> --
> Andrey Yegorov