Hi team,
Could we use a space-efficient probabilistic data structure, such as a
Bloom filter, to store the acked message IDs, if the requirement on
re-delivering acked messages is not strict?
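
To make the idea concrete, here is a minimal sketch of such a filter keyed
by (ledgerId, entryId). It is not Pulsar code: the class name
AckedIdBloomFilter, the hashing scheme, and the sizing parameters are all
assumptions made for illustration. One property to keep in mind: a Bloom
filter can return false positives but never false negatives, so when the
acked IDs are what gets stored, the error mode is an unacked message
occasionally looking acked.

```java
import java.util.BitSet;

// Illustrative sketch only; names and parameters are hypothetical, not part of Pulsar.
class AckedIdBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    AckedIdBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 64-bit mixing function (the SplitMix64 finalizer) to spread input bits.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // Double hashing: the i-th bit index is (h1 + i * h2) mod numBits.
    private int index(long ledgerId, long entryId, int i) {
        long h1 = mix(mix(ledgerId) ^ entryId);
        long h2 = mix(h1) | 1L; // force h2 to be odd so the probe step is never zero
        return (int) Long.remainderUnsigned(h1 + i * h2, numBits);
    }

    // Record an acknowledged message ID.
    void add(long ledgerId, long entryId) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(ledgerId, entryId, i));
        }
    }

    // false: definitely never added. true: added, or a false positive.
    boolean mightContain(long ledgerId, long entryId) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(ledgerId, entryId, i))) {
                return false;
            }
        }
        return true;
    }
}
```

The memory footprint is fixed by numBits regardless of how many ack ranges
exist, which is the attraction here; the trade-off is that the
false-positive rate grows as more IDs are added.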


On Tue, Sep 24, 2024 at 5:28 PM Andrey Yegorov
<andrey.yego...@datastax.com.invalid> wrote:
>
> Penghui,
>
> Thank you for the thorough and detailed response.
>
> I have added the feature toggle per Lari's comment on the implementation PR.
>
> Regarding compatibility: if the user somehow has 10MB of cursor data (in
> our testing, large serialized position info resulted in a single entry
> larger than 1MB, which was rejected by BK), then upgrades and rolls back,
> the data will be read the same way as before because the footer with the
> chunking info will not be present. This is described in the read path.
> If the user upgrades, enables the feature, and creates a cursor
> with chunked PositionInfo, the older version won't be able to read the data
> after rollback. This is why the feature toggle was added.
>
> I agree that the vast majority of users won't have to deal
> with managedLedgerMaxUnackedRangesToPersist in the range of tens of
> millions and above, but there are edge cases where this is needed.
>
>
> On Tue, Sep 24, 2024 at 3:17 PM PengHui Li <peng...@apache.org> wrote:
>
> > Thanks for driving the proposal.
> >
> > I would like to share some related context from discussions many years ago:
> >
> > - https://lists.apache.org/thread/y0r9kk0968ydpxtf16x6ql3x6kwy7dc1
> > - https://lists.apache.org/thread/hfv18cg0yckt5cqd0fc66rp7tth036kf
> >
> > We have two major approaches:
> >
> > 1. Minimize the persistent size of cursor data:
> > • Example: PR:9292 and cursor data compression, possibly with a compressed
> > bitset implementation (RoaringBitmap).
> >
> > 2. Split the ack cursor data into multiple chunks:
> > • Example: PIP-81, PIP-381.
> >
> > LinLin and I previously worked on PIP-81. Personally, I am not a big fan of
> > this solution.
> > While working on PIP-81 and cursor data compression, we found that
> > compression works well in most cases,
> > even when there are millions or tens of millions of ack ranges. I recall we
> > shared data on this before, though I can’t seem to find it now.
> >
> > From a user perspective, most users are satisfied with the current
> > solution, and only a few need compression enabled.
> > The simplicity of the solution is vital for community users, which was the
> > main reason we gave up on PIP-81 earlier.
> > Pulsar is already complex, so having a pluggable solution for the long term
> > would be more beneficial.
> > This way, most users get a clear, simple version, while others needing
> > enhanced solutions can create their plugins, managing the complexity
> > themselves.
> >
> > I’m not going to block this proposal, but a few points need clarification:
> >
> > • Feature Toggle: Add a flag that allows users to enable this feature
> > (keeping it disabled by default until there is higher demand).
> > Managed ledger and cursor complexities are well-known, so a smooth opt-in
> > process is crucial for users to adopt new features gradually.
> >
> > • Compatibility Concerns: Since the persistent data structure will change,
> > we need to address rollback scenarios.
> > For instance, if a user has 10MB of cursor data, upgrades to a new version
> > with the PIP changes, and then needs to roll back to the older version,
> > will that user lose their 10MB cursor data? What steps are required for a
> > rollback to ensure data consistency?
> >
> > Regards,
> > Penghui
> >
> > On Tue, Sep 24, 2024 at 1:42 AM Lari Hotari <lhot...@apache.org> wrote:
> >
> > > On Tue, 24 Sept 2024 at 05:01, Rajan Dhabalia <rdhaba...@apache.org>
> > > wrote:
> > > > However, there are multiple other PRs related to key-shared sub, stats,
> > > > cursor performance, and other PRs are still blocked by others and people
> > > > just block it because they think they don't have this usecase. It's so
> > > > unfortunate that people easily merge implementations which only handle
> > > > small-scale usecases but the usecases for which Pulsar was built are
> > > > being blocked or take a long time to merge. It's just that I don't have
> > > > that energy to keep following up for useful and important changes for
> > > > Pulsar. And this is one of these examples as well. I have also started
> > > > discussion about improving the PIP process because it has become painful
> > > > in many cases.
> > >
> > > It's not that individuals want to block changes for no reason. It
> > > seems that the main reason for blocking changes is the fear of
> > > regressions. Some areas of the Pulsar codebase aren't well covered in
> > > our test suites. For example, we don't have performance tests as part
> > > of the Apache Pulsar repositories. We have a lot of tests, but most of
> > > them are written in a way that tests the code as the author expects it
> > > to work. There are very few tests that evaluate features from the
> > > end-user API perspective or as system tests.
> > >
> > > Writing new tests is slow, and the developer experience is poor with
> > > the current test infrastructure. Adding more tests to the main build
> > > would slow down Pulsar CI even more. This isn't a new problem; it's
> > > been around for many years. I'd love to see more proposals and active
> > > contributions to improve the "safety nets" of Apache Pulsar so that we
> > > wouldn't fear change. I'm not saying that this is only a testing
> > > problem. Testability impacts architecture too. Balancing all different
> > > aspects of the system isn't easy, and it requires effort and
> > > dedication. We don't currently have enough contributors who are
> > > investing their time in enabling others to contribute effectively. I
> > > hope that we can improve together and address the problems we have
> > > that cause the fear of change. When that is addressed, there would be
> > > more confidence in accepting new PIPs and changes even when the
> > > reviewer doesn't have the use case or when they aren't familiar with
> > > the problem that the PIP is targeting to solve.
> > >
> > > -Lari
> > >
> >
>
>
> --
> Andrey Yegorov
