Re: [DISCUSS] PIP-381: Handle large PositionInfo state

Heesung Sohn Wed, 25 Sep 2024 15:14:40 -0700

Sorry, we probably need to store "not acked" msg ids in the bloom filter.


For all msg ids from the min_acked:
- if found in the "not-acked msg id set" (possibly not acked) ->
  re-send (msgs possibly duplicated).
- if not found in the "not-acked msg id set" (definitely acked) -> do not send.
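
A minimal sketch of the lookup above, assuming message ids can be modelled as
(ledger_id, entry_id) tuples. The BloomFilter class, its sizing parameters, and
the messages_to_resend helper are hypothetical names for illustration only;
they are not part of Pulsar or of the PIP-381 implementation. Because a bloom
filter has no false negatives, every not-acked message is re-sent; false
positives only cause some already-acked messages to be re-sent as duplicates.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative bloom filter (hypothetical, not a Pulsar class)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed byte.
        data = repr(item).encode()
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # True  -> possibly in the set (may be a false positive)
        # False -> definitely not in the set
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def messages_to_resend(ids_from_min_acked, not_acked_filter):
    """Keep ids that are possibly not acked; drop ids that are definitely acked."""
    return [m for m in ids_from_min_acked if not_acked_filter.might_contain(m)]
```

For example, after adding (1, 2) and (1, 5) to the filter, calling
messages_to_resend over entries 0..9 of ledger 1 always includes both of those
ids, and may include a few acked ids as duplicates.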


On Wed, Sep 25, 2024 at 2:59 PM Heesung Sohn <hees...@apache.org> wrote:
>
> Hi team,
> Can we use a space-efficient probabilistic data structure, such as a
> bloom filter, to store the acked msg ids, if the requirement not to
> re-deliver acked msgs is not so strict?
>
>
> On Tue, Sep 24, 2024 at 5:28 PM Andrey Yegorov
> <andrey.yego...@datastax.com.invalid> wrote:
> >
> > Penghui,
> >
> > Thank you for the thorough and detailed response.
> >
> > I have added the feature toggle per Lari's comment on the implementation PR.
> >
> > Regarding compatibility: if the user somehow has 10MB of cursor data (in
> > our testing, a large serialized position info resulted in a single entry
> > larger than 1MB, which BK rejected), then upgrades and rolls back, the data
> > will be read the same way as before, because the footer with the
> > chunking info will not be present. This is described in the read path.
> > If the user upgrades, enables the feature, and creates a cursor
> > with chunked PositionInfo, the older version won't be able to read the data
> > after rollback. This is why the feature toggle was added.
> >
> > I agree that the vast majority of users won't have to deal
> > with managedLedgerMaxUnackedRangesToPersist in the range of tens of
> > millions and above, but there are edge cases where this is needed.
> >
> >
> > On Tue, Sep 24, 2024 at 3:17 PM PengHui Li <peng...@apache.org> wrote:
> >
> > > Thanks for driving the proposal.
> > >
> > > I would like to share some related context from many years ago:
> > >
> > > - https://lists.apache.org/thread/y0r9kk0968ydpxtf16x6ql3x6kwy7dc1
> > > - https://lists.apache.org/thread/hfv18cg0yckt5cqd0fc66rp7tth036kf
> > >
> > > We have two major approaches:
> > >
> > > 1. Minimize the persistent size of cursor data:
> > > • Example: PR:9292 and cursor data compression, possibly with a compressed
> > > bitset implementation (RoaringBitmap).
> > >
> > > 2. Split the ack cursor data into multiple chunks:
> > > • Example: PIP-81, PIP-381.
> > >
> > > LinLin and I previously worked on PIP-81. Personally, I am not a big fan
> > > of this solution.
> > > While working on PIP-81 and cursor data compression, we found that
> > > compression works well in most cases,
> > > even when there are millions or tens of millions of ack ranges. I recall
> > > we shared data on this before, though I can’t seem to find it now.
> > >
> > > From a user perspective, most users are satisfied with the current
> > > solution, and only a few need compression enabled.
> > > The simplicity of the solution is vital for community users, which was the
> > > main reason we gave up on PIP-81 earlier.
> > > Pulsar is already complex, so having a pluggable solution for the long
> > > term would be more beneficial.
> > > This way, most users get a clear, simple version, while others needing
> > > enhanced solutions can create their plugins, managing the complexity
> > > themselves.
> > >
> > > I’m not going to block this proposal, but a few points need clarification:
> > >
> > > • Feature Toggle: Add a flag that allows users to enable this feature
> > > (keeping it disabled by default until there is higher demand).
> > > Managed ledger and cursor complexities are well-known, so a smooth opt-in
> > > process is crucial for users to adopt new features gradually.
> > >
> > > • Compatibility Concerns: Since the persistent data structure will change,
> > > we need to address rollback scenarios.
> > > For instance, if a user has 10MB of cursor data, upgrades to a new version
> > > with the PIP changes, and then needs to roll back to the older version,
> > > will that user lose their 10MB cursor data? What steps are required for a
> > > rollback to ensure data consistency?
> > >
> > > Regards,
> > > Penghui
> > >
> > > On Tue, Sep 24, 2024 at 1:42 AM Lari Hotari <lhot...@apache.org> wrote:
> > >
> > > > On Tue, 24 Sept 2024 at 05:01, Rajan Dhabalia <rdhaba...@apache.org>
> > > > wrote:
> > > > > However, there are multiple other PRs related to key-shared sub,
> > > > > stats, cursor performance, and other PRs are still blocked by others
> > > > > and people just block it because they think they don't have this
> > > > > usecase. It's so unfortunate that people easily merge implementations
> > > > > which only handle small-scale usecases but the usecases for which
> > > > > Pulsar was built are being blocked or take a long time to merge. It's
> > > > > just that I don't have that energy to keep following up for useful and
> > > > > important changes for Pulsar. And this is one of these examples as
> > > > > well. I have also started discussion about improving the PIP process
> > > > > because it has become painful in many cases.
> > > >
> > > > It's not that individuals want to block changes for no reason. It
> > > > seems that the main reason for blocking changes is the fear of
> > > > regressions. Some areas of the Pulsar codebase aren't well covered in
> > > > our test suites. For example, we don't have performance tests as part
> > > > of the Apache Pulsar repositories. We have a lot of tests, but most of
> > > > them are written in a way that tests the code as the author expects it
> > > > to work. There are very few tests that evaluate features from the
> > > > end-user API perspective or as system tests.
> > > >
> > > > Writing new tests is slow, and the developer experience is poor with
> > > > the current test infrastructure. Adding more tests to the main build
> > > > would slow down Pulsar CI even more. This isn't a new problem; it's
> > > > been around for many years. I'd love to see more proposals and active
> > > > contributions to improve the "safety nets" of Apache Pulsar so that we
> > > > wouldn't fear change. I'm not saying that this is only a testing
> > > > problem. Testability impacts architecture too. Balancing all different
> > > > aspects of the system isn't easy, and it requires effort and
> > > > dedication. We don't currently have enough contributors who are
> > > > investing their time in enabling others to contribute effectively. I
> > > > hope that we can improve together and address the problems we have
> > > > that cause the fear of change. When that is addressed, there would be
> > > > more confidence in accepting new PIPs and changes even when the
> > > > reviewer doesn't have the use case or when they aren't familiar with
> > > > the problem that the PIP is targeting to solve.
> > > >
> > > > -Lari
> > > >
> > >
> >
> >
> > --
> > Andrey Yegorov
