Re: [DISCUSS] PIP-381: Handle large PositionInfo state

Rajan Dhabalia Wed, 25 Sep 2024 15:21:28 -0700

+1 binding.

>> Can we use a space-efficient probabilistic data structure, such as bloom
>> filter to store the acked msg ids
Here, the problem statement is about dumping and retrieving data for
topic recovery, so we need a compute-efficient solution for faster recovery,
with disk storage to hold large amounts of data. So I don't think Bloom
filters fit this use case.
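
For reference, a minimal sketch (Python, not Pulsar code) of the property
that matters if a Bloom filter were used to hold acked message ids: it never
misses an id that was added, but it can report an id that was never added.
The BloomFilter class, its sizes, and the (ledgerId, entryId) tuples are all
illustrative assumptions, and the filter is deliberately undersized so the
effect is easy to see.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # True means "possibly added"; False means "definitely not added".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# Hypothetical message ids as (ledgerId, entryId) pairs.
acked = [(1, entry) for entry in range(50)]
unacked = [(1, entry) for entry in range(50, 1050)]

bloom = BloomFilter(num_bits=64, num_hashes=3)  # far too small on purpose
for msg_id in acked:
    bloom.add(msg_id)

# Ids that were added are always reported as present.
false_negatives = [m for m in acked if not bloom.might_contain(m)]
# Ids that were never added may still be reported as present.
false_positives = [m for m in unacked if bloom.might_contain(m)]

print("acked ids reported missing:", len(false_negatives))
print("un-acked ids reported as acked:", len(false_positives))
```

In this toy setup every false positive is an un-acked message that the
filter reports as acked, so the answer recovered from such a structure is
only approximate.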


>> 1. Minimize the persistent size of cursor data:
>> • Example: PR:9292 and cursor data compression, possibly with a
>> compressed
I am going to rebase PR:9292 to allow a relatively large number of ranges
for most use cases.

Thanks,
Rajan


On Wed, Sep 25, 2024 at 3:00 PM Heesung Sohn <[email protected]> wrote:

> Hi team,
> Can we use a space-efficient probabilistic data structure, such as
> bloom filter to store the acked msg ids, if re-delivering acked msgs
> are not so strict?
>
>
> On Tue, Sep 24, 2024 at 5:28 PM Andrey Yegorov
> <[email protected]> wrote:
> >
> > Penghui,
> >
> > Thank you for the thorough and detailed response.
> >
> > I have added the feature toggle per Lari's comment on the implementation
> > PR.
> >
> > Regarding compatibility: if the user somehow has 10MB of cursor data (in
> > our testing, large serialized position info resulted in a single large
> > entry of more than 1MB that was rejected by BK), then upgrades and rolls
> > back, the data will be read the same way as before, because the footer
> > with the chunking info will not be present. This is described in the read
> > path.
> > If the user upgrades, enables the feature, and creates a cursor with
> > chunked PositionInfo, the older version won't be able to read the data
> > after rollback. This is why the feature toggle was added.
> >
> > I agree that the vast majority of users won't have to deal
> > with managedLedgerMaxUnackedRangesToPersist in the range of tens of
> > millions and above, but there are edge cases where this is needed.
> >
> >
> > On Tue, Sep 24, 2024 at 3:17 PM PengHui Li <[email protected]> wrote:
> >
> > > Thanks for driving the proposal.
> > >
> > > I would like to share the related context that happened many years ago
> > >
> > > - https://lists.apache.org/thread/y0r9kk0968ydpxtf16x6ql3x6kwy7dc1
> > > - https://lists.apache.org/thread/hfv18cg0yckt5cqd0fc66rp7tth036kf
> > >
> > > We have two major approaches:
> > >
> > > 1. Minimize the persistent size of cursor data:
> > > • Example: PR:9292 and cursor data compression, possibly with a
> > > compressed bitset implementation (RoaringBitmap).
> > >
> > > 2. Split the ack cursor data into multiple chunks:
> > > • Example: PIP-81, PIP-381.
> > >
> > > LinLin and I previously worked on PIP-81. Personally, I am not a big
> > > fan of this solution.
> > > While working on PIP-81 and cursor data compression, we found that
> > > compression works well in most cases, even when there are millions or
> > > tens of millions of ack ranges. I recall we shared data on this before,
> > > though I can’t seem to find it now.
> > >
> > > From a user perspective, most users are satisfied with the current
> > > solution, and only a few need compression enabled.
> > > The simplicity of the solution is vital for community users, which was
> > > the main reason we gave up on PIP-81 earlier.
> > > Pulsar is already complex, so having a pluggable solution for the long
> > > term would be more beneficial.
> > > This way, most users get a clear, simple version, while others needing
> > > enhanced solutions can create their plugins, managing the complexity
> > > themselves.
> > >
> > > I’m not going to block this proposal, but a few points need
> > > clarification:
> > >
> > > • Feature Toggle: Add a flag that allows users to enable this feature
> > > (keeping it disabled by default until there is higher demand).
> > > Managed ledger and cursor complexities are well-known, so a smooth
> > > opt-in process is crucial for users to adopt new features gradually.
> > >
> > > • Compatibility Concerns: Since the persistent data structure will
> > > change, we need to address rollback scenarios.
> > > For instance, if a user has 10MB of cursor data, upgrades to a new
> > > version with the PIP changes, and then needs to roll back to the older
> > > version, will that user lose their 10MB cursor data? What steps are
> > > required for a rollback to ensure data consistency?
> > >
> > > Regards,
> > > Penghui
> > >
> > > On Tue, Sep 24, 2024 at 1:42 AM Lari Hotari <[email protected]>
> > > wrote:
> > >
> > > > On Tue, 24 Sept 2024 at 05:01, Rajan Dhabalia <[email protected]>
> > > > wrote:
> > > > > However, there are multiple other PRs related to key-shared sub,
> > > > > stats, cursor performance, and other PRs are still blocked by others
> > > > > and people just block it because they think they don't have this
> > > > > usecase. It's so unfortunate that people easily merge implementations
> > > > > which only handle small-scale usecases but the usecases for which
> > > > > Pulsar was built are being blocked or take a long time to merge. It's
> > > > > just that I don't have that energy to keep following up for useful
> > > > > and important changes for Pulsar. And this is one of these examples
> > > > > as well. I have also started discussion about improving the PIP
> > > > > process because it has become painful in many cases.
> > > >
> > > > It's not that individuals want to block changes for no reason. It
> > > > seems that the main reason for blocking changes is the fear of
> > > > regressions. Some areas of the Pulsar codebase aren't well covered in
> > > > our test suites. For example, we don't have performance tests as part
> > > > of the Apache Pulsar repositories. We have a lot of tests, but most
> > > > of them are written in a way that tests the code as the author
> > > > expects it to work. There are very few tests that evaluate features
> > > > from the end-user API perspective or as system tests.
> > > >
> > > > Writing new tests is slow, and the developer experience is poor with
> > > > the current test infrastructure. Adding more tests to the main build
> > > > would slow down Pulsar CI even more. This isn't a new problem; it's
> > > > been around for many years. I'd love to see more proposals and active
> > > > contributions to improve the "safety nets" of Apache Pulsar so that
> > > > we wouldn't fear change. I'm not saying that this is only a testing
> > > > problem. Testability impacts architecture too. Balancing all
> > > > different aspects of the system isn't easy, and it requires effort
> > > > and dedication. We don't currently have enough contributors who are
> > > > investing their time in enabling others to contribute effectively. I
> > > > hope that we can improve together and address the problems we have
> > > > that cause the fear of change. When that is addressed, there would be
> > > > more confidence in accepting new PIPs and changes even when the
> > > > reviewer doesn't have the use case or when they aren't familiar with
> > > > the problem that the PIP is targeting to solve.
> > > >
> > > > -Lari
> > > >
> > >
> >
> >
> > --
> > Andrey Yegorov
>
