Re: [E] Re: [PIP-78] Split the individual acknowledgments into multiple entries

2021-01-24 Thread Sijie Guo
Rajan,

Thank you for sharing the prototype! That looks great to me.

In order for the community to evolve and experiment with different
approaches, how about we abstract this acknowledgment management as an
interface?

If this approach works for you, maybe Lin Lin can focus on abstracting this
interface. Then we can implement your approach and his approach using this
interface. Users can decide which implementation to use.
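
A minimal sketch of what such an abstraction could look like, for
illustration only. The names AckTracker, SetAckTracker, BitSetAckTracker and
AckTrackerDemo are hypothetical and do not come from Pulsar or from PIP-78,
and the sketch assumes a single ledger so that an entry id alone identifies
a message.

```java
import java.util.BitSet;
import java.util.TreeSet;

// Hypothetical interface: these names are illustrative, not Pulsar APIs.
interface AckTracker {
    void ack(long entryId);          // record an individual acknowledgment
    boolean isAcked(long entryId);   // query whether an entry was acknowledged
    int ackedCount();                // number of acknowledged entries
}

// One possible implementation: stores every acknowledged entry id as an object.
class SetAckTracker implements AckTracker {
    private final TreeSet<Long> acked = new TreeSet<>();

    public void ack(long entryId) { acked.add(entryId); }
    public boolean isAcked(long entryId) { return acked.contains(entryId); }
    public int ackedCount() { return acked.size(); }
}

// Another possible implementation: one bit per entry in a java.util.BitSet.
// Assumes entry ids are non-negative and fit in an int.
class BitSetAckTracker implements AckTracker {
    private final BitSet acked = new BitSet();

    public void ack(long entryId) { acked.set(Math.toIntExact(entryId)); }
    public boolean isAcked(long entryId) { return acked.get(Math.toIntExact(entryId)); }
    public int ackedCount() { return acked.cardinality(); }
}

class AckTrackerDemo {
    // The caller picks an implementation by name; both honor the same contract.
    static AckTracker create(String kind) {
        if ("bitset".equals(kind)) {
            return new BitSetAckTracker();
        }
        return new SetAckTracker();
    }

    public static void main(String[] args) {
        for (String kind : new String[] {"set", "bitset"}) {
            AckTracker tracker = create(kind);
            tracker.ack(0);
            tracker.ack(2);
            tracker.ack(4);
            System.out.println(kind + ": isAcked(2)=" + tracker.isAcked(2)
                    + " isAcked(3)=" + tracker.isAcked(3)
                    + " count=" + tracker.ackedCount());
        }
    }
}
```

Callers depend only on the interface, so the choice of implementation could
be driven by configuration without touching the calling code.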

Thanks,
Sijie

On Sat, Jan 23, 2021 at 7:46 PM Rajan Dhabalia wrote:

> Hi,
>
> >> Delayed messages or certain user logic can introduce a lot of
> >> message-holes. We have seen this issue in quite a lot of customers'
> >> production environment.
>
> I agree that we need a larger buffer to store and recover individually
> deleted messages, and the existing 150K limit might not be enough for many
> use cases. However, I would also like to highlight issues that happened in
> the past due to keeping a large number of deleted messages in the broker's
> memory, one of which was high GC pauses. Therefore, we introduced
> ConcurrentOpenLongPairRangeSet to manage deleted messages without actually
> storing range objects in memory. OpenRangeSet uses a bitset to store ranges
> in memory, and we can also use it to persist them to disk for recovery.
> This approach has several advantages: a simple implementation, a large
> enough range for recovery, and no intermediate conversion from
> unack-messages to a bitset in OpenRangeSet, which saves CPU during
> recovery.
>
> I implemented a simple prototype
> <https://github.com/rdhabalia/pulsar/commit/1f8e5e745e9f1d1429697b5dee1da70545385653>
> to store deleted messages using a bitset in OpenRangeSet, and we can
> persist 10M ranges with a 5MB data size, which I guess is large enough for
> any use case. So, we can use this approach to solve the problem without
> introducing unnecessary complexity in managed-cursor.
>
> Thanks,
> Rajan
>
> On Fri, Jan 22, 2021 at 7:52 PM Sijie Guo wrote:
>
> > Joe - Delayed messages or certain user logic can introduce a lot of
> > message holes. We have seen this issue in quite a lot of customers'
> > production environments. Hence we need to find a solution to these
> > problems. If you are skeptical of an implementation like that, how about
> > making the cursor implementation pluggable? We can implement this
> > proposal as one plugin, so it will not impact any existing logic while
> > allowing people to use a plugin to solve this problem.
> >
> > Thanks,
> > Sijie
> >
> > On Fri, Jan 22, 2021 at 5:00 PM Joe Francis wrote:
> >
> > > Let me take a step back and explain how I am looking at this from a
> > > high-level design viewpoint.
> > >
> > > Bookkeeper (BK) is like an LSM implementation of a KV store. Writes to
> > > all keys are appended to a single file; deletes are logical. Compaction
> > > reclaims space. An index is used to locate entries, track logical
> > > deletes, and reclaim space.
> > >
> > > The index in BK is another LSM. Again, writes are appended, deletes are
> > > logical, and an index is used to locate entries and account for
> > > deletes, with compaction to reclaim space (the implementation within
> > > rocksdb is far more complex, with bloom filters and memtables, but you
> > > get the idea). BK just uses a sophisticated index (rocksdb) which is
> > > tiny and cacheable, and rocksdb has within it a sophisticated index
> > > which is small and cacheable.
> > >
> > > So when I look at this proposal, what I see is the same: another
> > > attempt to build an LSM with a sophisticated index/cache mechanism
> > > using log-structured storage. So I am quite skeptical that this needs
> > > to be solved this way, within Pulsar.
> > >
> > >
> > >
> > > Joe
> > >
> > > On Wed, Jan 20, 2021 at 12:30 AM linlin wrote:
> > >
> > > > We can look at ManagedCursorImpl.buildIndividualDeletedMessageRanges.
> > > >
> > > > What is saved in the entry is not a bitSet but individual
> > > > messageRanges, one by one, each containing information such as
> > > > ledgerId and entryId. The BitSet exists only in memory and is used to
> > > > quickly determine whether a position already exists.
> > > > In addition, the position of each ack is stored in the
> > > > individualDeletedMessages queue. When persisting to the entry, the
> > > > queue is traversed, and the position information of each ack
> > > > generates a messageRange.
> > > > A messageRange contains a lowerEndpoint (ledgerId+entryId) and an
> > > > upperEndpoint (ledgerId+entryId): 4 longs, about 256 bits.
> > > >
> > > > Assume a more extreme scenario: 300K messages where every
> > > > acknowledged message is followed by an unacknowledged one, so 150K
> > > > positions are stored in individualDeletedMessages.
> > > > 150K * 256 / 8 / 1024 / 1024 ≈ 4.6MB.
> > > > Of course, there are also scenarios where the customer's acks span
> > > > several ledgers.
> > > >
> > > >
> >
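
To make the size estimate quoted above concrete, the sketch below redoes the
arithmetic (150K ranges, 4 longs each) and compares it with one bit per
entry. It uses a plain java.util.BitSet as a stand-in, not Pulsar's
ConcurrentOpenLongPairRangeSet, and assumes all 300K entries sit in a single
ledger; the class and method names are hypothetical.

```java
import java.util.BitSet;

class AckHoleEstimate {
    // Bytes needed when every acked range is persisted as 4 longs
    // (lower ledgerId+entryId, upper ledgerId+entryId), as described in the thread.
    static long rangeListBytes(long ranges) {
        return ranges * 4 * Long.BYTES;
    }

    // Marks every other entry as acknowledged, so each acked entry
    // forms its own single-entry range.
    static BitSet alternatingAcks(int totalMessages) {
        BitSet acked = new BitSet(totalMessages);
        for (int entryId = 0; entryId < totalMessages; entryId += 2) {
            acked.set(entryId);
        }
        return acked;
    }

    public static void main(String[] args) {
        int totalMessages = 300_000;
        BitSet acked = alternatingAcks(totalMessages);

        long ranges = acked.cardinality();            // one range per acked entry
        long rangeBytes = rangeListBytes(ranges);     // range-list encoding
        int bitsetBytes = acked.toByteArray().length; // one bit per entry

        System.out.printf("ranges: %d%n", ranges);
        System.out.printf("range list: %d bytes (%.2f MiB)%n",
                rangeBytes, rangeBytes / (1024.0 * 1024.0));
        System.out.printf("bitset: %d bytes%n", bitsetBytes);
    }
}
```

Under these assumptions the range list comes to 4,800,000 bytes (about
4.6 MiB, matching the estimate in the thread), while the bitset for the same
300K entries serializes to 37,500 bytes. A real cursor spans multiple
ledgers, so this is an order-of-magnitude illustration only.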

[GitHub] [pulsar-manager] tuteng merged pull request #372: 362 fix retention size label

2021-01-24 Thread GitBox


tuteng merged pull request #372:
URL: https://github.com/apache/pulsar-manager/pull/372


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [pulsar-manager] tuteng closed issue #362: Wrong Label in Topic / Policies / Retention

2021-01-24 Thread GitBox


tuteng closed issue #362:
URL: https://github.com/apache/pulsar-manager/issues/362


   


