Hi Michael Marshall

> My primary concern with making this the new default is that the cost of
calculating the size is in the synchronization on the managed ledger, which
interrupts writes and reads through that managed ledger and potentially
others in some cases.

Yes, it affects producing, but it doesn't affect consuming. When
calculating the backlogSize of each subscription, it fights for an
exclusive lock(see:
https://github.com/apache/pulsar/blob/644be5fdd6d080accffd72229b2e21e35a27d722/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L1270),
this lock is also used when sending messages. If a topic has N
subscriptions, then the time of impact is multiplied by N. I will write a
PR to reduce the amplification of influence on producing caused by multiple
subscriptions.

> we should consider making the JSON result easier to interpret.

Yes, I have set the attribute `backlogSize` in response to be `-1` if
`--get-subscription-backlog-size` is false to distinguish between the two
cases:
- There is no backlog, and the parameter `--get-subscription-backlog-size`
is true. After changes, the attribute `backlogSize` in response will be `0`.
- The parameter `--get-subscription-backlog-size` is false. After changes,
the attribute `backlogSize` in response will be `-1`.

On Sat, Jan 21, 2023 at 3:56 AM Michael Marshall <mmarsh...@apache.org>
wrote:

> My primary concern with making this the new default is that the cost
> of calculating the size is in the synchronization on the managed
> ledger, which interrupts writes and reads through that managed ledger
> and potentially others in some cases.
>
> If the primary motivation for this PR is to avoid confusing users (and
> I agree that the 0 value confuses users), we should consider making
> the JSON result easier to interpret. It'd be just as easy to only
> include the subscription backlog size when a client requests it.
>
> What do you think about changing the result?
>
> Thanks,
> Michael
>
> On Thu, Jan 19, 2023 at 4:03 AM Haiting Jiang <jianghait...@gmail.com>
> wrote:
> >
> > +1
> >
> > Haiting
> >
> > On Tue, Jan 17, 2023 at 5:50 PM Yubiao Feng
> > <yubiao.f...@streamnative.io.invalid> wrote:
> > >
> > > Hi Asaf
> > >
> > > > It's worth noting that the estimated backlog size of a subscription
> is
> > > estimated since it doesn't consider any acknowledged messages between
> the
> > > mark delete position and the last message. It simply assumes all
> messages
> > > between the mark delete position and the last message have not been
> > > acknowledged.
> > >
> > > Yes, it's not the exact value of the backlog. There are two reasons
> for the
> > > loss of accuracy:
> > > - Whether the Entry size is closer to the `averageSize`.
> > > - The number of messages after the mark deleted position has been
> > > acknowledged.
> > >
> > > Thanks
> > > Yubiao Feng
> > >
> > > On Tue, Jan 17, 2023 at 3:31 PM Asaf Mesika <asaf.mes...@gmail.com>
> wrote:
> > >
> > > > Small question regarding this:
> > > >
> > > > The code for calculation is:
> > > >
> > > > long estimateBacklogFromPosition(PositionImpl pos) {
> > > >     synchronized (this) {
> > > >         long sizeBeforePosLedger =
> > > > ledgers.headMap(pos.getLedgerId()).values()
> > > >                 .stream().mapToLong(LedgerInfo::getSize).sum();
> > > >         LedgerInfo ledgerInfo = ledgers.get(pos.getLedgerId());
> > > >         long sizeAfter = getTotalSize() - sizeBeforePosLedger;
> > > >         if (ledgerInfo == null) {
> > > >             return sizeAfter;
> > > >         } else if (pos.getLedgerId() == currentLedger.getId()) {
> > > >             return sizeAfter - consumedLedgerSize(currentLedgerSize,
> > > > currentLedgerEntries, pos.getEntryId());
> > > >         } else {
> > > >             return sizeAfter -
> > > > consumedLedgerSize(ledgerInfo.getSize(), ledgerInfo.getEntries(),
> > > > pos.getEntryId());
> > > >         }
> > > >     }
> > > > }
> > > >
> > > > and
> > > >
> > > > private long consumedLedgerSize(long ledgerSize, long ledgerEntries,
> > > > long consumedEntries) {
> > > >     if (ledgerEntries <= 0) {
> > > >         return 0;
> > > >     }
> > > >     if (ledgerEntries <= (consumedEntries + 1)) {
> > > >         return ledgerSize;
> > > >     } else {
> > > >         long averageSize = ledgerSize / ledgerEntries;
> > > >         return consumedEntries >= 0 ? (consumedEntries + 1) *
> averageSize
> > > > : 0;
> > > >     }
> > > > }
> > > >
> > > >
> > > >
> > > > It's worth noting that the estimated backlog size of a subscription
> is
> > > > estimated since it doesn't consider any acknowledged messages
> between the
> > > > mark delete position and the last message. It simply assumes all
> messages
> > > > between the mark delete position and the last message have not been
> > > > acknowledged.
> > > >
> > > > Good idea - +1
> > > >
> > > > On Tue, Jan 17, 2023 at 4:12 AM PengHui Li <codelipeng...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > Penghui
> > > > >
> > > > > > On Jan 16, 2023, at 23:36, Yubiao Feng <
> yubiao.f...@streamnative.io
> > > > .INVALID>
> > > > > wrote:
> > > > > >
> > > > > > Hi community
> > > > > >
> > > > > > I am starting a DISCUSS for making the default value of the
> parameter
> > > > > > "--get-subscription-backlog-size" of admin API "topics stats"
> true.
> > > > > >
> > > > > > In the PR https://github.com/apache/pulsar/pull/9302, the
> property
> > > > > backlog
> > > > > > size of each subscription returned in the response of the API
> topics
> > > > > stats,
> > > > > > by default this property is always equal to 0 in response, and
> this
> > > > will
> > > > > > confuse users. Since the calculation of backlog size is done in
> broker
> > > > > > memory, there is no significant overhead(the process is
> described in
> > > > the
> > > > > > following section), so I think the correct values should be
> displayed
> > > > by
> > > > > > default.
> > > > > >
> > > > > > ### The following two APIs should be affected:
> > > > > >
> > > > > > In Pulsar admin API
> > > > > > ```
> > > > > > pulsar-admin topics stats persistent://test-tenant/ns1/tp1
> > > > > > --get-subscription-backlog-size
> > > > > > pulsar-admin topics stats persistent://test-tenant/ns1/tp1 -sbs
> > > > > > ```
> > > > > > the default value of parameter `--get-subscription-backlog-size`
> will
> > > > be
> > > > > > `true`
> > > > > >
> > > > > > In Pulsar Rest API
> > > > > > ```
> > > > > > curl GET "http://127.0.0.1:8080/test-tenant/ns1/tp1/stats
> > > > > > "?subscriptionBacklogSize=true
> > > > > > ```
> > > > > > the default value of parameter `subscriptionBacklogSize ` will be
> > > > `true`
> > > > > >
> > > > > >
> > > > > > ### The following is the process of calculating backlog size:
> > > > > > - Divide `PersistentTopc.ledgers` into two parts according to the
> > > > > ledgerId
> > > > > > of the mark delete position of the cursor. The second part is
> ledgers
> > > > > > indicating the messages still need to be consumed, aka
> > > > > backlogSizeInLedgers.
> > > > > > - Find the LedgerInfo whose ledgerId is the same as the ledgerId
> of the
> > > > > > mark delete position of the cursor, and we can also divide the
> ledger
> > > > > into
> > > > > > two parts, the second part is entries indicating the messages
> still
> > > > need
> > > > > to
> > > > > > be consumed, multiply the average size of each entry in metrics
> by the
> > > > > > number of still need to be consumed entries we can get the
> backlog size
> > > > > in
> > > > > > this ledger. aka backlogSizeInEntries.
> > > > > > - `backlogSizeInLe dgers` + `backlogSizeInEntries`
> > > > > >
> > > > > > Thanks
> > > > > > Yubiao Feng
> > > > >
> > > > >
> > > >
>

Reply via email to