+1
Regards Jiwei Guo (Tboy) On Sat, Jan 28, 2023 at 1:28 AM Yubiao Feng <yubiao.f...@streamnative.io.invalid> wrote: > > Hi Michael Marshall > > > My primary concern with making this the new default is that the cost of > calculating the size is in the synchronization on the managed ledger, which > interrupts writes and reads through that managed ledger and potentially > others in some cases. > > Yes, it affects producing, but it doesn't affect consuming. When > calculating the backlogSize of each subscription, it fights for an > exclusive lock(see: > https://github.com/apache/pulsar/blob/644be5fdd6d080accffd72229b2e21e35a27d722/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L1270), > this lock is also used when sending messages. If a topic has N > subscriptions, then the time of impact is multiplied by N. I will write a > PR to reduce the amplification of influence on producing caused by multiple > subscriptions. > > > we should consider making the JSON result easier to interpret. > > Yes, I have set the attribute `backlogSize` in response to be `-1` if > `--get-subscription-backlog-size` is false to distinguish between the two > cases: > - There is no backlog, and the parameter `--get-subscription-backlog-size` > is true. After changes, the attribute `backlogSize` in response will be `0`. > - The parameter `--get-subscription-backlog-size` is false. After changes, > the attribute `backlogSize` in response will be `-1`. > > On Sat, Jan 21, 2023 at 3:56 AM Michael Marshall <mmarsh...@apache.org> > wrote: > > > My primary concern with making this the new default is that the cost > > of calculating the size is in the synchronization on the managed > > ledger, which interrupts writes and reads through that managed ledger > > and potentially others in some cases. > > > > If the primary motivation for this PR is to avoid confusing users (and > > I agree that the 0 value confuses users), we should consider making > > the JSON result easier to interpret. It'd be just as easy to only > > include the subscription backlog size when a client requests it. > > > > What do you think about changing the result? > > > > Thanks, > > Michael > > > > On Thu, Jan 19, 2023 at 4:03 AM Haiting Jiang <jianghait...@gmail.com> > > wrote: > > > > > > +1 > > > > > > Haiting > > > > > > On Tue, Jan 17, 2023 at 5:50 PM Yubiao Feng > > > <yubiao.f...@streamnative.io.invalid> wrote: > > > > > > > > Hi Asaf > > > > > > > > > It's worth noting that the estimated backlog size of a subscription > > is > > > > estimated since it doesn't consider any acknowledged messages between > > the > > > > mark delete position and the last message. It simply assumes all > > messages > > > > between the mark delete position and the last message have not been > > > > acknowledged. > > > > > > > > Yes, it's not the exact value of the backlog. There are two reasons > > for the > > > > loss of accuracy: > > > > - Whether the Entry size is closer to the `averageSize`. > > > > - The number of messages after the mark deleted position has been > > > > acknowledged. > > > > > > > > Thanks > > > > Yubiao Feng > > > > > > > > On Tue, Jan 17, 2023 at 3:31 PM Asaf Mesika <asaf.mes...@gmail.com> > > wrote: > > > > > > > > > Small question regarding this: > > > > > > > > > > The code for calculation is: > > > > > > > > > > long estimateBacklogFromPosition(PositionImpl pos) { > > > > > synchronized (this) { > > > > > long sizeBeforePosLedger = > > > > > ledgers.headMap(pos.getLedgerId()).values() > > > > > .stream().mapToLong(LedgerInfo::getSize).sum(); > > > > > LedgerInfo ledgerInfo = ledgers.get(pos.getLedgerId()); > > > > > long sizeAfter = getTotalSize() - sizeBeforePosLedger; > > > > > if (ledgerInfo == null) { > > > > > return sizeAfter; > > > > > } else if (pos.getLedgerId() == currentLedger.getId()) { > > > > > return sizeAfter - consumedLedgerSize(currentLedgerSize, > > > > > currentLedgerEntries, pos.getEntryId()); > > > > > } else { > > > > > return sizeAfter - > > > > > consumedLedgerSize(ledgerInfo.getSize(), ledgerInfo.getEntries(), > > > > > pos.getEntryId()); > > > > > } > > > > > } > > > > > } > > > > > > > > > > and > > > > > > > > > > private long consumedLedgerSize(long ledgerSize, long ledgerEntries, > > > > > long consumedEntries) { > > > > > if (ledgerEntries <= 0) { > > > > > return 0; > > > > > } > > > > > if (ledgerEntries <= (consumedEntries + 1)) { > > > > > return ledgerSize; > > > > > } else { > > > > > long averageSize = ledgerSize / ledgerEntries; > > > > > return consumedEntries >= 0 ? (consumedEntries + 1) * > > averageSize > > > > > : 0; > > > > > } > > > > > } > > > > > > > > > > > > > > > > > > > > It's worth noting that the estimated backlog size of a subscription > > is > > > > > estimated since it doesn't consider any acknowledged messages > > between the > > > > > mark delete position and the last message. It simply assumes all > > messages > > > > > between the mark delete position and the last message have not been > > > > > acknowledged. > > > > > > > > > > Good idea - +1 > > > > > > > > > > On Tue, Jan 17, 2023 at 4:12 AM PengHui Li <codelipeng...@gmail.com> > > > > > wrote: > > > > > > > > > > > +1 > > > > > > > > > > > > Penghui > > > > > > > > > > > > > On Jan 16, 2023, at 23:36, Yubiao Feng < > > yubiao.f...@streamnative.io > > > > > .INVALID> > > > > > > wrote: > > > > > > > > > > > > > > Hi community > > > > > > > > > > > > > > I am starting a DISCUSS for making the default value of the > > parameter > > > > > > > "--get-subscription-backlog-size" of admin API "topics stats" > > true. > > > > > > > > > > > > > > In the PR https://github.com/apache/pulsar/pull/9302, the > > property > > > > > > backlog > > > > > > > size of each subscription returned in the response of the API > > topics > > > > > > stats, > > > > > > > by default this property is always equal to 0 in response, and > > this > > > > > will > > > > > > > confuse users. Since the calculation of backlog size is done in > > broker > > > > > > > memory, there is no significant overhead(the process is > > described in > > > > > the > > > > > > > following section), so I think the correct values should be > > displayed > > > > > by > > > > > > > default. > > > > > > > > > > > > > > ### The following two APIs should be affected: > > > > > > > > > > > > > > In Pulsar admin API > > > > > > > ``` > > > > > > > pulsar-admin topics stats persistent://test-tenant/ns1/tp1 > > > > > > > --get-subscription-backlog-size > > > > > > > pulsar-admin topics stats persistent://test-tenant/ns1/tp1 -sbs > > > > > > > ``` > > > > > > > the default value of parameter `--get-subscription-backlog-size` > > will > > > > > be > > > > > > > `true` > > > > > > > > > > > > > > In Pulsar Rest API > > > > > > > ``` > > > > > > > curl GET "http://127.0.0.1:8080/test-tenant/ns1/tp1/stats > > > > > > > "?subscriptionBacklogSize=true > > > > > > > ``` > > > > > > > the default value of parameter `subscriptionBacklogSize ` will be > > > > > `true` > > > > > > > > > > > > > > > > > > > > > ### The following is the process of calculating backlog size: > > > > > > > - Divide `PersistentTopc.ledgers` into two parts according to the > > > > > > ledgerId > > > > > > > of the mark delete position of the cursor. The second part is > > ledgers > > > > > > > indicating the messages still need to be consumed, aka > > > > > > backlogSizeInLedgers. > > > > > > > - Find the LedgerInfo whose ledgerId is the same as the ledgerId > > of the > > > > > > > mark delete position of the cursor, and we can also divide the > > ledger > > > > > > into > > > > > > > two parts, the second part is entries indicating the messages > > still > > > > > need > > > > > > to > > > > > > > be consumed, multiply the average size of each entry in metrics > > by the > > > > > > > number of still need to be consumed entries we can get the > > backlog size > > > > > > in > > > > > > > this ledger. aka backlogSizeInEntries. > > > > > > > - `backlogSizeInLe dgers` + `backlogSizeInEntries` > > > > > > > > > > > > > > Thanks > > > > > > > Yubiao Feng > > > > > > > > > > > > > > > > > > >