+1

Haiting

On Tue, Jan 17, 2023 at 5:50 PM Yubiao Feng
<yubiao.f...@streamnative.io.invalid> wrote:
>
> Hi Asaf
>
> > It's worth noting that the estimated backlog size of a subscription is
> > estimated since it doesn't consider any acknowledged messages between the
> > mark delete position and the last message. It simply assumes all messages
> > between the mark delete position and the last message have not been
> > acknowledged.
>
> Yes, it's not the exact value of the backlog. There are two reasons for the
> loss of accuracy:
> - How close the actual entry sizes are to the `averageSize`.
> - The number of messages after the mark delete position that have already
>   been acknowledged.
>
> Thanks
> Yubiao Feng
>
> On Tue, Jan 17, 2023 at 3:31 PM Asaf Mesika <asaf.mes...@gmail.com> wrote:
>
> > Small question regarding this:
> >
> > The code for calculation is:
> >
> > long estimateBacklogFromPosition(PositionImpl pos) {
> >     synchronized (this) {
> >         long sizeBeforePosLedger = ledgers.headMap(pos.getLedgerId()).values()
> >                 .stream().mapToLong(LedgerInfo::getSize).sum();
> >         LedgerInfo ledgerInfo = ledgers.get(pos.getLedgerId());
> >         long sizeAfter = getTotalSize() - sizeBeforePosLedger;
> >         if (ledgerInfo == null) {
> >             return sizeAfter;
> >         } else if (pos.getLedgerId() == currentLedger.getId()) {
> >             return sizeAfter - consumedLedgerSize(currentLedgerSize,
> >                     currentLedgerEntries, pos.getEntryId());
> >         } else {
> >             return sizeAfter - consumedLedgerSize(ledgerInfo.getSize(),
> >                     ledgerInfo.getEntries(), pos.getEntryId());
> >         }
> >     }
> > }
> >
> > and
> >
> > private long consumedLedgerSize(long ledgerSize, long ledgerEntries,
> >         long consumedEntries) {
> >     if (ledgerEntries <= 0) {
> >         return 0;
> >     }
> >     if (ledgerEntries <= (consumedEntries + 1)) {
> >         return ledgerSize;
> >     } else {
> >         long averageSize = ledgerSize / ledgerEntries;
> >         return consumedEntries >= 0 ? (consumedEntries + 1) * averageSize : 0;
> >     }
> > }
> >
> > It's worth noting that the estimated backlog size of a subscription is
> > estimated since it doesn't consider any acknowledged messages between the
> > mark delete position and the last message. It simply assumes all messages
> > between the mark delete position and the last message have not been
> > acknowledged.
> >
> > Good idea - +1
> >
> > On Tue, Jan 17, 2023 at 4:12 AM PengHui Li <codelipeng...@gmail.com>
> > wrote:
> >
> > > +1
> > >
> > > Penghui
> > >
> > > > On Jan 16, 2023, at 23:36, Yubiao Feng
> > > > <yubiao.f...@streamnative.io.INVALID> wrote:
> > > >
> > > > Hi community
> > > >
> > > > I am starting a DISCUSS for making the default value of the parameter
> > > > "--get-subscription-backlog-size" of admin API "topics stats" true.
> > > >
> > > > In the PR https://github.com/apache/pulsar/pull/9302, the backlog size
> > > > of each subscription is returned in the response of the topics stats
> > > > API. By default this property is always equal to 0 in the response,
> > > > which will confuse users. Since the calculation of backlog size is
> > > > done in broker memory, there is no significant overhead (the process
> > > > is described in the following section), so I think the correct values
> > > > should be displayed by default.
> > > >
> > > > ### The following two APIs should be affected:
> > > >
> > > > In Pulsar admin API
> > > > ```
> > > > pulsar-admin topics stats persistent://test-tenant/ns1/tp1 --get-subscription-backlog-size
> > > > pulsar-admin topics stats persistent://test-tenant/ns1/tp1 -sbs
> > > > ```
> > > > the default value of parameter `--get-subscription-backlog-size` will
> > > > be `true`
> > > >
> > > > In Pulsar Rest API
> > > > ```
> > > > curl -X GET "http://127.0.0.1:8080/test-tenant/ns1/tp1/stats?subscriptionBacklogSize=true"
> > > > ```
> > > > the default value of parameter `subscriptionBacklogSize` will be
> > > > `true`
> > > >
> > > > ### The following is the process of calculating backlog size:
> > > > - Divide `PersistentTopic.ledgers` into two parts according to the
> > > >   ledgerId of the mark delete position of the cursor. The second part
> > > >   is the ledgers whose messages still need to be consumed, aka
> > > >   `backlogSizeInLedgers`.
> > > > - Find the LedgerInfo whose ledgerId is the same as the ledgerId of
> > > >   the mark delete position of the cursor. This ledger can also be
> > > >   divided into two parts; the second part is the entries that still
> > > >   need to be consumed. Multiplying the average size of each entry in
> > > >   metrics by the number of entries that still need to be consumed
> > > >   gives the backlog size in this ledger, aka `backlogSizeInEntries`.
> > > > - The result is `backlogSizeInLedgers` + `backlogSizeInEntries`.
> > > >
> > > > Thanks
> > > > Yubiao Feng
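
For anyone who wants to try the arithmetic outside the broker, here is a
minimal standalone sketch of the estimate discussed in this thread. It is
not the Pulsar implementation: `BacklogEstimateSketch`, `Ledger` and
`estimateBacklog` are hypothetical names, the ledgers are modeled as a plain
`TreeMap`, and the special handling of the current (still open) ledger is
left out on the assumption that every ledger's size and entry count are
already in the map. Only the `consumedLedgerSize` arithmetic is taken from
the code quoted above.

```java
import java.util.TreeMap;

// Hypothetical standalone model of the estimate; not Pulsar code.
class BacklogEstimateSketch {

    // Hypothetical stand-in for LedgerInfo: total bytes and entry count.
    static final class Ledger {
        final long size;
        final long entries;

        Ledger(long size, long entries) {
            this.size = size;
            this.entries = entries;
        }
    }

    // Same arithmetic as the quoted consumedLedgerSize: bytes assumed to be
    // consumed in a ledger up to and including entry id `consumedEntries`.
    static long consumedLedgerSize(long ledgerSize, long ledgerEntries,
                                   long consumedEntries) {
        if (ledgerEntries <= 0) {
            return 0;
        }
        if (ledgerEntries <= consumedEntries + 1) {
            return ledgerSize;
        }
        long averageSize = ledgerSize / ledgerEntries;
        return consumedEntries >= 0 ? (consumedEntries + 1) * averageSize : 0;
    }

    // Estimated backlog in bytes for a mark delete position (ledgerId, entryId).
    static long estimateBacklog(TreeMap<Long, Ledger> ledgers,
                                long ledgerId, long entryId) {
        long total = ledgers.values().stream().mapToLong(l -> l.size).sum();
        // Ledgers strictly before the position's ledger are fully consumed.
        long sizeBefore = ledgers.headMap(ledgerId).values().stream()
                .mapToLong(l -> l.size).sum();
        long sizeAfter = total - sizeBefore;
        Ledger info = ledgers.get(ledgerId);
        if (info == null) {
            return sizeAfter;
        }
        return sizeAfter - consumedLedgerSize(info.size, info.entries, entryId);
    }

    public static void main(String[] args) {
        TreeMap<Long, Ledger> ledgers = new TreeMap<>();
        ledgers.put(1L, new Ledger(1000, 10));
        ledgers.put(2L, new Ledger(2000, 10));
        ledgers.put(3L, new Ledger(500, 5));
        // Mark delete position at ledger 2, entry 3:
        // (2000 + 500) - (3 + 1) * (2000 / 10) = 1700
        System.out.println(estimateBacklog(ledgers, 2L, 3L));
    }
}
```

With these made-up numbers, ledger 1 is counted as fully consumed, ledger 3
as fully unconsumed, and ledger 2 is split using its average entry size. As
noted earlier in the thread, entries after the mark delete position that
were individually acknowledged are still counted, so this is an upper-bound
style estimate rather than an exact value.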