I have observed the issue that Matteo describes and I also attributed the problem to the absence of a back pressure mechanism in the client. Issue #2497 was not about that, though. There was some corruption going on that was leading to the server receiving garbage.
-Flavio

> On 8 Jan 2021, at 22:47, Matteo Merli <[email protected]> wrote:
>
> On Fri, Jan 8, 2021 at 8:27 AM Enrico Olivelli <[email protected]> wrote:
>>
>> Hi Matteo,
>> in this comment you are talking about an issue you saw when WQ is greater
>> than AQ
>> https://github.com/apache/bookkeeper/issues/2497#issuecomment-734423246
>>
>> IIUC you are saying that if one bookie is slow the client continues to
>> accumulate references to the entries that still have not received the
>> confirmation from it.
>> I think that this is correct.
>>
>> Have you seen problems in production related to this scenario?
>> Can you tell more about them?
>
> Yes, for simplicity, assume e=3, w=3, a=2.
>
> If one bookie is slow (not down, just slow), the BK client will ack
> to the user that the entries are written after the first 2 acks.
> In the meantime, it will keep waiting for the 3rd bookie to respond.
> If the bookie responds within the timeout, the entries can now be
> dropped from memory, otherwise the write will timeout internally and
> it will get replayed to a new bookie.
>
> In both cases, the amount of memory used in the client will max at
> "throughput" * "timeout". This can be a large amount of memory and
> easily cause OOM errors.
>
> Part of the problem is that it cannot be solved from outside the BK
> client, since there's no visibility on what entries have 2 or 3 acks
> and therefore it's not possible to apply backpressure. Instead,
> there should be a backpressure mechanism in the BK client itself to
> prevent this kind of issue.
>
> One possibility there could be to use the same approach as described
> in
> https://github.com/apache/pulsar/wiki/PIP-74%3A-Pulsar-client-memory-limits,
> giving a max memory limit per BK client instance and throttling
> everything after the quota is reached.
>
>
> Matteo
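
To make the "throughput" * "timeout" bound and the per-client memory limit concrete, here is a minimal sketch in Python. It is not BookKeeper or Pulsar code: `MemoryLimiter`, `worst_case_bytes`, and the numbers used are hypothetical, chosen only to illustrate the idea. The point is that the quota for an entry would be released only once all w bookies have responded (or the write has timed out and been replayed), not when the first a acks arrive.

```python
import threading
from typing import Optional


def worst_case_bytes(throughput_bytes_per_s: float, timeout_s: float) -> float:
    """Upper bound on bytes retained while one bookie is slow: throughput * timeout."""
    return throughput_bytes_per_s * timeout_s


class MemoryLimiter:
    """Hypothetical byte quota shared by all in-flight writes of one client."""

    def __init__(self, limit_bytes: int) -> None:
        self._limit = limit_bytes
        self._used = 0
        self._cond = threading.Condition()

    def acquire(self, size: int, timeout: Optional[float] = None) -> bool:
        """Block until `size` bytes fit under the quota; return False on timeout."""
        if size > self._limit:
            raise ValueError("entry is larger than the whole quota")
        with self._cond:
            fits = self._cond.wait_for(
                lambda: self._used + size <= self._limit, timeout
            )
            if fits:
                self._used += size
            return fits

    def release(self, size: int) -> None:
        """Return quota; call only after the last outstanding bookie response."""
        with self._cond:
            self._used -= size
            self._cond.notify_all()

    @property
    def used(self) -> int:
        with self._cond:
            return self._used
```

With an assumed 100 MB/s of writes and an assumed 5 s timeout, `worst_case_bytes(100e6, 5)` is 500 MB held by the client. With a limiter in place, a writer would call `acquire(len(entry))` before sending and would be throttled once the quota is used up, instead of the client accumulating entries until it runs out of memory.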
