Zahari, Kafka consumers use a pull model. I'm not sure what backpressure means in this context. If a consumer isn't ready for more records, it just doesn't poll() for more.
The documentation talks about "flow control" but doesn't mention
"backpressure". I think these are related but different concepts.
Pause/resume lets you prioritize some topics/partitions over others ("flow
control"), but that isn't a signal to a sender to stop sending
("backpressure").

Ryanne

On Wed, Oct 17, 2018 at 1:55 PM Zahari Dichev <zaharidic...@gmail.com> wrote:

> Hi there Ryanne,
>
> Thanks for the response! There is most likely quite a lot that I am
> missing here, but after I read the docs, it seems to me that the
> pause/resume API has been provided with the very purpose of implementing
> bespoke flow control. That being said, I see it as quite natural to be
> able to pause and resume as needed without facing the problems outlined
> in my previous email. So if we are going totally wrong about this and
> using pause/resume the wrong way, feel free to elaborate. I am really
> not trying to argue my case here, just genuinely attempting to
> understand what can be done on our end to improve the Akka Streams
> integration. Thanks in advance :)
>
> Zahari
>
> On Wed, Oct 17, 2018 at 5:49 PM Ryanne Dolan <ryannedo...@gmail.com>
> wrote:
>
> > Zahari,
> >
> > It sounds to me like this problem is due to Akka attempting to
> > implement additional backpressure on top of the Consumer API. I'd
> > suggest they not do that, and then this problem goes away.
> >
> > Ryanne
> >
> > On Wed, Oct 17, 2018 at 7:35 AM Zahari Dichev <zaharidic...@gmail.com>
> > wrote:
> >
> > > Hi there,
> > >
> > > Are there any opinions on the matter described in my previous email?
> > > I think this is quite important when it comes to implementing any
> > > non-trivial functionality that relies on pause/resume. Of course, if
> > > I am mistaken, feel free to elaborate.
> > > Thanks,
> > > Zahari
> > >
> > > On Tue, Oct 16, 2018 at 10:29 AM Zahari Dichev
> > > <zaharidic...@gmail.com> wrote:
> > >
> > > > Hi there Kafka developers,
> > > >
> > > > I am currently trying to find a solution to an issue that has been
> > > > manifesting itself in the Akka Streams implementation of the Kafka
> > > > connector. When it comes to consuming messages, the implementation
> > > > relies heavily on the fact that we can pause and resume partitions.
> > > > In some situations, when a single consumer instance is shared among
> > > > several streams, we might end up frequently pausing and unpausing a
> > > > set of topic partitions, which is the main facility that allows us
> > > > to implement backpressure. This, however, has certain
> > > > disadvantages, especially when there are two consumers that differ
> > > > in terms of processing speed.
> > > >
> > > > To articulate the issue more clearly, imagine that a consumer
> > > > maintains assignments for two topic partitions *TP1* and *TP2*.
> > > > This consumer is shared by two streams - *S1* and *S2*. So
> > > > effectively, when we have demand from only one of the streams -
> > > > *S1* - we will pause the other topic partition, *TP2*, and call
> > > > *poll()* on the consumer to retrieve only the records for the
> > > > demanded topic partition - *TP1*. The result is that all the
> > > > records that have been prefetched for *TP2* are now thrown away by
> > > > the fetcher ("*Not returning fetched records for assigned partition
> > > > TP2 since it is no longer fetchable*"). If we extrapolate that to
> > > > multiple streams sharing the same consumer, we might quickly end up
> > > > in a situation where we throw away prefetched data quite often.
> > > > This does not seem like the most efficient approach, and in fact
> > > > it produces quite a lot of overlapping fetch requests, as
> > > > illustrated in the following issue:
> > > >
> > > > https://github.com/akka/alpakka-kafka/issues/549
> > > >
> > > > I am writing this email to get some initial opinions on a KIP I
> > > > was thinking about. What if we gave the clients of the Consumer
> > > > API a bit more control over what to do with this prefetched data?
> > > > Two options I am wondering about:
> > > >
> > > > 1. Introduce a configuration setting, such as
> > > > *"return-prefetched-data-for-paused-topic-partitions = false"* (I
> > > > have to think of a better name), which when set to true will
> > > > return what has been prefetched instead of throwing it away on
> > > > calling *poll()*. Since this is an amount of data bounded by the
> > > > maximum size of the prefetch, we can control the maximum number of
> > > > records returned. The client of the Consumer API can then be
> > > > responsible for keeping that data around and using it when
> > > > appropriate (i.e. when demand is present).
> > > >
> > > > 2. Introduce a facility to pass in a buffer into which the
> > > > prefetched records are drained when *poll()* is called and paused
> > > > partitions have some prefetched records.
> > > >
> > > > Any opinions on the matter are welcome. Thanks a lot!
> > > >
> > > > Zahari Dichev
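
The inefficiency Zahari describes (the fetcher discarding prefetched records for partitions that happen to be paused at poll() time) can be sketched with a small toy model. This is plain Java with invented names (`DiscardOnPauseModel`, `prefetch`, `discarded`) standing in for the real consumer internals; it only illustrates why frequent pause/unpause of a shared consumer wastes prefetched data.

```java
import java.util.*;

// Toy model: records are prefetched per partition, but poll() throws away
// anything prefetched for a partition that is paused at that moment
// ("no longer fetchable") instead of returning or keeping it.
// All names here are invented for illustration; this is not client code.
public class DiscardOnPauseModel {

    final Map<String, List<String>> prefetched = new HashMap<>();
    final Set<String> paused = new HashSet<>();
    int discarded = 0; // records thrown away so far

    void prefetch(String tp, String rec) {
        prefetched.computeIfAbsent(tp, k -> new ArrayList<>()).add(rec);
    }

    void pause(String tp) { paused.add(tp); }

    // Returns records for non-paused partitions; discards the rest.
    Map<String, List<String>> poll() {
        Map<String, List<String>> out = new HashMap<>();
        Iterator<Map.Entry<String, List<String>>> it =
            prefetched.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, List<String>> e = it.next();
            if (paused.contains(e.getKey())) {
                discarded += e.getValue().size(); // wasted prefetch
            } else {
                out.put(e.getKey(), e.getValue());
            }
            it.remove();
        }
        return out;
    }

    public static void main(String[] args) {
        DiscardOnPauseModel c = new DiscardOnPauseModel();
        c.prefetch("TP1", "r1");
        c.prefetch("TP2", "r2");
        c.prefetch("TP2", "r3");
        c.pause("TP2"); // only the stream reading TP1 has demand
        Map<String, List<String>> got = c.poll();
        System.out.println(got.keySet() + ", discarded=" + c.discarded);
        // prints: [TP1], discarded=2
        // TP2's prefetched records are gone and must be fetched again,
        // which is where the overlapping fetch requests come from.
    }
}
```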
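
Option 1 would effectively move the buffering responsibility to the client. A rough sketch of what that client-side bookkeeping might look like, assuming poll() were changed to also return records for paused partitions: every name here (`KeepPrefetchedSketch`, `take`, `parked`) is hypothetical, since no such setting exists in the consumer today.

```java
import java.util.*;

// Hypothetical client-side logic under proposed option 1: instead of the
// fetcher discarding prefetched records for paused partitions, poll()
// hands them back and the caller parks them until demand returns.
public class KeepPrefetchedSketch {

    // Client-side store for records of partitions that had no demand yet.
    final Map<String, List<String>> parked = new HashMap<>();

    // Splits one poll() result by demand: records for demanded partitions
    // are delivered now (preceded by anything parked earlier for that
    // partition, preserving order); the rest are kept instead of dropped.
    List<String> take(Map<String, List<String>> polled,
                      Set<String> demanded) {
        List<String> deliverNow = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : polled.entrySet()) {
            if (demanded.contains(e.getKey())) {
                List<String> old = parked.remove(e.getKey());
                if (old != null) deliverNow.addAll(old);
                deliverNow.addAll(e.getValue());
            } else {
                parked.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                      .addAll(e.getValue()); // keep instead of discard
            }
        }
        return deliverNow;
    }

    public static void main(String[] args) {
        KeepPrefetchedSketch s = new KeepPrefetchedSketch();
        // First poll: only TP1 has demand, but TP2 data arrives anyway.
        List<String> first = s.take(
            Map.of("TP1", List.of("a"), "TP2", List.of("b")),
            Set.of("TP1"));
        // Later poll: demand for TP2 appears; the parked record is served
        // ahead of the newly polled one.
        List<String> second = s.take(
            Map.of("TP2", List.of("c")), Set.of("TP2"));
        System.out.println(first + " then " + second);
        // prints: [a] then [b, c]
    }
}
```

As option 1 notes, the memory held this way is bounded by the prefetch size, so the client can reason about the worst case it must keep around.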