On Monday, January 31, 2022, Chad Preisler <chad.preis...@gmail.com> wrote:
> Hello,
>
> I got this from the JavaDocs for KafkaConsumer.
>
> * If a consumer is assigned multiple partitions to fetch data from, it will try to consume from all of them at the same time,
> * effectively giving these partitions the same priority for consumption. However in some cases consumers may want to
> * first focus on fetching from some subset of the assigned partitions at full speed, and only start fetching other partitions
> * when these partitions have few or no data to consume.
>
> * One of such cases is stream processing, where processor fetches from two topics and performs the join on these two streams.
> * When one of the topics is long lagging behind the other, the processor would like to pause fetching from the ahead topic
> * in order to get the lagging stream to catch up. Another example is bootstraping upon consumer starting up where there are
> * a lot of history data to catch up, the applications usually want to get the latest data on some of the topics before consider
> * fetching other topics.
>
> I'm testing a consumer now. When the topic being read has the following
> lag.
>
> consumer group partition: 0, offset: 254, lag: 12301
> consumer group partition: 1, offset: 302, lag: 12216
> consumer group partition: 2, offset: 300, lag: 12257
> consumer group partition: 3, offset: 259, lag: 12108
>
> My consumer is starting with partition 3 and catching all the way up, then
> it starts reading the rest of the partitions evenly. I'm not sure why it is
> happening that way.
>
> Hope this helps.
>
> On Sun, Jan 23, 2022 at 1:58 AM Mazen Ezzeddine <mazen.ezzedd...@etu.univ-cotedazur.fr> wrote:
>
> > Dear all,
> >
> > Consider a kafka topic deployment with 3 partitions P1, P2, P3 with
> > events/records lagging in the partitions equal to 100, 50, 75 for P1, P2,
> > P3 respectively. And let’s suppose that num.poll.records (the maximum
> > number of records that can be fetched from the broker ) is equal to 100.
> >
> > If the consumer sends a request to fetch records from P1, P2, P3, is
> > there any guarantee that the returned records will be fairly/uniformly
> > selected out of the available partitions e.g., say 34 records from P1, 33
> > from P2 and 33 from P3.
> >
> > Otherwise, how the decision on the returned records is handled (e.g., is
> > it based on the first partition leader that replies to the fetch request
> > e.g., say P1..). In such case how eventual fairness is guaranteed across
> > different partitions, in case for example when records happen to be
> > fetched/read from a single partition.
> >
> > Thank you.

What I have noticed anecdotally: the order is random. Two consumers reading the same messages from the same group will get the messages in different orders. Also, if you get backlogged and the partitions have depth, you tend to get all the data from one partition before the consumer moves on to the next. But this behavior is likely very version- and client-dependent.

The order you consume in shouldn't matter, but in practice everything matters at least a little bit to someone.

--
Sorry this was sent from mobile. Will do less grammar and spell check than usual.
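
To make the backlog behaviour described above concrete, here is a toy model in plain Java. It is a sketch under stated assumptions, not the Kafka client's actual fetch logic: it assumes every partition's backlog is already buffered on the consumer and that each poll drains those buffers one at a time, in a fixed order, until a per-poll cap is reached. The cap plays the role of what the original question calls num.poll.records (the consumer setting is named max.poll.records). The class name PollSimulation and the method simulate are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: this is NOT the Kafka client's code.
// Assumption: each partition's backlog is already buffered locally and a
// poll drains the buffers in insertion order until the cap is reached.
public class PollSimulation {

    // Returns, for each simulated poll, how many records came from each partition.
    public static List<Map<String, Integer>> simulate(Map<String, Integer> backlog,
                                                      int maxPollRecords) {
        if (maxPollRecords <= 0) {
            throw new IllegalArgumentException("maxPollRecords must be positive");
        }
        // Work on a copy so the caller's map is left untouched.
        Map<String, Integer> remaining = new LinkedHashMap<>(backlog);
        List<Map<String, Integer>> polls = new ArrayList<>();

        while (remaining.values().stream().anyMatch(n -> n > 0)) {
            int budget = maxPollRecords;
            Map<String, Integer> batch = new LinkedHashMap<>();
            for (Map.Entry<String, Integer> e : remaining.entrySet()) {
                if (budget == 0) {
                    break; // cap reached; later partitions wait for the next poll
                }
                int take = Math.min(e.getValue(), budget);
                if (take > 0) {
                    batch.put(e.getKey(), take);
                    e.setValue(e.getValue() - take);
                    budget -= take;
                }
            }
            polls.add(batch);
        }
        return polls;
    }

    public static void main(String[] args) {
        // Backlog taken from the original question: P1=100, P2=50, P3=75.
        Map<String, Integer> backlog = new LinkedHashMap<>();
        backlog.put("P1", 100);
        backlog.put("P2", 50);
        backlog.put("P3", 75);

        List<Map<String, Integer>> polls = simulate(backlog, 100);
        for (int i = 0; i < polls.size(); i++) {
            System.out.println("poll " + (i + 1) + ": " + polls.get(i));
        }
    }
}
```

With the backlog from the question (100, 50, 75) and a cap of 100, the model returns three polls: 100 records from P1, then 50 from P2 plus 50 from P3, then the last 25 from P3. Nothing in this model splits a poll evenly across partitions. If an application needs a particular priority, the pause and resume methods on KafkaConsumer that the quoted JavaDoc refers to are the tool for that.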