You definitely *might* see data from multiple partitions, and that won't be
uncommon once you start processing data. However, there is no guarantee.

In practice, it may be unlikely to see data for both partitions on the
first call to poll() for a simple reason: poll() will return as soon as any
data for any partition is available. Unless things are timed just right,
you're probably making requests to different brokers for data in the
different partitions. These requests won't be perfectly aligned -- one of
them will get a response first and the poll() will be able to return with
some data. Since only the one response will have been received, only one
partition will get data.

After the first poll, you probably spend some time processing that data
before you call poll() again. In the meantime, a new request has been sent
to the broker that responded first, and the slower request from the first
round has also come back. So on the next poll, you're more likely to see
data from both partitions.

So you're right: there's no hard guarantee, and you shouldn't write your
consumer code to assume that data will be returned for all partitions. (You
couldn't assume that anyway; what if no new data had been published to one
of the partitions?) That said, you will often see data from multiple
partitions in a single poll().
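
For a test like yours, one option is to keep polling and merge what comes
back until every partition has shown up, with a cap on the number of
attempts. Below is a rough sketch of that idea, not anything from the
consumer API itself: PollAccumulator, pollUntilSeen, pollOnce and maxPolls
are all names I made up, and pollOnce is a stand-in for whatever wraps your
real poll() call. With the actual consumer you would probably build that map
from ConsumerRecords.partitions() and ConsumerRecords.records(partition).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical helper; not part of the Kafka client library.
final class PollAccumulator {

    /**
     * Calls pollOnce repeatedly and merges the results per partition.
     * Stops as soon as every expected partition has at least one record,
     * or after maxPolls calls, whichever comes first. The caller must
     * still check the result: a partition with no new data never appears.
     */
    static <P, R> Map<P, List<R>> pollUntilSeen(
            Supplier<Map<P, List<R>>> pollOnce, // stand-in for a poll() wrapper
            Set<P> expected,
            int maxPolls) {
        Map<P, List<R>> seen = new HashMap<>();
        for (int i = 0; i < maxPolls && !seen.keySet().containsAll(expected); i++) {
            for (Map.Entry<P, List<R>> e : pollOnce.get().entrySet()) {
                if (!e.getValue().isEmpty()) {
                    // Append in arrival order so per-partition ordering is kept.
                    seen.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                        .addAll(e.getValue());
                }
            }
        }
        return seen;
    }
}
```

The attempt cap matters because of the case above: if nothing was ever
published to one partition, the loop would otherwise never finish, so the
test should assert on the returned map rather than on the first poll.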

-Ewen

On Thu, Mar 10, 2016 at 11:21 AM, Shrijeet Paliwal <
shrijeet.pali...@gmail.com> wrote:

> Version: 0.9.0.1
>
> I have a test which creates two partitions in a topic, writes data to both
> partitions. Then a single consumer subscribes to the topic, verifies that
> it has got the assignment of both partitions in that topic & finally issues
> a poll. The first poll always comes back with records of only one partition.
> I need to poll one more time to get records for the second partition. The
> poll timeout has no effect on this.
>
> Unless I've misunderstood the contract - the first poll *could* have
> returned records for both partitions. After all, poll
> returns ConsumerRecords<K,V>, which is a map of topic_partitions -->
> records.
>
> I acknowledge that the API does not make any hard guarantees that align with
> my expectation, but it looks like the API was crafted to support multiple
> partitions & topics in a single call. Is there an implementation detail which
> restricts this? Is there a configuration which controls what gets fetched?
>
> --
> Shrijeet
>



-- 
Thanks,
Ewen
