You definitely *might* see data from multiple partitions in a single poll(), and that won't be uncommon once your consumer is steadily processing data. However, there is no guarantee.
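
Since there is no guarantee, code that needs records from every assigned partition (a test, for example) has to keep polling and merge the results rather than inspect a single poll(). Here is a minimal sketch of that loop using only the Java standard library. PollAccumulator and pollUntilSeen are hypothetical names, and the Supplier is a stand-in for a call to consumer.poll() whose result has been flattened to a partition -> records map; none of this is part of the Kafka client API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of the Kafka client API.
class PollAccumulator {

    /**
     * Calls pollOnce repeatedly and merges whatever each call returns, until
     * every partition in `expected` has produced at least one record or
     * maxPolls calls have been made.
     *
     * pollOnce is an assumed stand-in for consumer.poll(timeout), flattened
     * to a map of partition name -> records returned by that call.
     */
    static Map<String, List<String>> pollUntilSeen(
            Supplier<Map<String, List<String>>> pollOnce,
            Set<String> expected,
            int maxPolls) {
        Map<String, List<String>> seen = new HashMap<>();
        for (int i = 0; i < maxPolls && !seen.keySet().containsAll(expected); i++) {
            // A single poll may cover one partition, several, or none at all.
            for (Map.Entry<String, List<String>> e : pollOnce.get().entrySet()) {
                if (!e.getValue().isEmpty()) {
                    seen.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                        .addAll(e.getValue());
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Simulated responses: the first poll only carries partition 0,
        // the second only partition 1, and later polls are empty.
        Map<String, List<String>> first = Map.of("t-0", List.of("a", "b"));
        Map<String, List<String>> second = Map.of("t-1", List.of("c"));
        Map<String, List<String>> empty = Map.of();
        int[] calls = {0};

        Map<String, List<String>> all = pollUntilSeen(
                () -> {
                    int n = calls[0]++;
                    return n == 0 ? first : (n == 1 ? second : empty);
                },
                Set.of("t-0", "t-1"),
                10);

        System.out.println("partitions seen: " + all.keySet().size()
                + ", polls used: " + calls[0]);
    }
}
```

In a real test the supplier would wrap consumer.poll(...), and maxPolls bounds the wait in case one partition never receives any data.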

In practice, you are unlikely to see data for both partitions on the first call to poll(), for a simple reason: poll() returns as soon as data for any partition is available. Unless things are timed just right, you're probably making requests to different brokers for data in the different partitions. These requests won't be perfectly aligned -- one of them will get a response first, and poll() will be able to return with some data. Since only that one response has been received, only one partition will have data.

After the first poll, you probably spend some time processing that data before you call poll() again. In the meantime, another request has been sent to the broker that returned data faster, and the response to the other, slower request has also arrived. So on the next poll, you are more likely to see data from both partitions.

So you're right: there's no hard guarantee, and you shouldn't write your consumer code to assume that data will be returned for all partitions. (And you can't assume that anyway; what if no new data had been published to one of the partitions?) However, many times you will see data from multiple partitions.

-Ewen

On Thu, Mar 10, 2016 at 11:21 AM, Shrijeet Paliwal <shrijeet.pali...@gmail.com> wrote:

> Version: 0.9.0.1
>
> I have a test which creates two partitions in a topic, writes data to both
> partitions. Then a single consumer subscribes to the topic, verifies that
> it has got the assignment of both partitions in that topic & finally issues
> a poll. The first poll always comes back with records of only one partition.
> I need to poll one more time to get records for the second partition. The
> poll timeout has no effect on this.
>
> Unless I've misunderstood the contract - the first poll *could* have
> returned records for both the partitions. After all, poll
> returns ConsumerRecords<K,V>, which is a map of topic_partitions -->
> records
>
> I acknowledge that API does not make any hard guarantees that align with my
> expectation but looks like API was crafted to support multiple partitions
> & topics in single call. Is there an implementation detail which restricts
> this? Is there a configuration which is controlling what gets fetched?
>
> --
> Shrijeet
>

--
Thanks,
Ewen