I think we figured this out. It looks like the order in which partitions are consumed is wildly unpredictable. We see a single partition being consumed almost halfway before the consumer switches to another partition. This causes us to read messages from a range of dates out of order.
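To illustrate the effect, here is a minimal sketch in Python. It assumes a simplified model in which a consumer drains each partition in chunks before moving on to the next one. The function names, the chunk size, and the sample data are all hypothetical; this is not our consumer and it does not use the Kafka API.

```python
def simulate_consumption(partitions, chunk):
    """Drain partitions in turn, taking up to `chunk` records at a time.

    `partitions` maps a partition id to a list of timestamps that are
    already sorted within that partition. Returns (partition_id, timestamp)
    pairs in the order a consumer would see them under this toy model.
    """
    positions = {pid: 0 for pid in partitions}
    seen = []
    while any(positions[pid] < len(recs) for pid, recs in partitions.items()):
        for pid, recs in partitions.items():
            start = positions[pid]
            batch = recs[start:start + chunk]
            seen.extend((pid, ts) for ts in batch)
            positions[pid] = start + len(batch)
    return seen


def is_sorted(values):
    """True if the values never decrease."""
    return all(a <= b for a, b in zip(values, values[1:]))


def ordered_within_partitions(seen):
    """True if timestamps never decrease inside each individual partition."""
    by_partition = {}
    for pid, ts in seen:
        by_partition.setdefault(pid, []).append(ts)
    return all(is_sorted(ts_list) for ts_list in by_partition.values())


# Hypothetical data: 3 partitions, each holding days 1..7 in order.
partitions = {pid: list(range(1, 8)) for pid in range(3)}
seen = simulate_consumption(partitions, chunk=4)
timestamps = [ts for _, ts in seen]

print("globally ordered:     ", is_sorted(timestamps))
print("ordered per partition:", ordered_within_partitions(seen))
```

Under this model the timestamps are ordered inside every partition but not across the combined stream, which matches what we observed: days from the whole range show up interleaved.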
Interesting at least. Thanks for your help.

-----------------------------------------------------------------------------

It may be hard to reason about ordering across 1400 partitions. Could you use the SimpleConsumerShell to consume messages from 1 partition and see if messages are ordered?

Thanks,

Jun

On Fri, Mar 21, 2014 at 1:04 AM, Tom Amon <ta46...@gmail.com> wrote:

> Hi All,
>
> I have a question regarding ordering of consumed messages. We
> timestamp our messages and send them into Kafka in order. I wrote a
> simple consumer that simply consumes the messages and prints out the
> timestamp. I see messages for all seven days' worth of data being
> consumed at once.
>
> Our setup:
> Kafka 0.8
> 5 Kafka brokers
> 1400 partitions
>
> The consumer has 10 threads, simply connects, consumes and prints
> timestamps. It is set to the "smallest" offset so that it reads from
> the beginning. There are many millions of messages so I think I can
> rule out some partitions not having messages for certain days as the
> cause. I know that Kafka doesn't guarantee ordering across partitions
> but I would assume that with this volume of messages I would see the
> timestamps for the first day, followed by the second day, etc. Instead
> I see them all print at once.
>
> Any ideas what I might be doing wrong?
>