Hi Jay, Jun,

Thanks for your comments - you have confirmed what I thought was most likely the case. I will attempt to work around the issue for the moment in the client to minimise the chance of the out-of-order problem occurring (probably by stopping retries and triggering a fail-fast of the JVM so that by the time it restarts there is little chance of pending requests on the prior connection).

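
A minimal sketch of that workaround, to make the idea concrete. It assumes a hypothetical `MessageSender` interface standing in for whatever wraps the real producer's send call; `MessageSender`, `FailFastSender` and `onFatalError` are illustrative names, not Kafka APIs. The point is only that the first IO error is never retried on a fresh connection.

```java
import java.io.IOException;

/** Hypothetical stand-in for the code that wraps the real producer; not a Kafka API. */
interface MessageSender {
    void send(byte[] payload) throws IOException;
}

/** Sends each message exactly once and fails fast on the first IO error. */
final class FailFastSender {
    private final MessageSender delegate;
    private final Runnable onFatalError;

    FailFastSender(MessageSender delegate, Runnable onFatalError) {
        this.delegate = delegate;
        this.onFatalError = onFatalError;
    }

    void send(byte[] payload) {
        try {
            delegate.send(payload);
        } catch (IOException e) {
            // Deliberately no retry: a retry on a new connection could race with
            // requests that are still pending on the old connection.
            onFatalError.run();
            // Reached only if the fail-fast action returns (for example in tests).
            throw new IllegalStateException("send failed; failing fast", e);
        }
    }
}
```

In a real client the fail-fast action could be something like `() -> Runtime.getRuntime().halt(1)`, so that the process is restarted by whatever supervises it. This narrows the race window but, as noted in the thread, does not remove it.
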
I look forward to seeing a design proposal.

Thanks,
Ross

On 24 August 2013 01:34, Jay Kreps <jay.kr...@gmail.com> wrote:

> Yeah I agree, this is a problem.
>
> The issue is that a produce request which is either in the network buffer
> or in the request processing queue on the broker may still be processed
> after a disconnect. So there is a race condition between that processing
> and the reconnect/retry logic. You could work around this in a hacky way
> using the reconnect backoff time, but the fundamental race condition
> exists. We could easily make this more transparent by having some mode
> where disconnection throws an error back to the client, but in fact there
> is no way for the client to solve this either.
>
> Neither Storm nor Samza nor any other framework would actually fix this
> issue for you, since they are in turn dependent on Kafka's ordering (though
> they might solve a lot of other problems).
>
> As Jun mentions, we have been thinking of having a per-producer sequence
> number to enforce ordering. This would allow us to make produce calls
> idempotent, enforce strong ordering in the case of retries, as well as fix
> a number of other corner cases. I think it would handle this issue as well.
> But it's not a quick patch.
>
> I will try to get a design proposal up by next week so we have something
> concrete to discuss.
>
> -Jay
>
>
> On Thu, Aug 22, 2013 at 9:32 PM, Ross Black <ross.w.bl...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I am using Kafka 0.7.1, and using the low-level SyncProducer to send
> > messages to a *single* partition from a *single* thread.
> > The client sends messages that contain sequential numbers so it is obvious
> > at the consumer when message order is shuffled.
> > I have noticed that messages can be saved out-of-order by Kafka when there
> > are connection problems, and am looking for possible solutions (I think I
> > already know the cause).
> >
> > The client sends messages in a retry loop so that it will wait for a short
> > period and then retry to send on any IO errors. In SyncProducer, any
> > IOException triggers a disconnect. Next time send is called a new
> > connection is established. I believe that it is this disconnect/reconnect
> > cycle that can cause messages to be saved to the Kafka log in a different
> > order to that of the client.
> >
> > I previously had the same sort of issue with reconnect.interval/time,
> > which was fixed by disabling those reconnect settings.
> >
> > http://mail-archives.apache.org/mod_mbox/kafka-users/201305.mbox/%3CCAM%2BbZhjssxmUhn_L%3Do0bGsD7PAXFGQHRpOKABcLz29vF3cNOzA%40mail.gmail.com%3E
> >
> > Is there anything in 0.7 that would allow me to solve this problem? The
> > only option I can see at the moment is to not perform retries.
> >
> > Does 0.8 handle this issue any differently?
> >
> > Thanks,
> > Ross
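
For illustration only, here is a rough sketch of how a per-producer sequence number, as mentioned in the quoted reply, could make appends idempotent and ordered. This is not the design proposal (none has been posted in this thread) and not Kafka code; `SequencedLog`, `Result` and the choice to start sequences at 0 are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory log that accepts each producer's messages strictly in sequence order. */
final class SequencedLog {
    enum Result { APPENDED, DUPLICATE, OUT_OF_ORDER }

    private final List<String> log = new ArrayList<>();
    // Last sequence number appended for each producer (assumed to start at 0).
    private final Map<String, Long> lastSeq = new HashMap<>();

    synchronized Result append(String producerId, long seq, String message) {
        long expected = lastSeq.getOrDefault(producerId, -1L) + 1;
        if (seq < expected) {
            // Already written: a retry of an earlier request is acknowledged
            // without appending again, which makes the call idempotent.
            return Result.DUPLICATE;
        }
        if (seq > expected) {
            // A gap means an earlier message has not arrived yet; reject so the
            // log never stores this producer's messages out of order.
            return Result.OUT_OF_ORDER;
        }
        log.add(message);
        lastSeq.put(producerId, seq);
        return Result.APPENDED;
    }

    synchronized List<String> contents() {
        return new ArrayList<>(log);
    }
}
```

With a check like this on the receiving side, a request that is still queued from the old connection and a retry sent on the new connection cannot both be appended, and neither can overtake an earlier message from the same producer.
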