Hi Jay, Jun,

Thanks for your comments - you have confirmed what I thought was most
likely the case.
I will attempt to work around the issue for the moment in the client to
minimise the chance of the out-of-order problem occurring (probably by
stopping retries and triggering a fail-fast of the JVM so that by the time
it restarts there is little chance of pending requests on the prior
connection).
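
A minimal sketch of that workaround, assuming a hypothetical `Sender`
interface standing in for the real producer call (it is not a Kafka API).
A failed send is not retried; instead a fail-fast action runs, which in
production could halt the JVM. The action is injected so the behaviour can
be exercised without killing the process:

```java
import java.io.IOException;

class FailFastSender {
    /** Hypothetical stand-in for the real producer call; not a Kafka API. */
    interface Sender {
        void send(byte[] message) throws IOException;
    }

    private final Sender sender;
    private final Runnable onFatal;

    FailFastSender(Sender sender, Runnable onFatal) {
        this.sender = sender;
        this.onFatal = onFatal;
    }

    /** Sends once. On an IO error it does not retry; it runs the fail-fast action. */
    boolean send(byte[] message) {
        try {
            sender.send(message);
            return true;
        } catch (IOException e) {
            // No retry here: a retry on a new connection could race with
            // requests still pending on the old connection.
            onFatal.run();
            return false;
        }
    }

    /** Fail-fast action for production use: halt the JVM without running shutdown hooks. */
    static Runnable haltJvm() {
        return () -> Runtime.getRuntime().halt(1);
    }
}
```

This only narrows the window. A request already buffered on the broker can
still be processed after the client has gone away.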

I look forward to seeing a design proposal.

Thanks,
Ross



On 24 August 2013 01:34, Jay Kreps <jay.kr...@gmail.com> wrote:

> Yeah I agree, this is a problem.
>
> The issue is that a produce request which is either in the network buffer
> or in the request processing queue on the broker may still be processed
> after a disconnect. So there is a race condition between that processing
> and the reconnect/retry logic. You could work around this in a hacky way
> using the reconnect backoff time, but the fundamental race condition
> exists. We could easily make this more transparent by having some mode
> where disconnection throws an error back to the client, but in fact there
> is no way for the client to solve this either.
>
> Neither Storm nor Samza nor any other framework would actually fix this
> issue for you, since they are in turn dependent on Kafka's ordering (though
> they might solve a lot of other problems).
>
> As Jun mentions we have been thinking of having a per-producer sequence
> number to enforce ordering. This would allow us to make produce calls
> idempotent, enforce strong ordering in the case of retries, as well as fix
> a number of other corner cases. I think it would handle this issue as well.
> But it's not a quick patch.
>
> I will try to get a design proposal up by next week so we have something
> concrete to discuss.
>
> -Jay
>
>
> On Thu, Aug 22, 2013 at 9:32 PM, Ross Black <ross.w.bl...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I am using Kafka 0.7.1, and using the low-level SyncProducer to send
> > messages to a *single* partition from a *single* thread.
> > The client sends messages that contain sequential numbers so it is
> > obvious at the consumer when message order is shuffled.
> > I have noticed that messages can be saved out-of-order by Kafka when
> > there are connection problems, and am looking for possible solutions
> > (I think I already know the cause).
> >
> > The client sends messages in a retry loop so that it will wait for a
> > short period and then retry to send on any IO errors.  In SyncProducer,
> > any IOException triggers a disconnect.  Next time send is called a new
> > connection is established.  I believe that it is this
> > disconnect/reconnect cycle that can cause messages to be saved to the
> > Kafka log in a different order to that of the client.
> >
> > I had previously had the same sort of issue with reconnect.interval/time,
> > which was fixed by disabling those reconnect settings.
> >
> >
> > http://mail-archives.apache.org/mod_mbox/kafka-users/201305.mbox/%3CCAM%2BbZhjssxmUhn_L%3Do0bGsD7PAXFGQHRpOKABcLz29vF3cNOzA%40mail.gmail.com%3E
> >
> > Is there anything in 0.7 that would allow me to solve this problem?  The
> > only option I can see at the moment is to not perform retries.
> >
> > Does 0.8 handle this issue any differently?
> >
> > Thanks,
> > Ross
> >
>
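
The per-producer sequence number idea mentioned in the thread can be
illustrated with a small sketch. This is only an assumption about how such
a check might look, not the design that was to be proposed and not Kafka
code. The class `SequencedLog`, its `Result` values, and the rule that
sequence numbers start at 0 are all hypothetical. The log accepts a message
only if it carries the next expected sequence number for its producer, so a
retried request is recognised as a duplicate and a request that arrives
ahead of a missing one is rejected instead of being appended out of order:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SequencedLog {
    enum Result { APPENDED, DUPLICATE, OUT_OF_ORDER }

    // Last sequence number appended for each producer (hypothetical bookkeeping).
    private final Map<String, Long> lastSeq = new HashMap<>();
    private final List<String> log = new ArrayList<>();

    synchronized Result append(String producerId, long seq, String message) {
        long last = lastSeq.getOrDefault(producerId, -1L);
        if (seq <= last) {
            // Already appended: a retry of an earlier request, safe to ignore.
            return Result.DUPLICATE;
        }
        if (seq != last + 1) {
            // A gap: an earlier message has not arrived yet, so refuse to reorder.
            return Result.OUT_OF_ORDER;
        }
        log.add(message);
        lastSeq.put(producerId, seq);
        return Result.APPENDED;
    }

    synchronized List<String> contents() {
        return new ArrayList<>(log);
    }
}
```

With a check like this, a producer can retry after a disconnect without
knowing whether the original request was processed: the retry either fills
the next slot or is reported as a duplicate.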
