The documentation currently includes the caveat that "As an optimization the server is allowed to return a partial message at the end of the message set. Clients should handle this case."
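(In practice, "handling this case" on the client side just means treating any trailing bytes that do not hold a complete [Offset MessageSize Message] entry as absent and re-fetching from that offset later. A minimal Java sketch of that parsing logic follows; the class and method names are hypothetical and not taken from any actual Kafka client.)

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical helper: extract the complete messages from a fetched
    // MessageSet buffer, ignoring a possibly truncated trailing message.
    public class MessageSetParser {

        private static final int HEADER_SIZE = 8 + 4; // Offset (int64) + MessageSize (int32)

        public static List<ByteBuffer> completeMessages(ByteBuffer messageSet) {
            List<ByteBuffer> messages = new ArrayList<>();
            ByteBuffer buf = messageSet.duplicate();
            while (buf.remaining() >= HEADER_SIZE) {
                buf.getLong();               // offset (not needed for delimiting)
                int size = buf.getInt();     // MessageSize
                if (buf.remaining() < size)  // truncated last message: stop here
                    break;                   // and re-fetch from this offset next time
                ByteBuffer message = buf.slice();
                message.limit(size);
                messages.add(message);
                buf.position(buf.position() + size);
            }
            return messages;
        }
    }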
As to whether or not this is a good feature: it is definitely a bad feature. It was originally there because we had no logical offsets and no indexing scheme, so there really was no way to tell. But we could actually fix it now.

Currently we use the index to translate logical offsets to physical file positions. The read path in pseudo-code does something like

  start = translate_offset_to_position(fetch_offset)
  read(start, min(max_size, translate_offset_to_position(highwatermark) - start))

So in fact we give a well-delimited chunk if you are caught up, and only when there are more messages than fit in your fetch max_size do you get partial messages.

It would take some refactoring in the log layer, but since we already do two offset translations per fetch most of the time, I think we could make the second translation take the size limit into account too and stop only on a complete message boundary. The trick would be to do this in a way that makes the code better, not worse.

-jay

On Thu, Jun 25, 2015 at 2:32 PM, Joel Koshy <jjkosh...@gmail.com> wrote:

> Yes, that is a bit of a caveat in using zero-copy when sending
> FetchResponses, i.e., the broker cannot introspect the message-set and
> lop off any trailing piece. I think this is just something that needs
> to be documented clearly on that wiki. So there is some overhead for
> the client implementation, in that the implementation has to interpret
> that in conjunction with MaxBytes.
>
> Thanks,
>
> Joel
>
> On Thu, Jun 25, 2015 at 02:24:32PM -0500, Grant Henke wrote:
> > The fetch/message protocol described here
> > <https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-FetchResponse>
> > shows that the MessageSet returned from a FetchRequest should be structured as:
> >
> > MessageSet => [Offset MessageSize Message]
> >   Offset => int64
> >   MessageSize => int32
> >
> > However, it looks like the Kafka broker can actually return an incomplete
> > message for the last message in the set due to the size set in MaxBytes. In
> > the returned MessageSet the last MessageSize returned is larger than the
> > total remaining bytes. It looks like an incomplete value can be returned
> > for any part of the message set (Offset and MessageSize) too, though I have
> > not confirmed.
> >
> > This means the MessageSize can't be trusted and the protocol is not
> > actually correct. Is this expected behavior? If expected, is it good
> > behavior?
> >
> > --
> > Grant Henke
> > Solutions Consultant | Cloudera
> > ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
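To illustrate the refactoring Jay sketches above, here is a rough Java sketch (not Kafka's actual log layer; all names are hypothetical) of a read path whose second offset translation is size-aware, so the byte range handed to the zero-copy send always ends on a complete message boundary:

    // Hypothetical interfaces standing in for the offset index and log segment.
    public class BoundedFetchSketch {

        /** Physical file position of the message with the given logical offset. */
        interface OffsetIndex {
            long translateOffsetToPosition(long logicalOffset);
        }

        /** End position of the last complete message starting at or after
         *  startPosition that fits entirely within maxBytes. */
        interface LogSegment {
            long lastMessageBoundaryWithin(long startPosition, long maxBytes);
        }

        /** Returns {position, length} for a fetch that never truncates a message. */
        static long[] boundedRead(OffsetIndex index, LogSegment segment,
                                  long fetchOffset, long highWatermark, int maxBytes) {
            long start = index.translateOffsetToPosition(fetchOffset);
            long hwPos = index.translateOffsetToPosition(highWatermark);
            // Cap the read by both the client's max_size and the high watermark,
            // as in the current read path...
            long limit = Math.min(maxBytes, hwPos - start);
            // ...then trim that cap back to the last complete message boundary.
            long end = segment.lastMessageBoundaryWithin(start, limit);
            return new long[] { start, end - start };
        }
    }

The cost is that finding that boundary needs either a scan of the message headers within the capped range or an index that records message sizes, which is presumably the "make the code better, not worse" part Jay mentions.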