I should also mention that this error was seen on broker version 0.10.1.1.
This condition sounds somewhat similar to KAFKA-4362
<https://issues.apache.org/jira/browse/KAFKA-4362>, but that issue was
fixed in 0.10.1.1, so they appear to be different issues.

On Wed, Mar 15, 2017 at 11:11 AM, Robert Quinlivan <rquinli...@signal.co>
wrote:

> Good morning,
>
> I'm hoping for some help understanding the expected behavior for an offset
> commit request and why this request might fail on the broker.
>
> *Context:*
>
> For context, my configuration looks like this:
>
>    - Three brokers
>    - Consumer offsets topic replication factor set to 3
>    - Auto commit enabled
>    - The user application topic, which I will call "my_topic", has a
>    replication factor of 3 as well and 800 partitions
>    - 4000 consumers attached in consumer group "my_group"
>
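> For reference, each consumer is configured roughly like the sketch below.
> The broker addresses, serializers, and poll loop are placeholders rather
> than our actual code; the relevant parts are the group id and auto commit:
>
>     import java.util.Collections;
>     import java.util.Properties;
>     import org.apache.kafka.clients.consumer.ConsumerRecords;
>     import org.apache.kafka.clients.consumer.KafkaConsumer;
>
>     public class MyTopicConsumer {
>         public static void main(String[] args) {
>             Properties props = new Properties();
>             // Placeholder broker list
>             props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
>             props.put("group.id", "my_group");
>             props.put("enable.auto.commit", "true");      // auto commit enabled
>             props.put("auto.commit.interval.ms", "5000"); // default interval
>             props.put("key.deserializer",
>                     "org.apache.kafka.common.serialization.ByteArrayDeserializer");
>             props.put("value.deserializer",
>                     "org.apache.kafka.common.serialization.ByteArrayDeserializer");
>
>             try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
>                 consumer.subscribe(Collections.singletonList("my_topic"));
>                 while (true) {
>                     ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
>                     // ... hand records off to the application ...
>                 }
>             }
>         }
>     }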
>
> *Issue:*
>
> When I attach the consumers, the coordinator logs the following error
> message repeatedly for each generation:
>
> ERROR [Group Metadata Manager on Broker 0]: Appending metadata message for
> group my_group generation 2066 failed due to
> org.apache.kafka.common.errors.RecordTooLargeException, returning UNKNOWN
> error code to the client (kafka.coordinator.GroupMetadataManager)
>
> *Observed behavior:*
>
> The consumer group does not stay connected long enough to consume
> messages. It is effectively stuck in a rebalance loop and the "my_topic"
> data has become unavailable.
>
>
> *Investigation:*
>
> Following the Group Metadata Manager code, it looks like the broker is
> writing to a cache after it writes an Offset Commit Request to the log
> file. If this cache write fails, the broker then logs this error and
> returns an error code in the response. In this case, the error from the
> cache is MESSAGE_TOO_LARGE, which is logged as a RecordTooLargeException.
> However, the broker then sets the error code to UNKNOWN on the Offset
> Commit Response.
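>
> My mental model of that translation, written as a tiny standalone Java
> sketch (a paraphrase of how I read the code, not the broker's actual
> Scala implementation):
>
>     import org.apache.kafka.common.protocol.Errors;
>
>     public class OffsetCommitErrorMapping {
>         // The coordinator appends the group/offset metadata record to
>         // __consumer_offsets; per my reading, when that append fails with
>         // MESSAGE_TOO_LARGE the client is answered with UNKNOWN instead of
>         // the original error. (Errors.UNKNOWN is the 0.10.x name for that
>         // error code.)
>         static Errors toClientError(Errors appendError) {
>             if (appendError == Errors.MESSAGE_TOO_LARGE) {
>                 return Errors.UNKNOWN;
>             }
>             return appendError;
>         }
>
>         public static void main(String[] args) {
>             System.out.println(toClientError(Errors.MESSAGE_TOO_LARGE).name()); // UNKNOWN
>         }
>     }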
>
> It seems that the issue is the size of the metadata in the Offset Commit
> Request. I have the following questions:
>
>    1. What is the size limit for this request? Are we exceeding that limit,
>    causing the request to fail? (A rough size estimate follows this list.)
>    2. If this is an issue with metadata size, what would cause abnormally
>    large metadata?
>    3. How is this cache used within the broker?
>
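> To put rough numbers on question 1: my understanding is that the whole
> group's metadata (member ids, client ids/hosts, subscriptions, and
> assignments) is serialized into a single record on __consumer_offsets, so
> the record grows with the member count. A back-of-envelope estimate, where
> every per-member byte count is a guess on my part rather than a measured
> value:
>
>     public class GroupMetadataSizeEstimate {
>         public static void main(String[] args) {
>             int members = 4000;               // consumers in my_group
>             int memberIdBytes = 60;           // client.id + "-" + UUID
>             int clientIdAndHostBytes = 40;    // clientId and clientHost strings
>             int subscriptionBytes = 30;       // protocol metadata (topic list)
>             int assignmentOverheadBytes = 30; // per-member assignment framing
>             int perMember = memberIdBytes + clientIdAndHostBytes
>                     + subscriptionBytes + assignmentOverheadBytes;
>
>             int partitionEntries = 800 * 4;   // 4-byte partition ids, spread over members
>             long total = (long) members * perMember + partitionEntries;
>
>             // Prints roughly 643,200 bytes with the guesses above, already a
>             // large fraction of the broker default message.max.bytes
>             // (~1,000,012 bytes); longer client ids, hostnames, or user data
>             // in the subscriptions would push it over that limit.
>             System.out.printf("~%d bytes for one group metadata record%n", total);
>         }
>     }
>
> If that reasoning is right, the limit in play would be the maximum message
> size for the __consumer_offsets log (the broker's message.max.bytes, or a
> topic-level max.message.bytes override) rather than anything client-side,
> but I would appreciate confirmation of that.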
>
> Thanks in advance for any insights you can provide.
>
> Regards,
> Robert Quinlivan
> Software Engineer, Signal
>



-- 
Robert Quinlivan
Software Engineer, Signal
