Hi
I have observed a peculiar scenario in our production environment: a mapper task for one particular topic-partition combination always fails with the exception 'Task attempt failed to report status for 600 seconds'.

When I dug deeper, I found that it gets stuck in either the fetch() or the getNext() method of KafkaReader.

Things which I tried:
-------------------------
1. Checked the network and the /etc/hosts entries. They are fine.
2. The machine that hosts this partition also hosts other partitions, and reading those works without problems. So the issue is not specific to the machine or the network.
3. Tried increasing the timeout parameters and changing the buffering parameters.
4. Records are zlib compressed. I tried the Kafka console consumer, but couldn't verify the records with it because the data was too large.
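
For point 4, one possible way to check a single record without the console consumer is to stream its bytes through zlib in small chunks. This is only a sketch: it assumes the raw payload of one message has already been obtained as bytes, and the function name check_zlib_payload is hypothetical, not part of Camus or Kafka.

```python
import zlib


def check_zlib_payload(payload: bytes, chunk_size: int = 64 * 1024):
    """Return (ok, decompressed_size) for one zlib-compressed payload.

    Hypothetical helper: feeds the payload to zlib chunk by chunk so the
    compressed input is processed incrementally rather than in one call.
    """
    decomp = zlib.decompressobj()
    total = 0
    try:
        for start in range(0, len(payload), chunk_size):
            total += len(decomp.decompress(payload[start:start + chunk_size]))
        total += len(decomp.flush())
    except zlib.error:
        # Corrupt header, corrupt deflate data, or checksum mismatch.
        return False, total
    # eof stays False when the stream ends before zlib's end-of-stream marker,
    # which indicates a truncated record.
    return decomp.eof, total
```

A result of (False, n) would suggest that the record is corrupt or truncated after n decompressed bytes, which might explain a reader that never makes progress on that partition.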

Here are the relevant configs:
-----------------------------------
kafka.client.name=camus1
# Fetch Request Parameters
kafka.fetch.buffer.size=1048576
#kafka.fetch.request.correlationid=
kafka.fetch.request.max.wait=100000
#kafka.fetch.request.min.bytes=
socket.receive.buffer.bytes=1048576
fetch.message.max.bytes=10485760
# Connection parameters.
kafka.brokers=<list of ips>
kafka.timeout.value=30000
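
A quick sanity check of how these timeouts relate to each other can be sketched as follows. It assumes that kafka.fetch.request.max.wait is the maximum time the broker may hold a fetch request and that kafka.timeout.value is the consumer socket timeout, both in milliseconds. That reading of the two properties is an assumption, and the helper names are hypothetical. The 600-second task timeout is taken from the exception message.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def timeout_warnings(props: dict, task_timeout_ms: int = 600_000) -> list:
    """Flag timeout combinations that look suspicious (assumed semantics)."""
    warnings = []
    max_wait = int(props.get("kafka.fetch.request.max.wait", 0))
    sock_timeout = int(props.get("kafka.timeout.value", 0))
    if max_wait > sock_timeout:
        # The socket could time out before the broker answers the fetch.
        warnings.append(
            f"kafka.fetch.request.max.wait ({max_wait} ms) exceeds "
            f"kafka.timeout.value ({sock_timeout} ms)"
        )
    if max_wait >= task_timeout_ms:
        # A single fetch could block longer than the task is allowed to.
        warnings.append(
            f"kafka.fetch.request.max.wait ({max_wait} ms) is not below "
            f"the task timeout ({task_timeout_ms} ms)"
        )
    return warnings


# Numeric values copied from the config above.
CONFIG = """
kafka.fetch.buffer.size=1048576
kafka.fetch.request.max.wait=100000
socket.receive.buffer.bytes=1048576
fetch.message.max.bytes=10485760
kafka.timeout.value=30000
"""

for warning in timeout_warnings(parse_properties(CONFIG)):
    print(warning)
```

With the values above, the first check fires because the maximum wait of 100000 ms is larger than the 30000 ms timeout. Whether that contributes to the hang depends on how the reader actually uses the two settings.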
