Hi there. We're running a Kafka cluster with 10 brokers and two topics; each topic has 500 partitions (Kafka version 0.8.2.1). When we start a Hadoop job to fetch messages from the cluster (one Hadoop map task per partition), 499 of the 500 tasks succeed and only one fails. The error on that broker looks like this (I found it in server.log):
[2015-10-20 19:00:02,996] ERROR [Replica Manager on Broker 10]: Error when processing fetch request for partition [ualog,54] offset 109455388 from consumer with correlation id 0. Possible cause: Failed to read complete buffer for targetOffset 110982779 startPosition 2147469523 in /disk1/data/ualog-54/00000000000106895524.log (kafka.server.ReplicaManager)

My code keeps trying to fetch the message at the specified offset until the Hadoop task times out, and of course the broker keeps logging this error, but nothing changes. I have no idea what it means, so I hope you can help me out here. Thanks.
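
In case it helps, here is some quick arithmetic on the numbers in the error line, as a small Python sketch. The variable and function names are mine, and I am assuming that startPosition is a byte position within the segment file and that the number in the .log file name is the segment's first offset. The only thing that stands out to me is that startPosition is just 14,124 bytes below the largest signed 32-bit value; I don't know whether that is related to the failure.

```python
# Numbers copied from the broker's error line above.
INT32_MAX = 2**31 - 1              # 2147483647, largest signed 32-bit value
start_position = 2147469523        # startPosition (assumed: byte position in the segment file)
segment_base_offset = 106895524    # from the file name 00000000000106895524.log (assumed: first offset)
requested_offset = 109455388       # offset my consumer asked for
target_offset = 110982779          # targetOffset reported in the error


def headroom(position: int, limit: int = INT32_MAX) -> int:
    """Bytes left between a file position and the signed 32-bit limit."""
    return limit - position


print(headroom(start_position))                # 14124
print(requested_offset - segment_base_offset)  # 2559864
print(target_offset - requested_offset)        # 1527391
```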