Jarko,

Do you have many topic partitions? Currently, if #partitions * fetched_bytes in the response exceeds 2GB, we get an integer overflow and weird things can happen. We are trying to address this better in KIP-74. If this is the issue, for now you can try reducing the fetch size or increasing the number of replica fetcher threads to work around it.
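As a rough sketch of the failure mode (the partition count and fetch size below are illustrative, not taken from your cluster): the aggregate fetch response size is held in a signed 32-bit Java int, so once the total crosses 2^31 - 1 bytes it wraps around and the response can no longer be parsed correctly.

```python
# Sketch: when does #partitions * per-partition fetch size overflow a
# signed 32-bit integer (Java int)? Assumes each partition returns close
# to the per-partition fetch size (1 MiB is the 0.10 default for
# replica.fetch.max.bytes).
INT32_MAX = 2**31 - 1  # 2,147,483,647

def overflows_int32(num_partitions, fetch_max_bytes):
    """True if the aggregate response size no longer fits in a Java int."""
    return num_partitions * fetch_max_bytes > INT32_MAX

# e.g. 2500 partitions * 1 MiB ~= 2.4 GiB, past the 2 GiB limit
print(overflows_int32(2500, 1024 * 1024))  # True
```

So lowering `replica.fetch.max.bytes`, or spreading partitions across more fetcher threads via `num.replica.fetchers` (so each thread fetches fewer partitions per request), keeps each response under the limit.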
Thanks,

Jun

On Wed, Aug 17, 2016 at 3:04 AM, J Mes <jarko...@gmail.com> wrote:

> Hello,
>
> I have a cluster of 3 nodes running kafka v.0.10.0.0. This cluster was
> started about a week ago with no data, no issues starting up.
> Today we noticed 1 of the servers in the cluster did not work anymore; we
> checked and indeed the server was not working anymore and all data was old.
>
> We restarted the node without data, thinking it should sync up and then
> join the cluster again, but we keep getting the following error:
>
> [2016-08-17 12:02:23,620] WARN [ReplicaFetcherThread-0-1], Error in fetch
> kafka.server.ReplicaFetcherThread$FetchRequest@62b3e70c
> (kafka.server.ReplicaFetcherThread)
> org.apache.kafka.common.protocol.types.SchemaException: Error reading
> field 'responses': Error reading field 'partition_responses': Error reading
> field 'record_set': Error reading bytes of size 104856430, only 18764961
> bytes available
>         at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:73)
>         at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
>         at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
>         at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
>         at kafka.utils.NetworkClientBlockingOps$.recursivePoll$2(NetworkClientBlockingOps.scala:136)
>         at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
>         at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
>         at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:244)
>         at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:229)
>         at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
>         at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:107)
>         at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:98)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
>
> All nodes are running the exact same version of zookeeper/kafka.
>
> When we clear all data from all nodes and start again, everything works...
>
> Any idea anyone?
>
> Kr,
> Jarko Mesuere