This seems to be related to https://issues.apache.org/jira/browse/KAFKA-1749.
Guozhang

On Tue, Nov 4, 2014 at 10:30 AM, Christofer Hedbrandh <christo...@knewton.com> wrote:

> Hi Kafka users!
>
> I was migrating a cluster of 3 brokers from one set of EC2 instances to
> another, but ran into replication problems. The migration method was to
> stop one broker and let a new broker join with the same broker.id.
> Replication started, but after ~4 of ~15 GB it stalled, with the following
> errors logged every ~500 ms.
>
> On the new broker (the fetcher):
>
> [2014-11-04 17:02:33,762] ERROR [ReplicaFetcherThread-0-1926078608], Error in fetch
> Name: FetchRequest; Version: 0; CorrelationId: 1523;
> ClientId: ReplicaFetcherThread-0-1926078608; ReplicaId: 544181083;
> MaxWait: 500 ms; MinBytes: 1 bytes;
> RequestInfo: [qa.mx-error,302] -> PartitionFetchInfo(0,10485760),
> [qa.xl-msg,46] -> PartitionFetchInfo(101768,10485760),
> [qa.xl-error,202] -> PartitionFetchInfo(0,10485760),
> [qa.mx-msg,177] -> ... total of 700+ partitions
> -> PartitionFetchInfo(0,10485760) (kafka.server.ReplicaFetcherThread)
> java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
>         at kafka.utils.Utils$.read(Utils.scala:376)
>         at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>         at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>         at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
>         at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
>         at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:81)
>         at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:71)
>         at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:109)
>         at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:109)
>         at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:109)
>         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>         at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:108)
>         at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
>         at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
>         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>         at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:107)
>         at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96)
>         at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
> [2014-11-04 17:02:33,765] WARN Reconnect due to socket error: null (kafka.consumer.SimpleConsumer)
>
> On one of the two old brokers (presumably the broker serving the data):
>
> [2014-11-04 17:03:28,030] ERROR Closing socket for /10.145.135.246 because of error (kafka.network.Processor)
> kafka.common.KafkaException: This operation cannot be completed on a complete request.
>         at kafka.network.Transmission$class.expectIncomplete(Transmission.scala:34)
>         at kafka.api.FetchResponseSend.expectIncomplete(FetchResponse.scala:191)
>         at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:214)
>         at kafka.network.Processor.write(SocketServer.scala:375)
>         at kafka.network.Processor.run(SocketServer.scala:247)
>         at java.lang.Thread.run(Thread.java:745)
>
> It looks similar to this earlier post, but that thread doesn't seem to have
> reached a resolution:
> http://thread.gmane.org/gmane.comp.apache.kafka.user/1153
>
> There is also this one, but again no resolution:
> http://thread.gmane.org/gmane.comp.apache.kafka.user/3804
>
> Does anyone have any clues as to what might be going on here? Any
> suggestions for solutions?
>
> Thanks,
> Christofer

--
-- Guozhang
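[Editor's note] One way to read the numbers in the log (an assumption drawn from the log lines, not something confirmed in the thread): the fetch request allows up to 10485760 bytes per partition across "700+ partitions", so the worst-case response size exceeds what a signed 32-bit integer can represent. A back-of-envelope check, with both figures taken from the log:

```java
// Back-of-envelope check (hypothetical reading of the log, not a confirmed
// diagnosis): 700+ partitions, each with a 10485760-byte max fetch size,
// yields a worst-case response of ~7.3 GB, far beyond Integer.MAX_VALUE
// (~2.1 GB), the largest value a signed 32-bit size field can hold.
public class FetchSizeCheck {
    public static void main(String[] args) {
        long perPartitionMaxBytes = 10_485_760L; // from PartitionFetchInfo(_, 10485760)
        long partitionCount = 700L;              // "total of 700+ partitions" (lower bound)

        long worstCaseResponse = perPartitionMaxBytes * partitionCount;

        System.out.println("worst-case response bytes: " + worstCaseResponse);
        System.out.println("Integer.MAX_VALUE:         " + Integer.MAX_VALUE);
        System.out.println("exceeds 32-bit size field: "
                + (worstCaseResponse > Integer.MAX_VALUE));
    }
}
```

If this reading is right, lowering the per-partition replica fetch size or reducing the number of partitions fetched per request would keep the worst-case response under the 32-bit limit; the linked JIRA would be the place to confirm.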