Hi Kafka users!

I was migrating a cluster of 3 brokers from one set of EC2 instances to
another and ran into replication problems. The migration method was to stop
one broker and let a new broker join with the same broker.id. Replication
started, but after ~4 of ~15 GB it stopped, with the following errors
logged every ~500 ms.
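For context, the replacement broker simply reuses the departing broker's id
in server.properties; everything else below is an illustrative placeholder
(hostnames and paths are not my real values), only the broker.id reuse is
the point:

```properties
# server.properties on the replacement EC2 instance
# broker.id matches the ReplicaId seen in the fetch errors below
broker.id=544181083
# placeholder values, not my actual setup:
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
log.dirs=/var/kafka/data
```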


On the new broker (the fetcher):

[2014-11-04 17:02:33,762] ERROR [ReplicaFetcherThread-0-1926078608], Error
in fetch Name: FetchRequest; Version: 0; CorrelationId: 1523; ClientId:
ReplicaFetcherThread-0-1926078608; ReplicaId: 544181083; MaxWait: 500 ms;
MinBytes: 1 bytes; RequestInfo: [qa.mx-error,302] ->
PartitionFetchInfo(0,10485760),[qa.xl-msg,46] ->
PartitionFetchInfo(101768,10485760),[qa.xl-error,202] ->
PartitionFetchInfo(0,10485760),[qa.mx-msg,177] ->
... total of 700+ partitions
-> PartitionFetchInfo(0,10485760) (kafka.server.ReplicaFetcherThread)
java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
        at kafka.utils.Utils$.read(Utils.scala:376)
        at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
        at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
        at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
        at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
        at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:81)
        at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:71)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:109)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:109)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:109)
        at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:108)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
        at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
        at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:107)
        at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96)
        at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
[2014-11-04 17:02:33,765] WARN Reconnect due to socket error: null (kafka.consumer.SimpleConsumer)


On one of the two old brokers (presumably the one serving the fetch request):

[2014-11-04 17:03:28,030] ERROR Closing socket for /10.145.135.246 because of error (kafka.network.Processor)
kafka.common.KafkaException: This operation cannot be completed on a complete request.
at kafka.network.Transmission$class.expectIncomplete(Transmission.scala:34)
at kafka.api.FetchResponseSend.expectIncomplete(FetchResponse.scala:191)
at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:214)
at kafka.network.Processor.write(SocketServer.scala:375)
at kafka.network.Processor.run(SocketServer.scala:247)
at java.lang.Thread.run(Thread.java:745)


This looks similar to the earlier post below, but that thread doesn't seem
to reach a resolution:
http://thread.gmane.org/gmane.comp.apache.kafka.user/1153

There is also this one, but again no resolution:
http://thread.gmane.org/gmane.comp.apache.kafka.user/3804


Does anyone have any clues as to what might be going on here, or any
suggestions for a fix?

Thanks,
Christofer
