If you hard-kill (kill -9) a broker, it will run log validation and recovery
on restart (truncating the segment from the first invalid message onward), but
only on the last segment. If there is corruption in an earlier segment, the
simplest workaround is to skip that segment by manually setting the consumer
offset to the starting offset of the next segment.
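
Roughly, with the 0.7 javaapi simple consumer, that could look something like
the sketch below. The host, port, topic, and partition are placeholders, and
nextSegmentOffset is just the number in the name of the next segment file on
disk (the one after the corrupt segment):

import kafka.api.FetchRequest;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.javaapi.message.ByteBufferMessageSet;
import kafka.message.MessageAndOffset;

public class SkipCorruptSegment {
    public static void main(String[] args) {
        // Placeholders: adjust broker host/port, topic, and partition for your setup.
        String host = "broker-host";
        int port = 9092;
        String topic = "my-topic";
        int partition = 0;
        // Starting offset of the first segment after the corrupt one,
        // i.e. the number in that segment file's name.
        long nextSegmentOffset = 153008674335L;

        // socket timeout 30s, receive buffer ~1MB
        SimpleConsumer consumer = new SimpleConsumer(host, port, 30000, 1024 * 1024);
        try {
            // Resume fetching from the start of the next (uncorrupted) segment,
            // skipping everything from the corrupt offset onward.
            FetchRequest request =
                new FetchRequest(topic, partition, nextSegmentOffset, 1024 * 1024);
            ByteBufferMessageSet messages = consumer.fetch(request);

            long offset = nextSegmentOffset;
            for (MessageAndOffset mo : messages) {
                // process mo.message() ...
                offset = mo.offset();  // offset to use for the next fetch
            }
            System.out.println("next fetch offset: " + offset);
        } finally {
            consumer.close();
        }
    }
}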

Thanks,

Jun


On Tue, May 7, 2013 at 11:17 AM, Todd Bilsborrow <
tbilsbor...@rhythmnewmedia.com> wrote:

> Are there any recommended steps to take to try and recover a corrupt log
> file?
>
> I'm running Kafka 0.7.0, using the Java APIs for both production and
> consumption. If I attempt to read a message from a certain offset using the
> simple consumer, I get the following on the client:
>
> java.io.EOFException: Received -1 when reading from channel, socket has
> likely been closed.
> at kafka.utils.Utils$.read(Utils.scala:486)
> at
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:67)
> at kafka.network.Receive$class.readCompletely(Transmission.scala:57)
> at
> kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
> at kafka.consumer.SimpleConsumer.getResponse(SimpleConsumer.scala:184)
> at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:98)
> at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:88)
> at kafka.javaapi.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:43)
>
> and the following on the broker:
>
> ERROR Closing socket for /xx.xx.xx.xx because of error
> (kafka.network.Processor)
> java.io.IOException: Input/output error
>         at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
>         at
> sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:405)
>         at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:506)
>         at kafka.message.FileMessageSet.writeTo(FileMessageSet.scala:107)
>         at kafka.server.MessageSetSend.writeTo(MessageSetSend.scala:51)
>         at kafka.network.Processor.write(SocketServer.scala:332)
>         at kafka.network.Processor.run(SocketServer.scala:209)
>         at java.lang.Thread.run(Thread.java:662)
>
> When I run DumpLogSegments on the file, it prints all messages up to the
> seemingly corrupt offset, then pauses for several seconds, then exits with
> the message "tail of the log is at offset: 152722143050" - which is the
> offset that appears to be the start of the corruption. My next log file
> starts at offset 153008674335, so there are a couple hundred MB (~couple
> million messages) that I can't access.
>
> Just curious if there are any "best practice" next steps.
>
