We've been running Kafka 0.7.0 in production for several months and have been 
quite happy with it. Our use case to date has been to pull from the head of our 
topics, so we're normally consuming within seconds of message production using 
the high-level consumer, which has been working great as far as I can tell.
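
For reference, here's roughly what that head-of-topic consumption looks like 
with the 0.7 high-level consumer (imports omitted, and I'm writing the class 
names from memory, so treat this as a sketch rather than exact code; the 
ZooKeeper string, group, and topic are placeholders):

    Properties props = new Properties();
    props.put("zk.connect", "zk1:2181");  // placeholder ZooKeeper connect string
    props.put("groupid", "our-group");    // placeholder consumer group
    ConsumerConnector connector =
        Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
    // one stream on the topic; it blocks at the head and yields messages
    // within seconds of production
    Map<String, List<KafkaMessageStream>> streams =
        connector.createMessageStreams(Collections.singletonMap("our-topic", 1));
    for (Message message : streams.get("our-topic").get(0)) {
        // handle the message payload here
    }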

Recently I've started pulling older data (usually a few hours old) using the 
low level consumer, and I'm running into what appears to be corruption in the 
data files. The consumer pauses for several seconds and then throws 
"java.io.EOFException: Received -1 when reading from channel, socket has likely 
been closed". The server log shows "ERROR Closing socket for /xx.xx.xx.xx 
because of error (kafka.network.Processor) java.io.IOException: Input/output 
error". And DumpLogSegments reads up to the problematic offset and then stops 
and reports that the tail of the log is at offset: <bad offset> even though 
there is more data in the file (the next segment file's starting offset is much 
higher).
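
For context, the low-level path is essentially the loop below (the broker, 
topic, and partition are placeholders for our real config, and the lookback 
window is just an example). It runs fine for a while and then hits the 
EOFException above:

    import kafka.api.FetchRequest;
    import kafka.javaapi.consumer.SimpleConsumer;
    import kafka.javaapi.message.ByteBufferMessageSet;
    import kafka.message.MessageAndOffset;

    public class CatchUpConsumer {
        public static void main(String[] args) {
            SimpleConsumer consumer =
                new SimpleConsumer("broker1", 9092, 30000, 1024 * 1024);
            // start a few hours back; in 0.7 getOffsetsBefore returns byte
            // offsets aligned to segment boundaries
            long offset = consumer.getOffsetsBefore("our-topic", 0,
                    System.currentTimeMillis() - 4 * 3600 * 1000L, 1)[0];
            while (true) {
                ByteBufferMessageSet set = consumer.fetch(
                        new FetchRequest("our-topic", 0, offset, 1024 * 1024));
                for (MessageAndOffset mo : set) {
                    // handle the message payload here
                    offset = mo.offset();  // in 0.7, offset() is the next byte
                                           // offset to fetch from
                }
            }
        }
    }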

I learned here on the mailing list last month that I can skip the rest of the 
corrupted segment, but I'd rather not do that, since it means losing messages. 
This has happened five or six times in the past month; I've seen it on 
different brokers, different topics, different partitions, and different 
segments.

So finally, my questions are:

- is anyone else pulling older data without issues, or is everyone pretty much 
always consuming as close to the head as possible?
- is there a known bug that an upgrade to a newer Kafka version would fix? I 
don't know if it's the same problem, but I see KAFKA-309 and KAFKA-310 are 
marked as fixed, and I can't tell in which release.
- is there any way to examine a corrupt file to see what went wrong, or to 
diagnose why it's happening? (I've sketched below the kind of scan I have in 
mind.)
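
To make that last question concrete, this is the sort of scan I'm picturing: 
walk the partition from the base offset of the suspect segment and report where 
things go bad. My assumption is that Message.isValid() recomputes the CRC in 
0.7 (it appears to); the broker and topic names are again placeholders:

    import kafka.api.FetchRequest;
    import kafka.javaapi.consumer.SimpleConsumer;
    import kafka.javaapi.message.ByteBufferMessageSet;
    import kafka.message.MessageAndOffset;

    public class CrcScan {
        public static void main(String[] args) {
            SimpleConsumer consumer =
                new SimpleConsumer("broker1", 9092, 30000, 1024 * 1024);
            long offset = Long.parseLong(args[0]);  // base offset of the
                                                    // suspect segment
            try {
                while (true) {
                    ByteBufferMessageSet set = consumer.fetch(
                            new FetchRequest("our-topic", 0, offset, 1024 * 1024));
                    boolean sawMessage = false;
                    for (MessageAndOffset mo : set) {
                        sawMessage = true;
                        if (!mo.message().isValid()) {
                            System.out.println("bad CRC after byte offset " + offset);
                            return;
                        }
                        offset = mo.offset();
                    }
                    if (!sawMessage) break;  // log end reached, no bad CRC found
                }
                System.out.println("scan finished cleanly at offset " + offset);
            } catch (Exception e) {
                // the fetch itself may die here (e.g. the EOFException above)
                System.out.println("fetch failed at offset " + offset + ": " + e);
            } finally {
                consumer.close();
            }
        }
    }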

Thanks!
