Hi,

We recently had a kafka node go down suddenly. When it came back up, it
apparently had a corrupt recovery file, and refused to startup:

2014-11-06 08:17:19,299  WARN [main] server.KafkaServer - Error
starting up KafkaServer
java.lang.NumberFormatException: For input string:
"^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@"
        at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:481)
        at java.lang.Integer.parseInt(Integer.java:527)
        at 
scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
        at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
        at kafka.server.OffsetCheckpoint.read(OffsetCheckpoint.scala:76)
        at kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:106)
        at kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:105)
        at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
        at kafka.log.LogManager.loadLogs(LogManager.scala:105)
        at kafka.log.LogManager.<init>(LogManager.scala:57)
        at kafka.server.KafkaServer.createLogManager(KafkaServer.scala:275)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:72)

And since the app is under a monitor (so it was repeatedly restarting and
failing with this error for several minutes before we got to it)…

We moved the ‘recovery-point-offset-checkpoint’ file out of the way, and it
then restarted cleanly (but of course re-synced all it’s data from
replicas, so we had no data loss).

Anyway, I’m wondering if that’s the expected behavior? Or should it not
declare it corrupt and then proceed automatically to an unclean restart?

Should this NumberFormatException be handled a bit more gracefully?

We saved the corrupt file if it’s worth inspecting (although I doubt it
will be useful!)….

Jason
​

Reply via email to