Guozhang Wang created KAFKA-1860:
------------------------------------

             Summary: File system errors are not detected unless Kafka tries to write
                 Key: KAFKA-1860
                 URL: https://issues.apache.org/jira/browse/KAFKA-1860
             Project: Kafka
          Issue Type: Bug
            Reporter: Guozhang Wang
            Assignee: Guozhang Wang
             Fix For: 0.9.0


When the disk (a RAID array with the caches dir) dies on a Kafka broker, the 
filesystem typically gets remounted in read-only mode, and hence when Kafka 
tries to write to the disk, it gets a FileNotFoundException with the read-only 
errno set (EROFS).

However, as long as no produce request is received, and hence no writes are 
attempted on the disk, Kafka will not exit on this FATAL error. (When the disk 
starts working again, Kafka might think some files are gone, although they will 
reappear once the RAID comes back online.) Instead it keeps logging exceptions 
like:

{code}
2015/01/07 09:47:41.543 ERROR [KafkaScheduler] [kafka-scheduler-1] [kafka-server] [] Uncaught exception in scheduled task 'kafka-recovery-point-checkpoint'
java.io.FileNotFoundException: /export/content/kafka/i001_caches/recovery-point-offset-checkpoint.tmp (Read-only file system)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:206)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:156)
        at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37)
{code}
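
One possible way to detect this condition without waiting for a produce 
request is to probe each log directory with a small write. The following is 
only a minimal sketch of that idea, not the fix proposed for this issue; the 
class name LogDirProbe and the method isWritable are hypothetical and do not 
exist in Kafka.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Kafka: checks whether a directory
// still accepts writes by creating and deleting a small temp file.
final class LogDirProbe {

    private LogDirProbe() {}

    // Returns true if a temp file can be created and removed in dir.
    // On a filesystem remounted read-only, file creation fails with an
    // IOException, which is reported here as "not writable".
    static boolean isWritable(Path dir) {
        try {
            Path probe = Files.createTempFile(dir, "probe-", ".tmp");
            Files.delete(probe);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

A broker could run such a probe from a scheduled task and treat a false 
result as fatal instead of only logging the exception. Whether to exit at 
that point is a design decision this sketch does not settle.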



