[ https://issues.apache.org/jira/browse/KAFKA-9604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maksim Larionov updated KAFKA-9604:
-----------------------------------
    Description:
Good day!

A disk space overflow occurred on one of the servers in the cluster. During cleanup, some partitions' *.log files in the log.dirs directory were deleted by mistake. When the topic's retention time was reached, the file 00000000000007607076.log was not found. The broker stopped with an error message. It's OK.

[2020-02-06 13:32:48,965] INFO [Log partition=ocs.account-balances-12, dir=/data/ocswf/kafka_broker/kafka-data] Found deletable segments with base offsets [7607076] due to retention time 604800000ms breach (kafka.log.Log)
[2020-02-06 13:32:48,966] INFO [Log partition=ocs.account-balances-12, dir=/data/ocswf/kafka_broker/kafka-data] Scheduling log segment [baseOffset 7607076, size 131228281] for deletion. (kafka.log.Log)
[2020-02-06 13:32:48,979] ERROR Error while deleting segments for ocs.account-balances-12 in dir /data/ocswf/kafka_broker/kafka-data (kafka.server.LogDirFailureChannel)
java.nio.file.NoSuchFileException: /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log
    Suppressed: java.nio.file.NoSuchFileException: /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log -> /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log.deleted
[2020-02-06 13:32:48,982] INFO [ReplicaManager broker=3] Stopping serving replicas in dir /data/ocswf/kafka_broker/kafka-data (kafka.server.ReplicaManager)
[2020-02-06 13:32:48,983] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler)
org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for ocs.account-balances-12 in dir /data/ocswf/kafka_broker/kafka-data
Caused by: java.nio.file.NoSuchFileException: /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log
    Suppressed: java.nio.file.NoSuchFileException: /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log -> /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log.deleted
...
[2020-02-06 13:32:49,058] INFO Stopping serving logs in dir /data/ocswf/kafka_broker/kafka-data (kafka.log.LogManager)
[2020-02-06 13:32:49,078] ERROR Shutdown broker because all log dirs in /data/ocswf/kafka_broker/kafka-data have failed (kafka.log.LogManager)

Then all the other nodes of the cluster stopped abruptly during partition leader election:

[2020-02-06 13:32:53,620] ERROR [ReplicaManager broker=1] Error while making broker the leader for partition Topic: ocs.counter-balances; Partition: 40; Leader: Some(3); AllReplicas: 1,2,3,4; InSyncReplicas: 1,2,4 in dir Some(/data/ocswf/kafka_broker/kafka-data) (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.KafkaStorageException: Error while writing to checkpoint file /data/ocswf/kafka_broker/kafka-data/ocs.counter-balances-40/leader-epoch-checkpoint
Caused by: java.io.FileNotFoundException: /data/ocswf/kafka_broker/kafka-data/ocs.counter-balances-40/leader-epoch-checkpoint.tmp (No such file or directory)
[2020-02-06 13:32:53,687] INFO Stopping serving logs in dir /data/ocswf/kafka_broker/kafka-data (kafka.log.LogManager)
[2020-02-06 13:32:53,698] ERROR Shutdown broker because all log dirs in /data/ocswf/kafka_broker/kafka-data have failed (kafka.log.LogManager)

Is this normal?

  was:
Good day!

A disk space overflow occurred on one of the servers in the cluster. During cleanup, some partitions' *.log files in the log.dirs directory were deleted by mistake. When the topic's retention time was reached, the file 00000000000007607076.log was not found. The broker stopped abruptly with an error message. It's OK.
[2020-02-06 13:32:48,965] INFO [Log partition=ocs.account-balances-12, dir=/data/ocswf/kafka_broker/kafka-data] Found deletable segments with base offsets [7607076] due to retention time 604800000ms breach (kafka.log.Log)
[2020-02-06 13:32:48,966] INFO [Log partition=ocs.account-balances-12, dir=/data/ocswf/kafka_broker/kafka-data] Scheduling log segment [baseOffset 7607076, size 131228281] for deletion. (kafka.log.Log)
[2020-02-06 13:32:48,979] ERROR Error while deleting segments for ocs.account-balances-12 in dir /data/ocswf/kafka_broker/kafka-data (kafka.server.LogDirFailureChannel)
java.nio.file.NoSuchFileException: /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log
    Suppressed: java.nio.file.NoSuchFileException: /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log -> /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log.deleted
[2020-02-06 13:32:48,982] INFO [ReplicaManager broker=3] Stopping serving replicas in dir /data/ocswf/kafka_broker/kafka-data (kafka.server.ReplicaManager)
[2020-02-06 13:32:48,983] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler)
org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for ocs.account-balances-12 in dir /data/ocswf/kafka_broker/kafka-data
Caused by: java.nio.file.NoSuchFileException: /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log
    Suppressed: java.nio.file.NoSuchFileException: /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log -> /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log.deleted
...
[2020-02-06 13:32:49,058] INFO Stopping serving logs in dir /data/ocswf/kafka_broker/kafka-data (kafka.log.LogManager)
[2020-02-06 13:32:49,078] ERROR Shutdown broker because all log dirs in /data/ocswf/kafka_broker/kafka-data have failed (kafka.log.LogManager)

Then all the other nodes of the cluster stopped abruptly during partition leader election:

[2020-02-06 13:32:53,620] ERROR [ReplicaManager broker=1] Error while making broker the leader for partition Topic: ocs.counter-balances; Partition: 40; Leader: Some(3); AllReplicas: 1,2,3,4; InSyncReplicas: 1,2,4 in dir Some(/data/ocswf/kafka_broker/kafka-data) (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.KafkaStorageException: Error while writing to checkpoint file /data/ocswf/kafka_broker/kafka-data/ocs.counter-balances-40/leader-epoch-checkpoint
Caused by: java.io.FileNotFoundException: /data/ocswf/kafka_broker/kafka-data/ocs.counter-balances-40/leader-epoch-checkpoint.tmp (No such file or directory)
[2020-02-06 13:32:53,687] INFO Stopping serving logs in dir /data/ocswf/kafka_broker/kafka-data (kafka.log.LogManager)
[2020-02-06 13:32:53,698] ERROR Shutdown broker because all log dirs in /data/ocswf/kafka_broker/kafka-data have failed (kafka.log.LogManager)

Is this normal?


> Cluster crash
> -------------
>
>                 Key: KAFKA-9604
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9604
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.2.1
>            Reporter: Maksim Larionov
>            Priority: Major
>
> Good day!
> A disk space overflow occurred on one of the servers in the cluster. During
> cleanup, some partitions' *.log files in the log.dirs directory were deleted
> by mistake. When the topic's retention time was reached, the file
> 00000000000007607076.log was not found. The broker stopped with an error
> message. It's OK.
> [2020-02-06 13:32:48,965] INFO [Log partition=ocs.account-balances-12,
> dir=/data/ocswf/kafka_broker/kafka-data] Found deletable segments with base
> offsets [7607076] due to retention time 604800000ms breach (kafka.log.Log)
> [2020-02-06 13:32:48,966] INFO [Log partition=ocs.account-balances-12,
> dir=/data/ocswf/kafka_broker/kafka-data] Scheduling log segment [baseOffset
> 7607076, size 131228281] for deletion. (kafka.log.Log)
> [2020-02-06 13:32:48,979] ERROR Error while deleting segments for
> ocs.account-balances-12 in dir /data/ocswf/kafka_broker/kafka-data
> (kafka.server.LogDirFailureChannel)
> java.nio.file.NoSuchFileException:
> /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log
> Suppressed: java.nio.file.NoSuchFileException:
> /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log
> ->
> /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log.deleted
> [2020-02-06 13:32:48,982] INFO [ReplicaManager broker=3] Stopping serving
> replicas in dir /data/ocswf/kafka_broker/kafka-data
> (kafka.server.ReplicaManager)
> [2020-02-06 13:32:48,983] ERROR Uncaught exception in scheduled task
> 'kafka-log-retention' (kafka.utils.KafkaScheduler)
> org.apache.kafka.common.errors.KafkaStorageException: Error while deleting
> segments for ocs.account-balances-12 in dir
> /data/ocswf/kafka_broker/kafka-data
> Caused by: java.nio.file.NoSuchFileException:
> /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log
> Suppressed: java.nio.file.NoSuchFileException:
> /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log
> ->
> /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log.deleted
> ...
> [2020-02-06 13:32:49,058] INFO Stopping serving logs in dir
> /data/ocswf/kafka_broker/kafka-data (kafka.log.LogManager)
> [2020-02-06 13:32:49,078] ERROR Shutdown broker because all log dirs in
> /data/ocswf/kafka_broker/kafka-data have failed (kafka.log.LogManager)
>
> Then all the other nodes of the cluster stopped abruptly during partition
> leader election:
> [2020-02-06 13:32:53,620] ERROR [ReplicaManager broker=1] Error while making
> broker the leader for partition Topic: ocs.counter-balances; Partition: 40;
> Leader: Some(3); AllReplicas: 1,2,3,4; InSyncReplicas: 1,2,4 in dir
> Some(/data/ocswf/kafka_broker/kafka-data) (kafka.server.ReplicaManager)
> org.apache.kafka.common.errors.KafkaStorageException: Error while writing to
> checkpoint file
> /data/ocswf/kafka_broker/kafka-data/ocs.counter-balances-40/leader-epoch-checkpoint
> Caused by: java.io.FileNotFoundException:
> /data/ocswf/kafka_broker/kafka-data/ocs.counter-balances-40/leader-epoch-checkpoint.tmp
> (No such file or directory)
> [2020-02-06 13:32:53,687] INFO Stopping serving logs in dir
> /data/ocswf/kafka_broker/kafka-data (kafka.log.LogManager)
> [2020-02-06 13:32:53,698] ERROR Shutdown broker because all log dirs in
> /data/ocswf/kafka_broker/kafka-data have failed (kafka.log.LogManager)
> Is this normal?

--
This message was sent by Atlassian Jira (v8.3.4#803005)
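
The first failure in the report above was triggered when retention tried to rename a segment file that no longer existed on disk. As a rough illustration only (not part of Kafka and not taken from the ticket), the sketch below scans a log directory for segments whose index files are still present but whose .log file is gone. It assumes the accidental cleanup removed only *.log files and left the matching .index/.timeindex files behind, and it relies on Kafka's convention of naming segment files by a 20-digit, zero-padded base offset. The function name find_orphaned_segments is hypothetical.

```python
import re
from pathlib import Path

# Kafka names segment files by a 20-digit, zero-padded base offset,
# e.g. 00000000000007607076.log / .index / .timeindex
SEGMENT_NAME = re.compile(r"^(\d{20})\.(log|index|timeindex)$")


def find_orphaned_segments(log_dir):
    """Return {partition directory name: [base offsets]} for segments that
    still have an index file but no .log file.

    Hypothetical helper: it only reads directory listings and never
    modifies anything under log_dir.
    """
    orphans = {}
    for partition_dir in sorted(Path(log_dir).iterdir()):
        if not partition_dir.is_dir():
            continue  # skip plain files in the log dir root
        extensions_by_offset = {}
        for entry in partition_dir.iterdir():
            match = SEGMENT_NAME.match(entry.name)
            if match:
                offset, extension = match.groups()
                extensions_by_offset.setdefault(offset, set()).add(extension)
        missing = sorted(
            offset
            for offset, extensions in extensions_by_offset.items()
            if "log" not in extensions
        )
        if missing:
            orphans[partition_dir.name] = missing
    return orphans
```

Calling find_orphaned_segments("/data/ocswf/kafka_broker/kafka-data") on a stopped broker would list the partitions to inspect before restarting. It cannot detect segments whose index files were removed as well, and it does not cover the second failure in the report (the missing leader-epoch-checkpoint.tmp path), so it is a diagnostic aid under the stated assumption rather than a fix.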