[ https://issues.apache.org/jira/browse/KAFKA-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17898039#comment-17898039 ]
Teddy Yan edited comment on KAFKA-9613 at 11/13/24 8:54 PM:
------------------------------------------------------------

The replica got the following errors. Yes, it became an out-of-sync replica. We have `min.insync.replicas` set on the topic, and retention is around 2 hours, but we must wait those 2 hours to move on. Can Kafka skip the corrupt records if we are willing to lose some data?

```
[2024-11-13 20:09:26,028] ERROR [ReplicaFetcher replicaId=334945090, leaderId=191572054, fetcherId=0] Error for partition df-flow-3 at offset 185972321 (kafka.server.ReplicaFetcherThread)
org.apache.kafka.common.errors.CorruptRecordException: This message has failed its CRC checksum, exceeds the valid size, has a null key for a compacted topic, or is otherwise corrupt.
```

I tried to delete records to skip past the corrupt part of the log on the failing disk, but the request timed out, and I don't know why; with a slightly different offset it does not time out. We can't delete the corrupt records (the log is messed up), but retention can remove them. Is there a way to run retention immediately? Or a command that makes Kafka give up the current log file and roll to a new one? (See the command sketches after this comment.)

It's easy to reproduce. Corrupt the log by prepending a zero byte with the commands below (going through a temporary file, because redirecting the output onto the input would truncate it before it is read):

```
root@a1:/home/support# { printf '\x00'; cat 00000000000185492119.log; } > 00000000000185492119.log.tmp
root@a1:/home/support# mv 00000000000185492119.log.tmp 00000000000185492119.log
```

!image-2024-11-13-14-02-45-768.png!
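Roughly what the delete-records attempt above looked like (a sketch: the `localhost:9092` bootstrap address is illustrative, and the topic/partition are the ones from the fetcher error; records below the given offset are removed by advancing the partition's log start offset):

```
# Offsets file: skip past the corrupt record at offset 185972321 by
# deleting everything before offset 185972322 on df-flow partition 3.
cat > delete-records.json <<'EOF'
{
  "version": 1,
  "partitions": [
    { "topic": "df-flow", "partition": 3, "offset": 185972322 }
  ]
}
EOF

bin/kafka-delete-records.sh --bootstrap-server localhost:9092 \
  --offset-json-file delete-records.json
```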
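As far as I know there is no single "run retention now" command, but one workaround sketch is to shrink `retention.ms` (and `segment.ms`, so the active segment rolls and becomes eligible for deletion) and let the periodic retention check (`log.retention.check.interval.ms`, 5 minutes by default) remove the old segments, then drop the overrides. The values below are illustrative:

```
# Temporarily shrink retention and segment roll time on the topic:
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name df-flow \
  --add-config retention.ms=60000,segment.ms=60000

# ...wait for the next retention check to delete the rolled segments...

# Remove the overrides so the topic falls back to the broker defaults
# (re-add the original overrides if the topic had explicit ones):
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name df-flow \
  --delete-config retention.ms,segment.ms
```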
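And to confirm that the reproduction actually corrupted the segment, the `kafka-dump-log.sh` tool bundled with Kafka (1.0+; older releases expose the same logic as `kafka.tools.DumpLogSegments`) should fail with a similar CorruptRecordException when it scans the file:

```
# Deep iteration decodes every record batch, so a prepended zero byte
# should make the scan fail; the path is the segment from the repro above.
bin/kafka-dump-log.sh --deep-iteration \
  --files 00000000000185492119.log
```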
> CorruptRecordException: Found record size 0 smaller than minimum record overhead
> --------------------------------------------------------------------------------
>
>                 Key: KAFKA-9613
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9613
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.6.2
>            Reporter: Amit Khandelwal
>            Assignee: hudeqi
>            Priority: Major
>         Attachments: image-2024-11-13-14-02-45-768.png
>
> 20200224;21:01:38: [2020-02-24 21:01:38,615] ERROR [ReplicaManager broker=0] Error processing fetch with max size 1048576 from consumer on partition SANDBOX.BROKER.NEWORDER-0: (fetchOffset=211886, logStartOffset=-1, maxBytes=1048576, currentLeaderEpoch=Optional.empty) (kafka.server.ReplicaManager)
> 20200224;21:01:38: org.apache.kafka.common.errors.CorruptRecordException: Found record size 0 smaller than minimum record overhead (14) in file /data/tmp/kafka-topic-logs/SANDBOX.BROKER.NEWORDER-0/00000000000000000000.log.
> 20200224;21:05:48: [2020-02-24 21:05:48,711] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 1 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
> 20200224;21:10:22: [2020-02-24 21:10:22,204] INFO [GroupCoordinator 0]: Member xxxxxxxx_011-9e61d2c9-ce5a-4231-bda1-f04e6c260dc0-StreamThread-1-consumer-27768816-ee87-498f-8896-191912282d4f in group yyyyyyyyy_011 has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
>
> [https://stackoverflow.com/questions/60404510/kafka-broker-issue-replica-manager-with-max-size#]