[ 
https://issues.apache.org/jira/browse/KAFKA-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17898039#comment-17898039
 ] 

Teddy Yan edited comment on KAFKA-9613 at 11/13/24 8:54 PM:
------------------------------------------------------------

The replica got the following errors, and yes, it becomes an out-of-sync replica. 
We have `min.insync.replicas` set on the topic and retention of around 2 hours, 
but we must wait those 2 hours to move on. Can Kafka skip the corrupt records if 
we are willing to lose some data?

```
[2024-11-13 20:09:26,028] ERROR [ReplicaFetcher replicaId=334945090, leaderId=191572054, fetcherId=0] Error for partition df-flow-3 at offset 185972321 (kafka.server.ReplicaFetcherThread)
org.apache.kafka.common.errors.CorruptRecordException: This message has failed its CRC checksum, exceeds the valid size, has a null key for a compacted topic, or is otherwise corrupt.
```
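
For reference, the topic-level settings I mentioned can be checked with kafka-configs.sh; a minimal sketch, assuming a broker at localhost:9092 and the topic name df-flow (both placeholders here):

```
# list the current overrides on the topic (min.insync.replicas, retention.ms, ...)
kafka-configs.sh --bootstrap-server localhost:9092 --describe \
  --entity-type topics --entity-name df-flow
```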

I tried to delete records to skip past the corrupt part of the log on the affected 
disk, but the request timed out. I don't know why it times out; deleting to a 
slightly lower offset does not time out. We can't delete the corrupt records 
ourselves (the log is messed up), but retention can. Is there a way to trigger 
retention immediately? Or a command that makes Kafka give up the current log file 
and roll to a new one?
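
To illustrate the kind of delete-records request I mean, here is a minimal sketch with kafka-delete-records.sh, assuming a broker at localhost:9092 and that 185972322 (one past the corrupt offset) is where the log should start; offsets.json is a placeholder file name:

```
cat > offsets.json <<'EOF'
{
  "version": 1,
  "partitions": [
    { "topic": "df-flow", "partition": 3, "offset": 185972322 }
  ]
}
EOF

# advance the log start offset past the corrupt record
kafka-delete-records.sh --bootstrap-server localhost:9092 \
  --offset-json-file offsets.json
```

The only workaround I can think of for "retention immediately" is temporarily lowering retention.ms on the topic with kafka-configs.sh and restoring it afterwards, but I am not sure the cleaner would get past the corrupt segment either.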

 

It's easy to reproduce. Break the log using the following command.
```
# prepend a stray zero byte (via a temp file; redirecting onto the same file would just truncate it)
root@a1:/home/support# { printf '\x00'; tail -c +1 00000000000185492119.log; } > 00000000000185492119.log.bad
root@a1:/home/support# mv 00000000000185492119.log.bad 00000000000185492119.log
```

!image-2024-11-13-14-02-45-768.png!
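
As a cross-check, the corrupted segment can be inspected with kafka-dump-log.sh; a minimal sketch, assuming the same segment file as above (it should fail to decode the shifted bytes instead of printing records):

```
# try to dump the record batches in the corrupted segment
kafka-dump-log.sh --files 00000000000185492119.log --print-data-log
```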



> CorruptRecordException: Found record size 0 smaller than minimum record 
> overhead
> --------------------------------------------------------------------------------
>
>                 Key: KAFKA-9613
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9613
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.6.2
>            Reporter: Amit Khandelwal
>            Assignee: hudeqi
>            Priority: Major
>         Attachments: image-2024-11-13-14-02-45-768.png
>
>
> 20200224;21:01:38: [2020-02-24 21:01:38,615] ERROR [ReplicaManager broker=0] 
> Error processing fetch with max size 1048576 from consumer on partition 
> SANDBOX.BROKER.NEWORDER-0: (fetchOffset=211886, logStartOffset=-1, 
> maxBytes=1048576, currentLeaderEpoch=Optional.empty) 
> (kafka.server.ReplicaManager)
> 20200224;21:01:38: org.apache.kafka.common.errors.CorruptRecordException: 
> Found record size 0 smaller than minimum record overhead (14) in file 
> /data/tmp/kafka-topic-logs/SANDBOX.BROKER.NEWORDER-0/00000000000000000000.log.
> 20200224;21:05:48: [2020-02-24 21:05:48,711] INFO [GroupMetadataManager 
> brokerId=0] Removed 0 expired offsets in 1 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)
> 20200224;21:10:22: [2020-02-24 21:10:22,204] INFO [GroupCoordinator 0]: 
> Member 
> xxxxxxxx_011-9e61d2c9-ce5a-4231-bda1-f04e6c260dc0-StreamThread-1-consumer-27768816-ee87-498f-8896-191912282d4f
>  in group yyyyyyyyy_011 has failed, removing it from the group 
> (kafka.coordinator.group.GroupCoordinator)
>  
> [https://stackoverflow.com/questions/60404510/kafka-broker-issue-replica-manager-with-max-size#]
>  
>  


