[ 
https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558265#comment-16558265
 ] 

lambdaliu commented on KAFKA-7190:
----------------------------------

Hello [~guozhang]. My team develops a cloud version of Kafka, so I am familiar 
with the broker code and think I can take on this issue.

When we remove the head of the log, we take the following steps in 
ProducerStateManager.truncateHead (a simplified sketch follows the list):

1. remove producerIds whose last offset is smaller than the log start offset
2. remove a producerId's BatchMetadata entries whose last offset is smaller than 
the log start offset
3. remove ongoing transactions whose producerId was removed in step 1
4. remove unreplicated transactions whose last offset is smaller than the log 
start offset
5. update lastMapOffset to the log start offset if lastMapOffset is smaller than 
the log start offset
6. delete snapshot files older than the new log start offset
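
To make the steps concrete, here is a simplified, self-contained sketch. It is 
not the actual broker code; the case classes only approximate 
ProducerStateManager's data structures.

{code:scala}
// Simplified stand-ins for the broker's producer state structures.
case class BatchMetadata(lastSeq: Int, lastOffset: Long)

case class ProducerStateEntry(producerId: Long, var batches: List[BatchMetadata]) {
  def lastOffset: Long = batches.map(_.lastOffset).max
}

case class TxnMetadata(producerId: Long, lastOffset: Long)

class ProducerStateSketch(
    var producers: Map[Long, ProducerStateEntry],
    var ongoingTxns: List[TxnMetadata],
    var unreplicatedTxns: List[TxnMetadata],
    var lastMapOffset: Long) {

  def truncateHead(logStartOffset: Long): Unit = {
    // step 1: drop producerIds whose last offset is below the new log start offset
    val (evicted, retained) =
      producers.partition { case (_, entry) => entry.lastOffset < logStartOffset }
    producers = retained
    // step 2: drop BatchMetadata entries that fall below the log start offset
    producers.values.foreach { entry =>
      entry.batches = entry.batches.filter(_.lastOffset >= logStartOffset)
    }
    // step 3: drop ongoing transactions of producerIds evicted in step 1
    ongoingTxns = ongoingTxns.filterNot(txn => evicted.contains(txn.producerId))
    // step 4: drop unreplicated transactions ending before the log start offset
    unreplicatedTxns = unreplicatedTxns.filter(_.lastOffset >= logStartOffset)
    // step 5: never let lastMapOffset fall behind the log start offset
    if (lastMapOffset < logStartOffset) lastMapOffset = logStartOffset
    // step 6: snapshot files older than the new log start offset get deleted
    deleteSnapshotsOlderThan(logStartOffset)
  }

  // placeholder for the snapshot-file deletion in step 6
  private def deleteSnapshotsOlderThan(offset: Long): Unit = ()
}
{code}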

As you suggested, we can delay the deletion of the producer ID until it expires. 
We can also delay steps 2 and 3 until that time.
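
A hedged sketch of the delayed eviction: instead of dropping a producerId as 
soon as its offsets fall below the log start offset, keep the entry until its 
last-used timestamp is older than an expiration interval. The parameter name 
below is illustrative, not the broker's actual setting.

{code:scala}
case class ExpiringProducerEntry(producerId: Long, lastTimestamp: Long)

// Keep a producerId until its last-used timestamp exceeds the expiration interval.
def removeExpiredProducerIds(
    producers: Map[Long, ExpiringProducerEntry],
    currentTimeMs: Long,
    producerIdExpirationMs: Long): Map[Long, ExpiringProducerEntry] =
  producers.filter { case (_, entry) =>
    currentTimeMs - entry.lastTimestamp < producerIdExpirationMs
  }
{code}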

For the old snapshot files in step 6, we can rely on the periodically called 
function deleteSnapshotsAfterRecoveryPointCheckpoint to delete them. And when 
loading producer state from a snapshot file, we would no longer drop producerIds 
whose last offset is smaller than the log start offset.
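
A minimal sketch of that loading-side change, assuming a SnapshotEntry helper 
type that is not the broker's actual API:

{code:scala}
case class SnapshotEntry(producerId: Long, lastOffset: Long)

// Rebuild the producer state map from snapshot entries; under the proposal we
// keep every entry instead of filtering out those below the log start offset.
def loadFromSnapshot(
    entries: Seq[SnapshotEntry],
    logStartOffset: Long,
    dropBelowStartOffset: Boolean): Map[Long, SnapshotEntry] = {
  val kept =
    if (dropBelowStartOffset) entries.filter(_.lastOffset >= logStartOffset) // current behaviour
    else entries                                                             // proposed: keep them all
  kept.map(entry => entry.producerId -> entry).toMap
}
{code}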

So we would only need to do steps 4 and 5 when removing the head of the log.
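
Under this proposal truncateHead shrinks roughly to the following (reusing the 
TxnMetadata case class from the sketch above); producerId eviction and snapshot 
deletion are handled by expiration and by the periodic checkpoint task instead.

{code:scala}
def truncateHeadReduced(
    unreplicatedTxns: List[TxnMetadata],
    lastMapOffset: Long,
    logStartOffset: Long): (List[TxnMetadata], Long) = {
  // step 4: drop unreplicated transactions ending before the log start offset
  val retainedTxns = unreplicatedTxns.filter(_.lastOffset >= logStartOffset)
  // step 5: never let lastMapOffset fall behind the log start offset
  val newLastMapOffset = math.max(lastMapOffset, logStartOffset)
  (retainedTxns, newLastMapOffset)
}
{code}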

For the additional PID expiration config, is there a particular reason to add 
it? If it is needed, I will add it.

> Under low traffic conditions purging repartition topics cause WARN statements 
> about  UNKNOWN_PRODUCER_ID 
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-7190
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7190
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core, streams
>    Affects Versions: 1.1.0, 1.1.1
>            Reporter: Bill Bejeck
>            Assignee: lambdaliu
>            Priority: Major
>
> When a streams application has little traffic, then it is possible that 
> consumer purging would delete
> even the last message sent by a producer (i.e., all the messages sent by
> this producer have been consumed and committed), and as a result, the broker
> would delete that producer's ID. The next time when this producer tries to
> send, it will get this UNKNOWN_PRODUCER_ID error code, but in this case,
> this error is retriable: the producer would just get a new producer id and
> retries, and then this time it will succeed. 
>  
> Possible fixes could be on the broker side, i.e., delaying the deletion of 
> the producerIDs for a more extended period, or on the streams side, developing 
> a more conservative approach to deleting offsets from repartition topics
>  
>  


