[ https://issues.apache.org/jira/browse/KAFKA-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luke Chen updated KAFKA-17428:
------------------------------
    Description: 
Currently, copyLogSegment in RLMCopyTask deletes segments that failed to upload and segments whose custom metadata size exceeds the limit. But after deletion, these segments remain in the {{COPY_SEGMENT_STARTED}} state, which might cause unexpected issues in the future. We had better move the state from {{COPY_SEGMENT_STARTED}} -> {{DELETE_SEGMENT_STARTED}} -> {{DELETE_SEGMENT_FINISHED}}.

updated:
I thought about this when I first looked at it, and one thing that bothered me is that, to me, {{DELETE_SEGMENT_STARTED}} means we are now in a state where we attempt deletion. However, if the remote store is down and both the copy and the delete fail, we will leave that segment in {{DELETE_SEGMENT_STARTED}} and not attempt to delete it again until the segment itself breaches retention.ms/bytes.
We can probably just make it clearer, but that was my thought at the time.
So, maybe in the deletion loop we can add {{DELETE_SEGMENT_STARTED}} segments to the deletion directly, but that also needs to account for the retention size calculation.

  was:
Currently, copyLogSegment in RLMCopyTask deletes segments that failed to upload and segments whose custom metadata size exceeds the limit. But after deletion, these segments remain in the {{COPY_SEGMENT_STARTED}} state, which might cause unexpected issues in the future. We had better move the state from {{COPY_SEGMENT_STARTED}} -> {{DELETE_SEGMENT_STARTED}} -> {{DELETE_SEGMENT_FINISHED}}.

> remote segments deleted in RLMCopyTask stays `COPY_SEGMENT_START` state
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-17428
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17428
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Luke Chen
>            Priority: Major
>
> Currently, copyLogSegment in RLMCopyTask deletes segments that failed to
> upload and segments whose custom metadata size exceeds the limit. But after
> deletion, these segments remain in the {{COPY_SEGMENT_STARTED}} state, which
> might cause unexpected issues in the future. We had better move the state
> from {{COPY_SEGMENT_STARTED}} -> {{DELETE_SEGMENT_STARTED}} ->
> {{DELETE_SEGMENT_FINISHED}}.
>
> updated:
> I thought about this when I first looked at it, and one thing that bothered
> me is that, to me, {{DELETE_SEGMENT_STARTED}} means we are now in a state
> where we attempt deletion. However, if the remote store is down and both the
> copy and the delete fail, we will leave that segment in
> {{DELETE_SEGMENT_STARTED}} and not attempt to delete it again until the
> segment itself breaches retention.ms/bytes.
> We can probably just make it clearer, but that was my thought at the time.
> So, maybe in the deletion loop we can add {{DELETE_SEGMENT_STARTED}} segments
> to the deletion directly, but that also needs to account for the retention
> size calculation.
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
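
The state transition proposed in the description, and the concern raised in the update, can be illustrated with a small sketch. This is not Kafka code: {{SegmentState}}, {{RemoteStore}} and {{FailedCopyCleanup}} are hypothetical stand-ins introduced only for illustration, and the enum lists just the three states named in this issue. The sketch records which states a segment passes through when a failed copy is cleaned up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the segment states named in this issue (not the Kafka enum).
enum SegmentState {
    COPY_SEGMENT_STARTED,
    DELETE_SEGMENT_STARTED,
    DELETE_SEGMENT_FINISHED
}

// Hypothetical remote store; delete() may fail, e.g. when the store is down.
interface RemoteStore {
    void delete(String segmentId) throws Exception;
}

final class FailedCopyCleanup {
    private FailedCopyCleanup() { }

    // Returns the sequence of states recorded while cleaning up a segment
    // whose copy failed (assumption: the copy had already reached COPY_SEGMENT_STARTED).
    static List<SegmentState> cleanUp(String segmentId, RemoteStore store) {
        List<SegmentState> history = new ArrayList<>();
        history.add(SegmentState.COPY_SEGMENT_STARTED);

        // Proposed change: mark the deletion before attempting it.
        history.add(SegmentState.DELETE_SEGMENT_STARTED);
        try {
            store.delete(segmentId);
            history.add(SegmentState.DELETE_SEGMENT_FINISHED);
        } catch (Exception e) {
            // Concern from the update: the segment stays in DELETE_SEGMENT_STARTED
            // until something retries the deletion.
        }
        return history;
    }
}
```

With a store that accepts the delete, the last recorded state is {{DELETE_SEGMENT_FINISHED}}; with a store that throws, the history ends at {{DELETE_SEGMENT_STARTED}}. That second case is the one the update suggests a later deletion pass could pick up directly.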