ByteYue opened a new pull request, #17893: URL: https://github.com/apache/doris/pull/17893
# Proposed changes

Issue Number: close #xxx

## Problem summary

Recently, in a high-concurrency import scenario, we found that one version was consistently missing on all three replicas of a certain tablet. However, running `grep ${tablet_id} | grep publish | grep missed_version` on the backend returned no corresponding logs. By checking the transaction IDs of versions missed_version-1 and missed_version+1, we identified the transaction ID of the missing version. We then searched the frontend logs for this transaction ID and found the following:

The transaction timed out while attempting to acquire a write lock and was aborted, yet it was later committed successfully. Meanwhile, the aborted transaction had already been cleared on the backend by the "clear transaction" RPC. As a result, the publish task for this transaction can never succeed.

A review of the transaction code shows many places where access to the transactionState is not thread-safe. In addition, the unprotectedCommitTransaction2PC method can commit a transaction even when the transaction does not satisfy the required status constraints. The transaction code does not fully account for duplicate concurrent RPCs, and it is tightly coupled and difficult to modify.

This pull request therefore only handles the unprotectedCommitTransaction2PC issue on a case-by-case basis, and adds a read lock to make access to transactionState thread-safe (although this may not be sufficient).
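The failure mode described above (an aborted transaction that is later committed) can be illustrated with a minimal sketch. This is not the code in this pull request and not the Doris `TransactionState` class: `GuardedTxn`, its `Status` enum, and its method names are hypothetical, chosen only to show the general idea of reading the status under a read lock and checking the status precondition under a write lock before any transition.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a transaction's state; NOT the Doris TransactionState class.
final class GuardedTxn {
    enum Status { PREPARE, PRECOMMITTED, COMMITTED, ABORTED }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private Status status = Status.PREPARE;

    // Readers take the read lock so they never observe a half-applied transition.
    Status getStatus() {
        lock.readLock().lock();
        try {
            return status;
        } finally {
            lock.readLock().unlock();
        }
    }

    boolean preCommit() {
        return transition(Status.PREPARE, Status.PRECOMMITTED);
    }

    // The second phase of 2PC is only allowed from PRECOMMITTED (an assumption of this sketch).
    boolean commit2PC() {
        return transition(Status.PRECOMMITTED, Status.COMMITTED);
    }

    // Abort is rejected once the transaction has reached a final state.
    boolean abort() {
        lock.writeLock().lock();
        try {
            if (status == Status.COMMITTED || status == Status.ABORTED) {
                return false;
            }
            status = Status.ABORTED;
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Check-then-set happens under the write lock, so a duplicate or late RPC
    // cannot move the transaction out of an unexpected state.
    private boolean transition(Status expected, Status next) {
        lock.writeLock().lock();
        try {
            if (status != expected) {
                return false;
            }
            status = next;
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With this shape, a commit request that arrives after an abort returns `false` instead of silently succeeding, so the caller can report the conflict rather than scheduling a publish task that can never finish.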
## Checklist(Required)

* [ ] Does it affect the original behavior
* [ ] Has unit tests been added
* [ ] Has document been added or modified
* [ ] Does it need to update dependencies
* [ ] Is this PR support rollback (If NO, please explain WHY)

## Further comments

If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org