ByteYue opened a new pull request, #17893:
URL: https://github.com/apache/doris/pull/17893

   # Proposed changes
   
   Issue Number: close #xxx
   ## Problem summary
   Recently, in a high-concurrency import scenario, we found that one version 
was consistently missing on all three replicas of a tablet. However, running 
`grep ${tablet_id} | grep publish | grep missed_version` against the backend 
logs returned nothing. By checking the transaction IDs of versions 
missed_version-1 and missed_version+1, we identified the transaction ID of 
the missing version. Searching the frontend logs for that transaction ID 
turned up the following:
   
![image](https://user-images.githubusercontent.com/43750022/225830572-37b30bf0-5aad-4bbd-8718-eff6d428134f.png)
   
   The transaction timed out while attempting to acquire a write lock and 
was aborted, yet it was later committed successfully. Meanwhile, the aborted 
transaction had already been cleared on the backend by the "clear 
transaction" RPC. As a result, the publish task for this transaction can 
never succeed.
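
   The sequence above can be illustrated with a toy model. This is only a 
sketch: `PublishRaceSketch`, its fields, and its methods are hypothetical 
names invented for illustration and are not Doris classes. It assumes the 
backend can publish only while it still holds the transaction's data.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the failure sequence; every name here is hypothetical.
class PublishRaceSketch {
    enum Status { PREPARE, ABORTED, COMMITTED, VISIBLE }

    // Frontend view of the transaction status.
    Status feStatus = Status.PREPARE;
    // Transactions whose data the backend still holds.
    final Set<Long> backendPendingTxns = new HashSet<>();

    PublishRaceSketch(long txnId) {
        backendPendingTxns.add(txnId);
    }

    // Lock-timeout path: the frontend aborts and the backend clears the txn.
    void abortAndClear(long txnId) {
        feStatus = Status.ABORTED;
        backendPendingTxns.remove(txnId);
    }

    // Unguarded commit: overwrites the status without checking it first.
    void unguardedCommit() {
        feStatus = Status.COMMITTED;
    }

    // Publish succeeds only if the backend still has the txn's data.
    boolean publish(long txnId) {
        if (feStatus != Status.COMMITTED || !backendPendingTxns.contains(txnId)) {
            return false;
        }
        feStatus = Status.VISIBLE;
        return true;
    }
}
```

   In this model, calling `abortAndClear`, then `unguardedCommit`, then 
`publish` leaves the transaction committed on the frontend while `publish` 
keeps returning false, which mirrors the missing version described above.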
   
   A review of the transaction code shows many places where access to 
transactionState is not thread-safe. In addition, the 
unprotectedCommitTransaction2PC method can commit a transaction even when the 
transaction is not in a status that permits a commit.
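
   A minimal sketch of the kind of status precondition meant here, assuming 
a two-phase commit is valid only from a pre-committed status. The class 
`GuardedCommitSketch`, its `Status` enum, and its method names are 
hypothetical and are not the actual Doris code.

```java
// Hypothetical guarded state machine; not the Doris implementation.
class GuardedCommitSketch {
    enum Status { PREPARE, PRECOMMITTED, COMMITTED, VISIBLE, ABORTED }

    private Status status = Status.PREPARE;

    synchronized Status getStatus() {
        return status;
    }

    // First phase: only a PREPARE transaction may be pre-committed.
    synchronized void preCommit() {
        if (status != Status.PREPARE) {
            throw new IllegalStateException("cannot pre-commit from " + status);
        }
        status = Status.PRECOMMITTED;
    }

    // Abort is rejected once the transaction is committed or visible.
    synchronized void abort() {
        if (status == Status.COMMITTED || status == Status.VISIBLE) {
            throw new IllegalStateException("cannot abort from " + status);
        }
        status = Status.ABORTED;
    }

    // Second phase: assumed to be valid only from PRECOMMITTED.
    synchronized void commit2PC() {
        if (status != Status.PRECOMMITTED) {
            throw new IllegalStateException("cannot commit from " + status);
        }
        status = Status.COMMITTED;
    }
}
```

   With this guard, a commit that arrives after an abort throws instead of 
silently moving the transaction to a committed status.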
   
   The transaction code does not fully account for duplicate concurrent RPCs, 
and it is tightly coupled and difficult to modify. This pull request 
therefore only addresses the unprotectedCommitTransaction2PC issue as a 
case-by-case fix, and adds a read lock to make access to transactionState 
thread-safe (although this may not be sufficient).
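
   As a rough illustration of read-locked access combined with a checked 
transition, here is a sketch built on `java.util.concurrent.locks.ReentrantReadWriteLock`. 
The class `LockedTxnStateSketch`, its enum, and the timeout parameter are 
assumptions made for the example. It does not reproduce the locking actually 
used in this pull request.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical holder for a transaction status; not the Doris implementation.
class LockedTxnStateSketch {
    enum Status { PREPARE, PRECOMMITTED, COMMITTED, ABORTED }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private Status status = Status.PRECOMMITTED;

    // Readers take the read lock so they never observe a half-applied change.
    Status readStatus() {
        lock.readLock().lock();
        try {
            return status;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers take the write lock with a timeout and re-check the current
    // status under the lock before changing it (compare-and-set style).
    boolean tryTransition(Status expected, Status next, long timeoutMs) {
        boolean locked = false;
        try {
            locked = lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            if (!locked || status != expected) {
                return false;
            }
            status = next;
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        } finally {
            if (locked) {
                lock.writeLock().unlock();
            }
        }
    }
}
```

   Because the expected status is checked while the write lock is held, a 
duplicate or late RPC that expects a stale status gets `false` back rather 
than overwriting the result of an earlier transition.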
   
   Describe your changes.
   
   ## Checklist(Required)
   
   * [ ] Does it affect the original behavior
   * [ ] Has unit tests been added
   * [ ] Has document been added or modified
   * [ ] Does it need to update dependencies
   * [ ] Is this PR support rollback (If NO, please explain WHY)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at 
[d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you 
chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


