morningman opened a new issue #3980:
URL: https://github.com/apache/incubator-doris/issues/3980


   **Describe the bug**
   We found that routine load is not running, and the transactions of some of the latest routine load tasks have stayed in the COMMITTED state, not VISIBLE, for a long time.
   
   In addition, the related partition's visible version on the Master FE is lower than on the other FEs.
   I found that the "transaction visible edit log" has been written successfully.
   
   
https://github.com/apache/incubator-doris/blob/3ac459f0cac7fa8d7e065dd2c929a05933201e9a/fe/src/main/java/org/apache/doris/transaction/DatabaseTransactionMgr.java#L755-L775
   
   So I am pretty sure that `unprotectUpsertTransactionState(transactionState, false);` in line 761 has been called,
   but `updateCatalogAfterVisible(transactionState, db);` in line 772 has not been called.
   
   The problem is that `transactionState.afterStateTransform(TransactionStatus.VISIBLE, txnOperated);` in line 770 throws the following exception:
   
   ```
   java.util.NoSuchElementException: No value present
           at java.util.Optional.get(Optional.java:135) ~[?:1.8.0_161]
           at org.apache.doris.load.routineload.RoutineLoadJob.afterVisible(RoutineLoadJob.java:806) ~[palo-fe.jar:?]
           at org.apache.doris.transaction.TransactionState.afterStateTransform(TransactionState.java:409) ~[palo-fe.jar:?]
           at org.apache.doris.transaction.TransactionState.afterStateTransform(TransactionState.java:392) ~[palo-fe.jar:?]
           at org.apache.doris.transaction.DatabaseTransactionMgr.finishTransaction(DatabaseTransactionMgr.java:762) ~[palo-fe.jar:?]
           at org.apache.doris.transaction.GlobalTransactionMgr.finishTransaction(GlobalTransactionMgr.java:224) ~[palo-fe.jar:?]
           at org.apache.doris.transaction.PublishVersionDaemon.publishVersion(PublishVersionDaemon.java:208) ~[palo-fe.jar:?]
           at org.apache.doris.transaction.PublishVersionDaemon.runAfterCatalogReady(PublishVersionDaemon.java:55) [palo-fe.jar:?]
           at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) [palo-fe.jar:?]
           at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:?]
   ```
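
The ordering problem described above can be illustrated with a minimal sketch. This is not the Doris source: `FinishFlowSketch`, `steps`, and the step labels are hypothetical stand-ins for the three calls named in this report (state upsert, after-visible callback, catalog update). It only shows that when the callback in the middle throws, the later catalog update never runs, even though the earlier step has already completed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the finish flow; not the actual Doris code.
class FinishFlowSketch {
    // Records which steps ran, in order.
    final List<String> steps = new ArrayList<>();

    void finishTransaction(Runnable afterVisibleCallback) {
        steps.add("upsertState");    // stands in for unprotectUpsertTransactionState(...)
        afterVisibleCallback.run();  // stands in for afterStateTransform(...); may throw
        steps.add("updateCatalog");  // stands in for updateCatalogAfterVisible(...); skipped on throw
    }

    public static void main(String[] args) {
        FinishFlowSketch flow = new FinishFlowSketch();
        try {
            // Simulate the callback failing the way the stack trace shows.
            flow.finishTransaction(() -> {
                throw new NoSuchElementException("No value present");
            });
        } catch (NoSuchElementException e) {
            System.out.println("callback failed: " + e.getMessage());
        }
        // Only the first step was recorded; the catalog update was skipped.
        System.out.println(flow.steps);
    }
}
```

In this sketch the state change is recorded but the catalog step is not, which matches the symptom of the Master FE's visible version lagging behind.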
   
   **Debug**
   
   The following sequence causes the bug:
   
   1. A transaction of a routine load task has been COMMITTED but is not yet VISIBLE.
   2. The routine load job is paused for some reason and then started again.
   3. When the job is paused, it clears the job's `routineLoadTaskInfoList`.
   4. The transaction finally becomes VISIBLE and the `afterVisible()` callback is called, but the callback cannot find the task in `routineLoadTaskInfoList` because the list was cleared earlier, so the exception is thrown.
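
The four steps above can be reproduced in miniature. The class below is a hypothetical sketch, not the real `RoutineLoadJob`: it models the task list as a list of transaction ids, and the method names `afterVisibleUnchecked` and `afterVisibleGuarded` are invented for illustration. The guarded variant is only one possible way to tolerate a missing task and is not necessarily the fix adopted upstream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Optional;

// Hypothetical miniature of the job; names echo the issue but this is not the Doris source.
class RoutineLoadJobSketch {
    // Assumption: each in-flight task is represented only by its transaction id.
    final List<Long> routineLoadTaskInfoList = new ArrayList<>();

    // Pausing the job clears the task list (step 3).
    void pause() {
        routineLoadTaskInfoList.clear();
    }

    // Mirrors the failing pattern: Optional.get() without checking for presence.
    long afterVisibleUnchecked(long txnId) {
        Optional<Long> task = routineLoadTaskInfoList.stream()
                .filter(id -> id == txnId)
                .findFirst();
        return task.get(); // throws NoSuchElementException if the list was cleared
    }

    // One possible guard: report whether the task is still known instead of throwing.
    boolean afterVisibleGuarded(long txnId) {
        return routineLoadTaskInfoList.stream().anyMatch(id -> id == txnId);
    }

    public static void main(String[] args) {
        RoutineLoadJobSketch job = new RoutineLoadJobSketch();
        job.routineLoadTaskInfoList.add(1001L); // step 1: the task's txn is COMMITTED
        job.pause();                            // steps 2-3: pausing clears the list
        System.out.println("task still known: " + job.afterVisibleGuarded(1001L));
        try {
            job.afterVisibleUnchecked(1001L);   // step 4: callback cannot find the task
        } catch (NoSuchElementException e) {
            System.out.println("unchecked callback threw: " + e.getMessage());
        }
    }
}
```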
   
   
   
   
   
   
   
   
   

