morningman opened a new issue #3980: URL: https://github.com/apache/incubator-doris/issues/3980
**Describe the bug**

We found that routine load had stopped running, and the transactions of some of the latest routine load tasks stayed in the COMMITTED state for a long time without becoming VISIBLE. In addition, the visible version of the related partition in the Master FE is lower than in the other FEs.

I found that the "transaction visible edit log" has been written successfully.

https://github.com/apache/incubator-doris/blob/3ac459f0cac7fa8d7e065dd2c929a05933201e9a/fe/src/main/java/org/apache/doris/transaction/DatabaseTransactionMgr.java#L755-L775

So I am pretty sure that `unprotectUpsertTransactionState(transactionState, false);` in line 761 has been called, but `updateCatalogAfterVisible(transactionState, db);` in line 772 has not. The problem is that `transactionState.afterStateTransform(TransactionStatus.VISIBLE, txnOperated);` in line 770 throws this exception:

```
java.util.NoSuchElementException: No value present
    at java.util.Optional.get(Optional.java:135) ~[?:1.8.0_161]
    at org.apache.doris.load.routineload.RoutineLoadJob.afterVisible(RoutineLoadJob.java:806) ~[palo-fe.jar:?]
    at org.apache.doris.transaction.TransactionState.afterStateTransform(TransactionState.java:409) ~[palo-fe.jar:?]
    at org.apache.doris.transaction.TransactionState.afterStateTransform(TransactionState.java:392) ~[palo-fe.jar:?]
    at org.apache.doris.transaction.DatabaseTransactionMgr.finishTransaction(DatabaseTransactionMgr.java:762) ~[palo-fe.jar:?]
    at org.apache.doris.transaction.GlobalTransactionMgr.finishTransaction(GlobalTransactionMgr.java:224) ~[palo-fe.jar:?]
    at org.apache.doris.transaction.PublishVersionDaemon.publishVersion(PublishVersionDaemon.java:208) ~[palo-fe.jar:?]
    at org.apache.doris.transaction.PublishVersionDaemon.runAfterCatalogReady(PublishVersionDaemon.java:55) [palo-fe.jar:?]
    at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) [palo-fe.jar:?]
    at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:?]
```

**Debug**

The following sequence causes the bug:

1. A transaction of a routine load task has been COMMITTED but is not yet VISIBLE.
2. The routine load job is PAUSED for some reason and then started again.
3. When the job is paused, it clears its `routineLoadTaskInfoList`.
4. The transaction finally becomes VISIBLE and the `afterVisible()` callback is called, but the callback cannot find the task in `routineLoadTaskInfoList` because the list was cleared earlier, so the exception is thrown.
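
The sequence in the Debug section can be reproduced in isolation. The sketch below is a simplified stand-in, not the actual Doris code: `AfterVisibleSketch`, `TaskInfo`, `findTask`, `afterVisibleUnsafe`, and `afterVisibleSafe` are hypothetical names chosen for illustration, and only the `Optional.get()` behavior is taken from the stack trace above. The "safe" variant is one possible defensive pattern under these assumptions, not necessarily the fix the project adopted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Optional;

// Hypothetical, simplified model of a routine load job (not the Doris class).
final class AfterVisibleSketch {

    // Hypothetical stand-in for a task entry; only the transaction id matters here.
    static final class TaskInfo {
        final long txnId;

        TaskInfo(long txnId) {
            this.txnId = txnId;
        }
    }

    // Plays the role of routineLoadTaskInfoList in the report.
    private final List<TaskInfo> taskInfoList = new ArrayList<>();

    void addTask(long txnId) {
        taskInfoList.add(new TaskInfo(txnId));
    }

    // Step 3: pausing the job clears the task list.
    void pause() {
        taskInfoList.clear();
    }

    private Optional<TaskInfo> findTask(long txnId) {
        return taskInfoList.stream().filter(t -> t.txnId == txnId).findFirst();
    }

    // Failing pattern: Optional.get() throws NoSuchElementException
    // ("No value present") when the task was removed before the callback ran.
    boolean afterVisibleUnsafe(long txnId) {
        TaskInfo task = findTask(txnId).get();
        return task.txnId == txnId;
    }

    // Defensive variant (an assumption for illustration): report a missing
    // task to the caller instead of throwing, so later steps can still run.
    boolean afterVisibleSafe(long txnId) {
        return findTask(txnId).isPresent();
    }

    public static void main(String[] args) {
        AfterVisibleSketch job = new AfterVisibleSketch();
        job.addTask(1001L);   // step 1: transaction 1001 is COMMITTED, task is tracked
        job.pause();          // steps 2-3: job paused, task list cleared

        // Step 4: the callback runs after the list was cleared.
        try {
            job.afterVisibleUnsafe(1001L);
        } catch (NoSuchElementException e) {
            System.out.println("unsafe callback failed: " + e.getMessage());
        }
        System.out.println("safe callback found task: " + job.afterVisibleSafe(1001L));
    }
}
```

Because the exception escapes the callback, any statement placed after the callback in the caller is skipped, which matches the observation that `updateCatalogAfterVisible` was never reached.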
