github-actions[bot] commented on PR #61046:
URL: https://github.com/apache/doris/pull/61046#issuecomment-4001683799

   ## Code Review Summary
   
   **PR #61046: [fix](job) fix routine load job incorrectly cancelled on FE 
restart after swap table**
   
   ### Overall Assessment: No issues found. The fix is correct and 
well-targeted.
   
   ### Review Details
   
   **Problem & Root Cause:** The PR correctly identifies that gsonPostProcess() 
re-parses the original CREATE ROUTINE LOAD SQL (origStmt) which contains the 
table name at creation time. After a REPLACE TABLE (swap) + DROP TABLE 
sequence, that original name no longer exists, causing checkDBTable() to throw 
AnalysisException and the job to be incorrectly cancelled.
   
   **Fix Analysis:**
   - The fix resolves the current table name from the catalog using the 
persisted tableId (which is stable across renames/swaps) before calling 
validate(). This is the correct approach since tableId is already properly 
maintained through swap operations.
   - The setTableName() call updates the field used by both checkDBTable() 
(line 434) and checkLoadProperties() (line 508), so both table lookups will use 
the corrected name.
   - Guard conditions (!isMultiTable && tableId != 0) are appropriate - 
multi-table jobs don't have a single target table, and tableId == 0 already 
implies multi-table mode.
   - The ifPresent() pattern correctly handles the case where the table has 
genuinely been deleted - setTableName won't be called, and validate() will 
properly fail with the stale name.
   - The defensive catch (Exception ignored) block is acceptable since getDb() 
and getTable() return Optionals and shouldn't throw, but it provides a safety 
net.
   - DB resolution uses dbId (not name) both in the existing code (line 1978) 
and the new code (line 1995), so DB rename scenarios are also handled correctly.
   
   **Test:** The regression test properly reproduces the exact scenario 
described in the bug: create table, create routine load, swap, drop, restart 
FE, then verify the job state is not CANCELLED. The test uses the 
docker/cluster framework with FE restart, which is the correct approach for 
testing gsonPostProcess() replay behavior.
   
   ### Critical Checkpoints:
   - **Correctness:** The fix correctly addresses the root cause by resolving 
table name from ID before validation. No logical errors found.
   - **Backward compatibility:** The fix only activates when tableId != 0 and a 
valid table is found. Existing behavior is preserved for all other cases.
   - **Thread safety:** gsonPostProcess() runs during single-threaded image 
replay; no concurrency concerns.
   - **Error handling:** Proper fallback behavior - if table lookup fails for 
any reason, the original validation path still runs.
   - **Test coverage:** Regression test directly reproduces the reported 
scenario.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to