github-actions[bot] commented on code in PR #61389:
URL: https://github.com/apache/doris/pull/61389#discussion_r2939726347


##########
fe/fe-core/src/main/java/org/apache/doris/job/offset/jdbc/JdbcSourceOffsetProvider.java:
##########
@@ -535,7 +545,21 @@ private boolean checkNeedSplitChunks(Map<String, String> 
sourceProperties) {
         if (startMode == null) {
             return false;
         }
-        return DataSourceConfigKeys.OFFSET_INITIAL.equalsIgnoreCase(startMode);
+        return DataSourceConfigKeys.OFFSET_INITIAL.equalsIgnoreCase(startMode)
+                || 
DataSourceConfigKeys.OFFSET_SNAPSHOT.equalsIgnoreCase(startMode);
+    }
+
+    private boolean isSnapshotOnlyMode() {
+        String offset = sourceProperties.get(DataSourceConfigKeys.OFFSET);
+        return DataSourceConfigKeys.OFFSET_SNAPSHOT.equalsIgnoreCase(offset);
+    }
+
+    @Override
+    public boolean hasReachedEnd() {
+        return isSnapshotOnlyMode()
+                && currentOffset != null
+                && currentOffset.snapshotSplit()
+                && remainingSplits.isEmpty();

Review Comment:
   **Note:** `hasReachedEnd()` calls `currentOffset.snapshotSplit()` which 
requires non-null, non-empty splits (enforced by `Preconditions.checkState` in 
`JdbcOffset.snapshotSplit()`). After crash recovery for a completed 
snapshot-only job, `currentOffset` has null splits (see comment on 
`replayIfNeed`), so this method will throw `IllegalStateException`.
   
   Consider either:
   1. Adding a null-safe split check here directly: `currentOffset.getSplits() 
!= null && !currentOffset.getSplits().isEmpty() && 
currentOffset.snapshotSplit()`
   2. Or ensuring `replayIfNeed` always leaves `currentOffset` in a valid state 
with actual splits set.



##########
fe/fe-core/src/main/java/org/apache/doris/job/offset/jdbc/JdbcSourceOffsetProvider.java:
##########
@@ -535,7 +545,21 @@ private boolean checkNeedSplitChunks(Map<String, String> 
sourceProperties) {

Review Comment:
   **Bug (Critical): `currentOffset` created with null/empty splits causes 
crash on recovery**
   
   When `isSnapshotOnlyMode()` is true and `lastSnapshotSplits.isEmpty()`, this 
code creates a `JdbcOffset` via `new JdbcOffset()` at line 382 but never calls 
`setSplits()`. The `splits` field remains `null` (default from 
`@NoArgsConstructor`).
   
   Later, `hasReachedEnd()` calls `currentOffset.snapshotSplit()` which does:
   ```java
   Preconditions.checkState(splits != null && !splits.isEmpty(), "splits is 
null or empty");
   ```
   This throws `IllegalStateException`. Same crash path via 
`hasMoreDataToConsume()` → `currentOffset.snapshotSplit()`.
   
   **Suggested fix:** Either:
   1. Set a sentinel split list so `snapshotSplit()` works, or
   2. Modify `hasReachedEnd()` to check for null/empty splits directly without 
calling `snapshotSplit()`, or
   3. In this branch, handle the snapshot-only completed state by transitioning 
the job to FINISHED immediately during replay (best approach).



##########
fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingMultiTblTask.java:
##########
@@ -110,6 +110,11 @@ public void before() throws Exception {
         this.status = TaskStatus.RUNNING;
         this.startTimeMs = System.currentTimeMillis();
         this.runningOffset = offsetProvider.getNextOffset(null, 
sourceProperties);
+        if (this.runningOffset == null) {
+            // snapshot-only mode: all splits completed, task exits immediately
+            log.info("streaming multi task {} offset is null (snapshot-only 
completed), skip execution", taskId);
+            return;
+        }
         log.info("streaming multi task {} get running offset: {}", taskId, 
runningOffset.toString());

Review Comment:
   **Bug (High): Early return from `before()` when offset is null leaves the 
job stuck**
   
   When `getNextOffset()` returns null (snapshot-only completed), this task 
returns early from `before()`, then `run()` also returns early, then 
`onSuccess()` returns `false` without calling `onStreamTaskSuccess()`. The 
`StreamingMultiTblTask.onSuccess()` never calls `onStreamTaskSuccess()` — it 
always returns false (line 243).
   
   The job completion signal for multi-table tasks only comes via the external 
`commitOffset()` → `successCallback()` → `onStreamTaskSuccess()` callback path, 
which is never triggered for a no-op task (no data sent to BE, no offset 
committed).
   
   Result: The task finishes silently, the job stays in `RUNNING` with a dead 
task reference, eventually times out to `PAUSED`, auto-resumes to `PENDING`, 
creates another null-offset task, and loops forever.
   
   **Suggested fix:** In `StreamingMultiTblTask.onSuccess()`, when 
`runningOffset == null`, directly call 
`streamingInsertJob.onStreamTaskSuccess(this)` (after acquiring the write lock 
via the appropriate path). Alternatively, handle this in `before()` by checking 
`offsetProvider.hasReachedEnd()` and signaling the job directly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to