liunaijie opened a new issue, #8861:
URL: https://github.com/apache/seatunnel/issues/8861

   ### Search before asking
   
   - [x] I had searched in the 
[feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22)
 and found no similar feature requirement.
   
   
   ### Description
   
   I noticed that the currently generated `TaskLocation` information has an
   inconsistent format, and the relationships between tasks can't be inferred
   from the IDs.
   
   For example:
   I have a job whose logical plan looks like this (the job config is attached below):
   
   ```
   Job {
      FakeSource -> SqlTransForm -> ConsoleSink ( parallelism = 2)
      FakeSource -> ConsoleSink ( parallelism = 4)
   }
   ```
   
   For Subplan 1, it generates the following physical plan:
   
   
![Image](https://github.com/user-attachments/assets/c66d0405-a2c1-4714-8f75-ed3f3d4a0527)
   
   
   The `TaskLocation` information currently generated is:
   
   ```
       
TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=947338275662594049, 
pipelineId=1, taskGroupId=1}, taskID=20000, index=0}, 
       
TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=947338275662594049, 
pipelineId=1, taskGroupId=30000}, taskID=40000, index=0}, 
       
TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=947338275662594049, 
pipelineId=1, taskGroupId=30000}, taskID=50000, index=0},
       
TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=947338275662594049, 
pipelineId=1, taskGroupId=30001}, taskID=40001, index=1},     
       
TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=947338275662594049, 
pipelineId=1, taskGroupId=30001}, taskID=50001, index=1}
   ```
   
   From the results, we can observe the following:
   
   - Some taskGroupIds are `1`, while others are `30000` and `30001`, so the
     formats are inconsistent.
   - Within the same task group, the taskID format matches the taskGroupId.
   
   
   
   I suggest making the following updates:
   
   - Standardize the taskGroupId format to start from `1` and increment 
sequentially.
   - Add more information to the taskID.
   
   The generated result would then be:
   ```
       
TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=947407126777561089, 
pipelineId=1, taskGroupId=1}, taskID=1000100010001, index=0}, 
       
TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=947407126777561089, 
pipelineId=1, taskGroupId=2}, taskID=1000200010001, index=0}, 
       
TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=947407126777561089, 
pipelineId=1, taskGroupId=2}, taskID=1000200020001, index=0},
       
TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=947407126777561089, 
pipelineId=1, taskGroupId=3}, taskID=1000300010002, index=1}, 
       
TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=947407126777561089, 
pipelineId=1, taskGroupId=3}, taskID=1000300020002, index=1} 
   ```
   
   
   The rule for generating the `taskID` is:
   ```
   sub_plan_id * 10000L * 10000L * 10000L +
   task_group_id * 10000L * 10000L  +
   task_index_in_group * 10000L +
   task_parallelism_index + 1
   ```
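   
   The rule above can be sketched in Java as follows. This is only an
   illustration: the class name `TaskIdSketch` and the methods `encode` and
   `decode` are hypothetical and not part of SeaTunnel. It assumes each
   component fits in its four-digit slot (values below 10000), which is what
   makes it possible to read the task relationships back from the ID.
   
   ```java
   // Hypothetical helper; names are illustrative, not SeaTunnel API.
   final class TaskIdSketch {
       // Each component occupies a four-digit decimal slot.
       static final long SLOT = 10000L;
   
       // Mirrors the proposed rule term by term.
       static long encode(long subPlanId, long taskGroupId,
                          long taskIndexInGroup, long parallelismIndex) {
           return subPlanId * SLOT * SLOT * SLOT
                   + taskGroupId * SLOT * SLOT
                   + taskIndexInGroup * SLOT
                   + parallelismIndex + 1;
       }
   
       // Returns {subPlanId, taskGroupId, taskIndexInGroup, parallelismIndex}.
       // Only valid when every component was below SLOT at encode time.
       static long[] decode(long taskId) {
           long parallelismIndex = taskId % SLOT - 1;
           long taskIndexInGroup = (taskId / SLOT) % SLOT;
           long taskGroupId = (taskId / (SLOT * SLOT)) % SLOT;
           long subPlanId = taskId / (SLOT * SLOT * SLOT);
           return new long[] {subPlanId, taskGroupId, taskIndexInGroup, parallelismIndex};
       }
   
       public static void main(String[] args) {
           long id = encode(1, 3, 2, 1);
           System.out.println(id); // 1000300020002
           System.out.println(java.util.Arrays.toString(decode(id))); // [1, 3, 2, 1]
       }
   }
   ```
   
   With sub plan 1, task group 3, the second task in the group and parallelism
   index 1, this yields `1000300020002`, matching the last entry in the
   proposed result.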
   
   The maximum `long` value is `9223372036854775807L`, so this won't overflow.
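   
   To make the overflow reasoning concrete, here is a minimal sketch of a
   checked variant of the rule. The class `TaskIdBounds` and the method
   `encodeChecked` are hypothetical names, and the four-digit slot limit on
   each lower component is an assumption drawn from the `10000L` factors. Under
   that assumption, any `sub_plan_id` up to 9223371 fits in a `long`.
   
   ```java
   // Hypothetical helper; names are illustrative, not SeaTunnel API.
   final class TaskIdBounds {
       static final long SLOT = 10000L;
   
       // Throws IllegalArgumentException if a lower component would spill out
       // of its slot, and ArithmeticException if the result overflows a long.
       static long encodeChecked(long subPlanId, long taskGroupId,
                                 long taskIndexInGroup, long parallelismIndex) {
           requireSlot(taskGroupId, "taskGroupId");
           requireSlot(taskIndexInGroup, "taskIndexInGroup");
           requireSlot(parallelismIndex + 1, "parallelismIndex + 1");
           long high = Math.multiplyExact(subPlanId, SLOT * SLOT * SLOT);
           long low = taskGroupId * SLOT * SLOT
                   + taskIndexInGroup * SLOT
                   + parallelismIndex + 1;
           return Math.addExact(high, low);
       }
   
       private static void requireSlot(long value, String name) {
           if (value < 0 || value >= SLOT) {
               throw new IllegalArgumentException(
                       name + " must be in [0, " + SLOT + "): " + value);
           }
       }
   }
   ```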
   
   
   
   
   
   
   
   
   
   Job config:
   
   ```
   env {
     parallelism = 2
     job.mode = "BATCH"
   }
   
   source {
     # This is an example source plugin **only for testing and demonstrating the feature source plugin**
     FakeSource {
       result_table_name = "fake"
       schema = {
         fields {
           name = "string"
           age = "int"
         }
       }
     }
   
     FakeSource {
       parallelism = 4
       result_table_name = "fake2"
       schema = {
         fields {
           name = "string"
           age = "int"
           address = "string"
         }
       }
     }
   
   }
   
   transform {
     Sql {
       source_table_name="fake"
       result_table_name="sql"
       query = "select * from fake"
     }
   }
   
   sink {
     Console {
       source_table_name="sql"
     }
     Console {
       source_table_name="fake2"
     }
   }
   ```
   
   
   
   ### Usage Scenario
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


