DongyuanPan commented on PR #27334:
URL: https://github.com/apache/flink/pull/27334#issuecomment-4108905441

   > Thanks for the suggestion @DongyuanPan!
   > 
   > I considered that approach but it doesn't quite work here. The main issue
   > is that `StreamingJobGraphGenerator` only has access to the job-side
   > configuration (via `StreamGraph.getJobConfiguration()`), but parallelism
   > overrides can also come from the cluster-side config
   > (`jobMasterConfiguration`). The override util merges both sources with the
   > right precedence, and the generator simply doesn't have visibility into the
   > cluster config, nor should it, since it's meant to be a pure structural
   > converter.
   > 
   > Also, for `AdaptiveBatchScheduler`, the JobGraph can be built
   > incrementally (vertex by vertex as upstream tasks finish), so applying
   > overrides during generation would get messy.
   > 
   > The scheduler factories are really the first place where both the
   > completed JobGraph and the cluster configuration exist together, which is
   > why I went with applying overrides there.
   
   Perhaps I didn't explain myself clearly just now. What I meant was that the 
correct parallelism could be resolved directly in `internalSubmitJob` and stored 
in the `StreamGraph`, since `internalSubmitJob` has access to both the cluster 
configuration and the job configuration. `getJobGraph` could then set each 
vertex's parallelism directly, without needing to be aware of the cluster 
configuration. Under this approach, however, the `AdaptiveBatchScheduler` would 
indeed need extra handling, because it builds the JobGraph incrementally rather 
than in a single pass.
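
   To make the idea concrete, here is a minimal sketch of resolving the 
overrides once, at a point where both configurations are visible. It uses 
plain maps rather than Flink classes: `ParallelismOverrideSketch`, 
`resolveOverrides`, and `parallelismFor` are hypothetical names, and the rule 
that job-side entries win over cluster-side entries is an assumption made for 
illustration, not something this thread confirms.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Flink: illustrates merging two override
// sources once so that later stages only need the resolved result.
final class ParallelismOverrideSketch {

    private ParallelismOverrideSketch() {}

    // Merges vertex-id -> parallelism overrides from both configurations.
    // ASSUMPTION: job-side entries take precedence over cluster-side entries.
    static Map<String, Integer> resolveOverrides(
            Map<String, String> clusterOverrides, Map<String, String> jobOverrides) {
        Map<String, String> merged = new HashMap<>(clusterOverrides);
        merged.putAll(jobOverrides); // the later putAll wins on key collisions

        Map<String, Integer> resolved = new HashMap<>();
        for (Map.Entry<String, String> entry : merged.entrySet()) {
            resolved.put(entry.getKey(), Integer.parseInt(entry.getValue().trim()));
        }
        return resolved;
    }

    // Returns the resolved override for a vertex, or its default parallelism
    // when no override was configured for that vertex id.
    static int parallelismFor(
            String vertexId, int defaultParallelism, Map<String, Integer> resolved) {
        return resolved.getOrDefault(vertexId, defaultParallelism);
    }
}
```

   With the resolved map stored alongside the graph, the generator would only 
need to call something like `parallelismFor` per vertex and would never read 
the cluster configuration itself.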


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
