DongyuanPan commented on PR #27334: URL: https://github.com/apache/flink/pull/27334#issuecomment-4108905441
> Thanks for the suggestion @DongyuanPan!
>
> I considered that approach but it doesn't quite work here. The main issue is that `StreamingJobGraphGenerator` only has access to the job-side configuration (via `StreamGraph.getJobConfiguration()`), but parallelism overrides can also come from the cluster-side config (`jobMasterConfiguration`). The override util merges both sources with the right precedence, and the generator simply doesn't have visibility into the cluster config, nor should it, since it's meant to be a pure structural converter.
>
> Also, for `AdaptiveBatchScheduler`, the JobGraph can be built incrementally (vertex by vertex as upstream tasks finish), so applying overrides during generation would get messy.
>
> The scheduler factories are really the first place where both the completed JobGraph and the cluster configuration exist together, which is why I went with applying overrides there.

Perhaps I didn't explain myself clearly just now. What I meant was that the correct parallelism could be set directly within `internalSubmitJob` and stored in the `StreamGraph`, since `internalSubmitJob` has access to both the cluster configuration and the job configuration. This would allow `getJobGraph` to set the vertex parallelism directly, without needing to be aware of the cluster configuration.

However, under this approach, the `AdaptiveBatchScheduler` would indeed need to give additional consideration to how it handles these settings.
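To make the idea a bit more concrete, here is a rough, self-contained sketch of what "resolve once at submission time, then let graph generation read the result" could look like. It deliberately uses plain maps instead of Flink types: `ParallelismOverrideSketch`, `mergeOverrides`, and `resolve` are hypothetical names for illustration, not existing Flink APIs, and the precedence shown (job-side entries win over cluster-side ones) is an assumption that should be checked against the actual override util.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: plain-Java stand-in for resolving overrides at submission time. */
public class ParallelismOverrideSketch {

    /**
     * Merges cluster-side and job-side overrides (vertex id -> parallelism).
     * ASSUMPTION: job-side entries take precedence over cluster-side entries.
     */
    static Map<String, Integer> mergeOverrides(
            Map<String, Integer> clusterOverrides, Map<String, Integer> jobOverrides) {
        Map<String, Integer> merged = new HashMap<>(clusterOverrides);
        merged.putAll(jobOverrides); // later putAll wins on key conflicts
        return merged;
    }

    /**
     * Applies the merged overrides to the declared per-vertex parallelism.
     * Overrides for unknown vertex ids are ignored.
     */
    static Map<String, Integer> resolve(
            Map<String, Integer> declared, Map<String, Integer> overrides) {
        Map<String, Integer> resolved = new LinkedHashMap<>(declared);
        overrides.forEach(
                (vertexId, parallelism) ->
                        resolved.computeIfPresent(vertexId, (id, old) -> parallelism));
        return resolved;
    }

    public static void main(String[] args) {
        // Parallelism as declared by the job (stand-in for what the StreamGraph carries).
        Map<String, Integer> declared = new LinkedHashMap<>();
        declared.put("source", 2);
        declared.put("map", 4);
        declared.put("sink", 1);

        // Stand-ins for the two configuration sources visible at submission time.
        Map<String, Integer> clusterOverrides = Map.of("map", 8, "sink", 2);
        Map<String, Integer> jobOverrides = Map.of("map", 16);

        Map<String, Integer> resolved =
                resolve(declared, mergeOverrides(clusterOverrides, jobOverrides));

        // Graph generation would then read these values without seeing the cluster config.
        System.out.println(resolved); // prints {source=2, map=16, sink=2}
    }
}
```

In this shape, only the submission path needs both configurations; the generator just consumes the resolved values. It does not address the incremental `AdaptiveBatchScheduler` case, where vertices appear after submission and the resolved map would have to be consulted again as each vertex is created.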
