Hi Flink community,
I’m encountering an issue with PyFlink where a FlatMap operator invokes an external service (using a PyTorch model to generate embedding vectors). The operator processes data very slowly, leading to an extremely long initial checkpoint start delay, which eventually causes checkpoint failures. The external service has strict concurrency limits and cannot handle increased parallel requests,increasing the parallelism of the operator did not improve performance due to this bottleneck. Besides, when I use flink1.20.0, the operator processing speed seems to be faster than that of flink2.0.0. Does anyone have any clue? Thank you for your insights!