Hi experts,

> Setting the maximum parallelism to a very large value can be detrimental to
> performance because some state backends have to keep internal data
> structures that scale with the number of key-groups (which are the internal
> implementation mechanism for rescalable state).
>
> Changing the maximum parallelism explicitly when recovery from original
> job will lead to state incompatibility.
I read the section above in the official Flink documentation [1], and I'm wondering what the details are regarding this side effect. Suppose I have a Flink SQL job with large state and high parallelism, using RocksDB as my state backend. I would like to set the max parallelism to 32768, so that whenever I want to scale my job I don't need to worry about whether the parallelism evenly divides the max parallelism, since the number of key groups per subtask will then not differ too much.

I'm wondering if this is a good practice, because according to the official documentation it is apparently not recommended. If possible, I would like to understand the details of this side effect: which state backends have this issue, and why? Any advice would be appreciated. Thanks in advance.

Best regards,
Tony Wei

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution/parallel/#setting-the-maximum-parallelism
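P.S. To make the key-group reasoning above concrete, here is a small Python sketch of how Flink assigns key groups to subtasks. The integer formula mirrors Flink's `KeyGroupRangeAssignment.computeOperatorIndexForKeyGroup`; the helper name and the printed distributions are my own illustration, not Flink code.

```python
# Sketch of Flink's key-group-to-subtask mapping:
#   subtask = key_group * parallelism // max_parallelism
# (integer division, as in KeyGroupRangeAssignment). The function name
# below is my own; only the formula is taken from Flink's source.

from collections import Counter

def key_groups_per_subtask(max_parallelism: int, parallelism: int) -> Counter:
    """Count how many key groups each subtask index receives."""
    return Counter(kg * parallelism // max_parallelism
                   for kg in range(max_parallelism))

# When parallelism divides max_parallelism, the split is perfectly even:
print(key_groups_per_subtask(128, 8))   # every subtask gets 16 key groups

# When it does not, subtask loads differ by one key group (21 vs 22 here);
# with a small max parallelism that one-key-group gap can be a large
# relative skew, which is why a big max parallelism keeps scaling flexible:
print(key_groups_per_subtask(128, 6))
```

This is the distribution argument behind my choice of 32768: with that many key groups, the per-subtask counts stay nearly uniform for any parallelism, at the (documented) cost of per-key-group bookkeeping in some state backends.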