Hi, Tony.

"be detrimental to performance" means that the extra space overhead of the
key-group field may affect performance. As you may know, Flink writes the key
group as a prefix of each key to speed up rescaling, so the serialized format
looks like:

key group | key len | key | ......

You can check the relationship between the max parallelism and the number of
key-group bytes below:

------------------------------------------
max parallelism        bytes of key group
128                    1
32768                  2
------------------------------------------

So I think the cost will be very small if the real key length >> 2 bytes.
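To get a rough sense of that overhead, here is a small self-contained Java
sketch. It is not Flink's actual serialization code; the 128 boundary between
the 1-byte and 2-byte prefix is simply taken from the table above, and the
class/method names are made up for illustration.

public class KeyGroupPrefixOverhead {

    // Bytes needed for the key-group prefix, following the table above:
    // max parallelism 128 -> 1 byte, 32768 -> 2 bytes (boundary assumed here).
    static int keyGroupPrefixBytes(int maxParallelism) {
        return maxParallelism <= 128 ? 1 : 2;
    }

    // Relative space overhead of the prefix for a serialized key of keyLenBytes bytes.
    static double relativeOverhead(int maxParallelism, int keyLenBytes) {
        int prefix = keyGroupPrefixBytes(maxParallelism);
        return (double) prefix / (prefix + keyLenBytes);
    }

    public static void main(String[] args) {
        System.out.println(keyGroupPrefixBytes(128));    // 1
        System.out.println(keyGroupPrefixBytes(32768));  // 2

        // e.g. a 64-byte serialized key with max parallelism 32768: ~3% extra space
        System.out.printf("%.2f%%%n", 100 * relativeOverhead(32768, 64));
    }
}

As the last line suggests, once the serialized key is a few tens of bytes, the
extra one or two prefix bytes are a small fraction of the total key size.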
On Wed, Mar 8, 2023 at 1:06 PM Tony Wei <tony19920...@gmail.com> wrote:

> Hi experts,
>
>> Setting the maximum parallelism to a very large value can be detrimental
>> to performance because some state backends have to keep internal data
>> structures that scale with the number of key-groups (which are the internal
>> implementation mechanism for rescalable state).
>>
>> Changing the maximum parallelism explicitly when recovery from original
>> job will lead to state incompatibility.
>
> I read the section above from the official Flink documentation [1], and I'm
> wondering what the details are regarding this side effect.
>
> Suppose that I have a Flink SQL job with large state, large parallelism,
> and RocksDB as my state backend.
> I would like to set the max parallelism to 32768, so that I don't have to
> worry about whether the max parallelism is divisible by the parallelism
> whenever I want to scale my job, because the number of key groups will not
> differ too much between subtasks.
>
> I'm wondering if this is a good practice, because based on the official
> documentation it is actually not recommended.
> If possible, I would like to know the details of this side effect. Which
> state backends have this issue, and why?
> Please give me some advice. Thanks in advance.
>
> Best regards,
> Tony Wei
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution/parallel/#setting-the-maximum-parallelism

--
Best,
Hangxiang.