I believe there is some noticeable overhead if you are using the heap-based state backend, but with RocksDB I think the difference is negligible.
David On Tue, Mar 7, 2023 at 11:10 PM Hangxiang Yu <master...@gmail.com> wrote: > > Hi, Tony. > "be detrimental to performance" means that some extra space overhead of the > field of the key-group may influence performance. > As we know, Flink will write the key group as the prefix of the key to speed > up rescaling. > So the format will be like: key group | key len | key | ...... > You could check the relationship between max parallelism and bytes of key > group as below: > ------------------------------------------ > max parallelism bytes of key group > 128 1 > 32768 2 > ------------------------------------------ > So I think the cost will be very small if the real key length >> 2 bytes. > > On Wed, Mar 8, 2023 at 1:06 PM Tony Wei <tony19920...@gmail.com> wrote: >> >> Hi experts, >> >>> Setting the maximum parallelism to a very large value can be detrimental to >>> performance because some state backends have to keep internal data >>> structures that scale with the number of key-groups (which are the internal >>> implementation mechanism for rescalable state). >>> >>> Changing the maximum parallelism explicitly when recovery from original job >>> will lead to state incompatibility. >> >> >> I read the section above from Flink official document [1], and I'm wondering >> what the detail is regarding to the side-effect. >> >> Suppose that I have a Flink SQL job with large state, large parallelism and >> using RocksDB as my state backend. >> I would like to set the max parallelism as 32768, so that I don't bother if >> the max parallelism can be divided by the parallelism whenever I want to >> scale my job, >> because the number of key groups will not differ too much between each >> subtask. >> >> I'm wondering if this is a good practice, because based on the official >> document it is not recommended actually. >> If possible, I would like to know the detail about this side-effect. Which >> state backend will have this issue? and Why? >> Please give me an advice. Thanks in advance. >> >> Best regards, >> Tony Wei >> >> [1] >> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution/parallel/#setting-the-maximum-parallelism > > > > -- > Best, > Hangxiang.