Hi Yingjie, +1 for this FLIP. I'm pretty sure this will greatly improve the ease of batch jobs.
Looks like "taskmanager.memory.framework.off-heap.batch-shuffle.size" and "taskmanager.network.sort-shuffle.min-buffers" are related to network memory and framework.off-heap.size. My question is, what is the maximum parallelism a job can have with the default configuration? (Does this break out of the box) How much network memory and framework.off-heap.size are required for how much parallelism in the default configuration? I do feel that this correspondence is a bit difficult to control at the moment, and it would be best if a rough table could be provided. Best, Jingsong On Tue, Dec 14, 2021 at 2:16 PM Yingjie Cao <kevin.ying...@gmail.com> wrote: > > Hi Jiangang, > > Thanks for your suggestion. > > >>> The config can affect the memory usage. Will the related memory configs > >>> be changed? > > I think we will not change the default network memory settings. My best > expectation is that the default value can work for most cases (though may not > the best) and for other cases, user may need to tune the memory settings. > > >>> Can you share the tpcds results for different configs? Although we change > >>> the default values, it is helpful to change them for different users. In > >>> this case, the experience can help a lot. > > I did not keep all previous TPCDS results, but from the results, I can tell > that on HDD, always using the sort-shuffle is a good choice. For small jobs, > using sort-shuffle may not bring much performance gain, this may because that > all shuffle data can be cached in memory (page cache), this is the case if > the cluster have enough resources. However, if the whole cluster is under > heavy burden or you are running large scale jobs, the performance of those > small jobs can also be influenced. For large-scale jobs, the configurations > suggested to be tuned are taskmanager.network.sort-shuffle.min-buffers and > taskmanager.memory.framework.off-heap.batch-shuffle.size, you can increase > these values for large-scale batch jobs. > > BTW, I am still running TPCDS tests these days and I can share these results > soon. > > Best, > Yingjie > > 刘建刚 <liujiangangp...@gmail.com> 于2021年12月10日周五 18:30写道: >> >> Glad to see the suggestion. In our test, we found that small jobs with the >> changing configs can not improve the performance much just as your test. I >> have some suggestions: >> >> The config can affect the memory usage. Will the related memory configs be >> changed? >> Can you share the tpcds results for different configs? Although we change >> the default values, it is helpful to change them for different users. In >> this case, the experience can help a lot. >> >> Best, >> Liu Jiangang >> >> Yun Gao <yungao...@aliyun.com.invalid> 于2021年12月10日周五 17:20写道: >>> >>> Hi Yingjie, >>> >>> Very thanks for drafting the FLIP and initiating the discussion! >>> >>> May I have a double confirmation for >>> taskmanager.network.sort-shuffle.min-parallelism that >>> since other frameworks like Spark have used sort-based shuffle for all the >>> cases, does our >>> current circumstance still have difference with them? >>> >>> Best, >>> Yun >>> >>> >>> >>> >>> ------------------------------------------------------------------ >>> From:Yingjie Cao <kevin.ying...@gmail.com> >>> Send Time:2021 Dec. 10 (Fri.) 16:17 >>> To:dev <dev@flink.apache.org>; user <u...@flink.apache.org>; user-zh >>> <user...@flink.apache.org> >>> Subject:Re: [DISCUSS] Change some default config values of blocking shuffle >>> >>> Hi dev & users: >>> >>> I have created a FLIP [1] for it, feedbacks are highly appreciated. >>> >>> Best, >>> Yingjie >>> >>> [1] >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-199%3A+Change+some+default+config+values+of+blocking+shuffle+for+better+usability >>> Yingjie Cao <kevin.ying...@gmail.com> 于2021年12月3日周五 17:02写道: >>> >>> Hi dev & users, >>> >>> We propose to change some default values of blocking shuffle to improve the >>> user out-of-box experience (not influence streaming). The default values we >>> want to change are as follows: >>> >>> 1. Data compression >>> (taskmanager.network.blocking-shuffle.compression.enabled): Currently, the >>> default value is 'false'. Usually, data compression can reduce both disk >>> and network IO which is good for performance. At the same time, it can save >>> storage space. We propose to change the default value to true. >>> >>> 2. Default shuffle implementation >>> (taskmanager.network.sort-shuffle.min-parallelism): Currently, the default >>> value is 'Integer.MAX', which means by default, Flink jobs will always use >>> hash-shuffle. In fact, for high parallelism, sort-shuffle is better for >>> both stability and performance. So we propose to reduce the default value >>> to a proper smaller one, for example, 128. (We tested 128, 256, 512 and >>> 1024 with a tpc-ds and 128 is the best one.) >>> >>> 3. Read buffer of sort-shuffle >>> (taskmanager.memory.framework.off-heap.batch-shuffle.size): Currently, the >>> default value is '32M'. Previously, when choosing the default value, both >>> ‘32M' and '64M' are OK for tests and we chose the smaller one in a cautious >>> way. However, recently, it is reported in the mailing list that the default >>> value is not enough which caused a buffer request timeout issue. We already >>> created a ticket to improve the behavior. At the same time, we propose to >>> increase this default value to '64M' which can also help. >>> >>> 4. Sort buffer size of sort-shuffle >>> (taskmanager.network.sort-shuffle.min-buffers): Currently, the default >>> value is '64' which means '64' network buffers (32k per buffer by default). >>> This default value is quite modest and the performance can be influenced. >>> We propose to increase this value to a larger one, for example, 512 (the >>> default TM and network buffer configuration can serve more than 10 result >>> partitions concurrently). >>> >>> We already tested these default values together with tpc-ds benchmark in a >>> cluster and both the performance and stability improved a lot. These >>> changes can help to improve the out-of-box experience of blocking shuffle. >>> What do you think about these changes? Is there any concern? If there are >>> no objections, I will make these changes soon. >>> >>> Best, >>> Yingjie >>> -- Best, Jingsong Lee