[ https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145692#comment-17145692 ]
Stephan Ewen commented on FLINK-18433: -------------------------------------- I would not dismiss the runtime code so far. The fact that it is both in MemStateBackend and RocksDBStateBackend would be quite well explained with more frequent stalls, for example during buffer allocation. If there are fewer buffers by default, or a different buffer allocation/acquisition code path now, I'd consider that a strong suspect. > From the end-to-end performance test results, 1.11 has a regression > ------------------------------------------------------------------- > > Key: FLINK-18433 > URL: https://issues.apache.org/jira/browse/FLINK-18433 > Project: Flink > Issue Type: Bug > Components: API / Core, API / DataStream > Affects Versions: 1.11.0 > Environment: 3 machines > [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java] > Reporter: Aihua Li > Priority: Major > > > I ran end-to-end performance tests between the Release-1.10 and Release-1.11. > the results were as follows: > |scenarioName|release-1.10|release-1.11| | > |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.81333333|-5.11%| > |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%| > |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.323333|-5.97%| > |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%| > |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.6883333|-5.85%| > |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.123333|-8.81%| > |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.91666667|43.09833333|-6.14%| > |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.7266667|-5.35%| > |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%| > |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.55166667|-5.57%| > |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.0383333|200.3883333|-5.49%| > |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%| > |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.05833333|43.49666667|-5.56%| > |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.2333333|201.1883333|-5.20%| > |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.663333|1616.85|-6.03%| > |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.62333333|-5.45%| > |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.9183333|152.9566667|-2.52%| > |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%| > |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.29666667|34.16666667|-0.38%| > |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.3533333|151.8483333|-4.11%| > |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%| > |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.57166667|32.09666667|-7.16%| > |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%| > |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%| > |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%| > |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.5883333|145.9966667|-2.40%| > |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%| > |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.68333333|-12.76%| > |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.3033333|151.4616667|-3.71%| > |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%| > |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%| > |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%| > It can be seen that the performance of 1.11 has a regression, basically > around 5%, and the maximum regression is 17%. This needs to be checked. > the test code: > flink-1.10.0: > [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java] > flink-1.11.0: > [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java] > commit cmd like tis: > bin/flink run -d -m 192.168.39.246:8081 -c > org.apache.flink.basic.operations.PerformanceTestJob > /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar --topologyName > OneInput --LogicalAttributesofEdges Broadcast --ScheduleMode LazyFromSource > --CheckpointMode ExactlyOnce --recordSize 10 --stateBackend rocksdb > -- This message was sent by Atlassian Jira (v8.3.4#803005)