[ https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148331#comment-17148331 ]
Aihua Li commented on FLINK-18433:
----------------------------------

I found the reason. There were three machines in the performance comparison environment: one master and two workers, configured in the conf/slaves file. In version 1.11, however, the conf/slaves file has been renamed to conf/workers, and its default value is “localhost”. So the 1.10 test actually ran with one JM and two TMs, and the test job's containers were scheduled to the other machines, while the 1.11 test ran with only one JM and one TM, both on the same machine.

I confirmed this change with @Xintong Song and re-ran the tests on one machine to verify it. The results are:

|release-1.10|release-1.11|change|
|18.17666667|17.75833333|-2.30%|
|168.875|163.7766667|-3.02%|
|630.715|644.9383333|2.26%|
|37.60833333|36.61166667|-2.65%|
|81.40166667|79.07|-2.86%|
|1426.503333|1398.851667|-1.94%|
|18.825|18.07833333|-3.97%|
|168.9033333|160.275|-5.11%|
|630.1816667|654.7516667|3.90%|
|38.30333333|36.15666667|-5.60%|
|80.87|80.17666667|-0.86%|
|1300.063333|1394.201667|7.24%|
|18.60333333|18.055|-2.95%|
|168.5633333|190.0083333|12.72%|
|625.8433333|651.0033333|4.02%|
|37.96166667|35.85666667|-5.55%|
|140.3716667|138.8116667|-1.11%|
|1231.29|1230.145|-0.09%|
|32.46666667|31.68833333|-2.40%|
|141.22|131.4783333|-6.90%|
|1007.375|1092.865|8.49%|
|33.25333333|31.01666667|-6.73%|
|141.065|137.26|-2.70%|
|1233.316667|1222.988333|-0.84%|
|32.11666667|31.68166667|-1.35%|
|144.9083333|136.645|-5.70%|
|1005.598333|1090.656667|8.46%|
|32.84|31.24833333|-4.85%|
|141.53|137.675|-2.72%|
|1260.055|1183.28|-6.09%|
|31.97666667|31.55666667|-1.31%|
|143.79|132.5716667|-7.80%|

From these results, 1.11 is better than 1.10 in some scenarios and worse in others, and the differences are small. I think this is normal. What do you think?

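For anyone re-checking the numbers: the last column of the table appears to be the relative change of release-1.11 against release-1.10. Below is a minimal sketch of that arithmetic, assuming this is how the column was derived. The function name `relative_change` is illustrative and not part of the test suite; the sample values are copied from the first three rows of the table.

```python
def relative_change(old: float, new: float) -> float:
    """Return the change from `old` to `new` as a percentage of `old`."""
    return (new - old) / old * 100.0


# (release-1.10, release-1.11) pairs copied from the first three rows above.
rows = [
    (18.17666667, 17.75833333),
    (168.875, 163.7766667),
    (630.715, 644.9383333),
]

changes = [relative_change(old, new) for old, new in rows]

for (old, new), change in zip(rows, changes):
    # A negative value means release-1.11 measured lower than release-1.10.
    print(f"{old} -> {new}: {change:+.2f}%")
```

Running the same function over all rows makes it easy to count how many scenarios moved in each direction, which is the basis for the "some better, some worse" reading above.
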
> From the end-to-end performance test results, 1.11 has a regression
> -------------------------------------------------------------------
>
>                 Key: FLINK-18433
>                 URL: https://issues.apache.org/jira/browse/FLINK-18433
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Core, API / DataStream
>    Affects Versions: 1.11.0
>         Environment: 3 machines
> [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
>            Reporter: Aihua Li
>            Priority: Major
>         Attachments: flink_11.log.gz
>
>
> I ran end-to-end performance tests on Release-1.10 and Release-1.11.
> The results were as follows:
> |scenarioName|release-1.10|release-1.11|change|
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.81333333|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.323333|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.6883333|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.123333|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.91666667|43.09833333|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.7266667|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.55166667|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.0383333|200.3883333|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.05833333|43.49666667|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.2333333|201.1883333|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.663333|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.62333333|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.9183333|152.9566667|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.29666667|34.16666667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.3533333|151.8483333|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.57166667|32.09666667|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.5883333|145.9966667|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.68333333|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.3033333|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> The results show a performance regression in 1.11, mostly around 5%, with a
> maximum of about 17%. This needs to be checked.
> The test code:
> flink-1.10.0:
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0:
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> The submit command looks like this:
> bin/flink run -d -m 192.168.39.246:8081 -c
> org.apache.flink.basic.operations.PerformanceTestJob
> /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar --topologyName
> OneInput --LogicalAttributesofEdges Broadcast --ScheduleMode LazyFromSource
> --CheckpointMode ExactlyOnce --recordSize 10 --stateBackend rocksdb



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
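
To reproduce a given row, the scenario name can be turned back into the job arguments. The sketch below assumes the six underscore-separated parts of a scenario name map, in order, onto the six flags in the example submit command. That mapping is inferred by comparing the command with the first scenario name and is not confirmed by the test suite. The helper names `scenario_args` and `submit_command` are hypothetical.

```python
import shlex
from typing import List

# Flags taken from the example submit command in the issue description.
# Assumption: the parts of a scenario name appear in this same order.
FLAGS = [
    "--topologyName",
    "--LogicalAttributesofEdges",
    "--ScheduleMode",
    "--CheckpointMode",
    "--recordSize",
    "--stateBackend",
]

MAIN_CLASS = "org.apache.flink.basic.operations.PerformanceTestJob"


def scenario_args(scenario_name: str) -> List[str]:
    """Split a scenario name into flag/value pairs for the test job."""
    parts = scenario_name.split("_")
    if len(parts) != len(FLAGS):
        raise ValueError(f"expected {len(FLAGS)} parts, got {len(parts)}: {scenario_name}")
    args: List[str] = []
    for flag, value in zip(FLAGS, parts):
        args.extend([flag, value])
    return args


def submit_command(scenario_name: str, jobmanager: str, jar_path: str) -> List[str]:
    """Build the `bin/flink run` argument list for one scenario."""
    return [
        "bin/flink", "run", "-d", "-m", jobmanager, "-c", MAIN_CLASS, jar_path,
        *scenario_args(scenario_name),
    ]


cmd = submit_command(
    "OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb",
    "192.168.39.246:8081",
    "/home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar",
)
print(shlex.join(cmd))
```

Building the command as a list rather than a string keeps each argument separate, so it can be passed directly to `subprocess.run` without shell quoting issues.
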