[ 
https://issues.apache.org/jira/browse/FLINK-26864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517468#comment-17517468
 ] 

Sebastian Mattheis edited comment on FLINK-26864 at 4/5/22 2:18 PM:
--------------------------------------------------------------------

[~ym], I talked to [~pnowojski] about whether we should apply a quick fix, such 
as a revert, while I'm working on it, but we agreed that this is not too urgent 
for now as it is not included in 1.16. The fix is, as said, WIP and I will 
finish it this week; I expect performance to return to its previous level.
The performance regression is similar to what is observed in FLINK-23560:
 * Root cause: In the specific benchmarks, lock elision is applied because no 
mail actions are generated/executed. As soon as mail actions are 
generated/executed, lock elision can no longer be applied. This is normal if, 
e.g., checkpointing is executed, but it also happens with the changes that 
perform latency measurements for mailbox processing, since both 
generate/execute mail actions. Without lock elision, performance drops for 
these specific benchmarks as observed/described in this issue.

The implications are:
 # There is no performance regression if the application performs checkpointing 
anyway, i.e., in most streaming applications.
 # For batch processing applications, the observed performance regression might 
occur. To avoid the regression, the fix is to start latency measurements only 
once mails are generated/executed. This fix is WIP.
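
A minimal sketch of the idea behind the fix, assuming a heavily simplified 
mailbox loop. The class and all method names are hypothetical and are not 
Flink's actual classes. The sketch only shows the control flow: no latency 
measurement mail is enqueued until some other mail has been executed. It does 
not model the JIT's lock elision itself.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical, simplified model of a mailbox loop; none of these names are Flink's.
class LazyLatencyMeasurementSketch {
    private final Object lock = new Object();
    private final Queue<Runnable> mails = new ArrayDeque<>();
    private boolean measurementStarted = false;
    private long lastLatencyNanos = -1L;

    // Enqueues a mail action under the mailbox lock.
    void put(Runnable mail) {
        synchronized (lock) {
            mails.add(mail);
        }
    }

    // Takes the next mail action, or returns null if the mailbox is empty.
    private Runnable poll() {
        synchronized (lock) {
            return mails.poll();
        }
    }

    // Drains the mailbox and returns the number of executed mails.
    int processMail() {
        int executed = 0;
        Runnable mail;
        while ((mail = poll()) != null) {
            mail.run();
            executed++;
            // Start measuring only after the first mail was executed, so a job
            // that never generates mails never gets a measurement mail either.
            if (!measurementStarted) {
                measurementStarted = true;
                scheduleLatencyMeasurement();
            }
        }
        return executed;
    }

    // Enqueues a one-shot mail that records how long it waited in the mailbox.
    // A real implementation would presumably re-schedule this periodically.
    private void scheduleLatencyMeasurement() {
        final long enqueuedAt = System.nanoTime();
        put(() -> lastLatencyNanos = System.nanoTime() - enqueuedAt);
    }

    boolean isMeasurementStarted() {
        return measurementStarted;
    }

    long getLastLatencyNanos() {
        return lastLatencyNanos;
    }
}
```

In this sketch, calling processMail() on an empty mailbox returns 0 and leaves 
the measurement disabled, which corresponds to the batch benchmarks. After one 
put(...), processMail() returns 2: the submitted mail plus the measurement mail 
that it triggered.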



> Performance regression on 25.03.2022
> ------------------------------------
>
>                 Key: FLINK-26864
>                 URL: https://issues.apache.org/jira/browse/FLINK-26864
>             Project: Flink
>          Issue Type: Bug
>          Components: Benchmarks
>    Affects Versions: 1.16.0
>            Reporter: Piotr Nowojski
>            Assignee: Sebastian Mattheis
>            Priority: Blocker
>
> http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=arrayKeyBy&extr=on&quarts=on&equid=off&env=2&revs=200
> http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=remoteFilePartition&extr=on&quarts=on&equid=off&env=2&revs=200
> http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=remoteSortPartition&extr=on&quarts=on&equid=off&env=2&revs=200
> http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=tupleKeyBy&extr=on&quarts=on&equid=off&env=2&revs=200



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
