[ 
https://issues.apache.org/jira/browse/FLINK-35764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883025#comment-17883025
 ] 

Liu Liu commented on FLINK-35764:
---------------------------------

[~pnowojski] Hello, I see you edited the TimerGauge class since I opened this 
issue, but this problem still exists. I have made the patch and opened the pull 
request. Could you help me get this reviewed, please?

> TimerGauge is incorrect when update is called during a measurement
> ------------------------------------------------------------------
>
>                 Key: FLINK-35764
>                 URL: https://issues.apache.org/jira/browse/FLINK-35764
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Metrics
>    Affects Versions: 1.15.0, 1.16.0, 1.17.0, 1.18.0, 1.19.0, 1.20.0
>            Reporter: Liu Liu
>            Priority: Major
>              Labels: pull-request-available
>
> *Description*
> Currently in {{{}TimerGauge{}}}, the timer measures the time spent in a state 
> (marked by {{markStart()}} and {{{}markEnd(){}}}). The current logic is as 
> follows:
>  - {{markStart()}} bumps {{currentMeasurementStartTS}} and 
> {{{}currentUpdateTS{}}}, while {{update()}} bumps {{currentUpdateTS}}
>  - {{currentCount}} stores the time in the state within this update interval
>  - When calling {{{}markEnd(){}}}, the time since 
> {{currentMeasurementStartTS}} is added to the {{currentCount}}
>  - When calling {{{}update(){}}}, the time since {{currentUpdateTS}} is added 
> to the {{{}currentCount{}}}, and the {{currentCount}} is used to update the 
> gauge value
> The intent is that a state can span across two update intervals (by calling 
> {{{}markStart() -> update() -> markEnd(){}}}). However, the results will be 
> incorrect, since the time between {{markStart()}} and {{update()}} will be 
> counted in the first update interval by {{{}update(){}}}, and then in the 
> second update interval by {{{}markEnd(){}}}, so the measurement will be 
> larger than the correct value. The correct solution is to only add the time 
> since {{currentUpdateTS}} to {{currentCount}} in {{{}markEnd(){}}}.
> *Test case*
> {{@Test  }}
> {{void testUpdateBeforeMarkingEnd() {  }}
> {{    ManualClock clock = new ManualClock(42_000_000);}}
> {{    // this timer gauge measures 2 update intervals}}
> {{    TimerGauge gauge = new TimerGauge(clock, 2 * 
> View.UPDATE_INTERVAL_SECONDS);}}
> {{    long UPDATE_INTERVAL_MILLIS = 
> TimeUnit.SECONDS.toMillis(View.UPDATE_INTERVAL_SECONDS);  // 5000 ms}}
> {{    long SLEEP = 10;  // 10 ms}}
> {{    // interval 1}}
> {{    clock.advanceTime(UPDATE_INTERVAL_MILLIS - SLEEP, 
> TimeUnit.MILLISECONDS);  // *(1)}}
> {{    gauge.markStart();  }}
> {{    clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS);  }}
> {{    gauge.update();}}
> {{    // interval 2}}
> {{    clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS);  }}
> {{    gauge.markEnd();}}
> {{    clock.advanceTime(UPDATE_INTERVAL_MILLIS - SLEEP, 
> TimeUnit.MILLISECONDS);  // *(1)}}
> {{    gauge.update();  }}
> {{    // expected: 2, actual: 3}}
> {{    assertThat(gauge.getValue()).isEqualTo(SLEEP / 
> View.UPDATE_INTERVAL_SECONDS);  }}
> {{}}}
> The current test cases in {{TimerGaugeTest}} do not catch this bug, because 
> the assert condition is (conveniently) {{{}isGreaterThanOrEqualTo{}}}, and 
> the code does not simulate the time passed outside the state ({{{}*(1){}}} in 
> the code above).
> *Proposed changes*
>  * In {{{}TimerGauge{}}}, only add the time since {{currentUpdateTS}} to 
> {{currentCount}} in {{markEnd()}}
>  * Add the test case above to {{{}TimerGaugeTest{}}}, and adjust other test 
> cases



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to