[jira] [Updated] (FLINK-35764) TimerGauge is incorrect when update is called during a measurement

Liu Liu (Jira) Wed, 18 Sep 2024 00:51:04 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-35764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Liu Liu updated FLINK-35764:
----------------------------
    Description: 
*Description*

Currently in {{{}TimerGauge{}}}, the timer measures the time spent in a state 
(marked by {{markStart()}} and {{{}markEnd(){}}}). The current logic is as 
follows:
 - {{markStart()}} bumps {{currentMeasurementStartTS}} and 
{{{}currentUpdateTS{}}}, while {{update()}} bumps {{currentUpdateTS}}
 - {{currentCount}} stores the time in the state within this update interval
 - When calling {{{}markEnd(){}}}, the time since {{currentMeasurementStartTS}} 
is added to the {{currentCount}}
 - When calling {{{}update(){}}}, the time since {{currentUpdateTS}} is added 
to the {{{}currentCount{}}}, and the {{currentCount}} is used to update the 
gauge value

The intent is that a state can span across two update intervals (by calling 
{{{}markStart() -> update() -> markEnd(){}}}). However, the results will be 
incorrect, since the time between {{markStart()}} and {{update()}} will be 
counted in the first update interval by {{{}update(){}}}, and then in the 
second update interval by {{{}markEnd(){}}}, so the measurement will be larger 
than the correct value. The correct solution is to only add the time since 
{{currentUpdateTS}} to {{currentCount}} in {{{}markEnd(){}}}.

*Test case*

 

The current test cases in {{TimerGaugeTest}} do not catch this bug, because the 
assert condition is (conveniently) {{{}isGreaterThanOrEqualTo{}}}, and the code 
does not simulate the time passed outside the state ({{{}*(1){}}} in the code 
above).

  was:
*Description*

Currently in {{{}TimerGauge{}}}, the timer measures the time spent in a state 
(marked by {{markStart()}} and {{{}markEnd(){}}}). The current logic is as 
follows:
 - {{markStart()}} bumps {{currentMeasurementStartTS}} and 
{{{}currentUpdateTS{}}}, while {{update()}} bumps {{currentUpdateTS}}
 - {{currentCount}} stores the time in the state within this update interval
 - When calling {{{}markEnd(){}}}, the time since {{currentMeasurementStartTS}} 
is added to the {{currentCount}}
 - When calling {{{}update(){}}}, the time since {{currentUpdateTS}} is added 
to the {{{}currentCount{}}}, and the {{currentCount}} is used to update the 
gauge value

The intent is that a state can span across two update intervals (by calling 
{{{}markStart() -> update() -> markEnd(){}}}). However, the results will be 
incorrect, since the time between {{markStart()}} and {{update()}} will be 
counted in the first update interval by {{{}update(){}}}, and then in the 
second update interval by {{{}markEnd(){}}}, so the measurement will be larger 
than the correct value. The correct solution is to only add the time since 
{{currentUpdateTS}} to {{currentCount}} in {{{}markEnd(){}}}.

*Test case*
{quote}{{@Test  }}
{{void testUpdateBeforeMarkingEnd() {  }}
{{    ManualClock clock = new ManualClock(42_000_000);}}
{{    // this timer gauge measures 2 update intervals}}
{{    TimerGauge gauge = new TimerGauge(clock, 2 * 
View.UPDATE_INTERVAL_SECONDS);}}
{{    long UPDATE_INTERVAL_MILLIS = 
TimeUnit.SECONDS.toMillis(View.UPDATE_INTERVAL_SECONDS);  // 5000 ms}}
{{    long SLEEP = 10;  // 10 ms}}

{{    // interval 1}}
{{    clock.advanceTime(UPDATE_INTERVAL_MILLIS - SLEEP, TimeUnit.MILLISECONDS); 
 // *(1)}}
{{    gauge.markStart();  }}
{{    clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS);  }}
{{    gauge.update();}}

 
{{{{    // interval 2}}}}
{{{{    clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS);  }}}}
{{{{    gauge.markEnd();}}}}
{{{{    clock.advanceTime(UPDATE_INTERVAL_MILLIS - SLEEP, 
TimeUnit.MILLISECONDS);  // *(1)}}}}
{{    gauge.update();}}
{{{{ }}}}
{{{{    // expected: 2, actual: 3}}}}
{{{{    assertThat(gauge.getValue()).isEqualTo(SLEEP / 
View.UPDATE_INTERVAL_SECONDS);  }}}}
{{{{}}}}}{quote}
The current test cases in {{TimerGaugeTest}} do not catch this bug, because the 
assert condition is (conveniently) {{{}isGreaterThanOrEqualTo{}}}, and the code 
does not simulate the time passed outside the state ({{{}*(1){}}} in the code 
above).


> TimerGauge is incorrect when update is called during a measurement
> ------------------------------------------------------------------
>
>                 Key: FLINK-35764
>                 URL: https://issues.apache.org/jira/browse/FLINK-35764
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Metrics
>    Affects Versions: 1.15.0, 1.16.0, 1.17.0, 1.18.0, 1.19.0
>            Reporter: Liu Liu
>            Priority: Major
>
> *Description*
> Currently in {{{}TimerGauge{}}}, the timer measures the time spent in a state 
> (marked by {{markStart()}} and {{{}markEnd(){}}}). The current logic is as 
> follows:
>  - {{markStart()}} bumps {{currentMeasurementStartTS}} and 
> {{{}currentUpdateTS{}}}, while {{update()}} bumps {{currentUpdateTS}}
>  - {{currentCount}} stores the time in the state within this update interval
>  - When calling {{{}markEnd(){}}}, the time since 
> {{currentMeasurementStartTS}} is added to the {{currentCount}}
>  - When calling {{{}update(){}}}, the time since {{currentUpdateTS}} is added 
> to the {{{}currentCount{}}}, and the {{currentCount}} is used to update the 
> gauge value
> The intent is that a state can span across two update intervals (by calling 
> {{{}markStart() -> update() -> markEnd(){}}}). However, the results will be 
> incorrect, since the time between {{markStart()}} and {{update()}} will be 
> counted in the first update interval by {{{}update(){}}}, and then in the 
> second update interval by {{{}markEnd(){}}}, so the measurement will be 
> larger than the correct value. The correct solution is to only add the time 
> since {{currentUpdateTS}} to {{currentCount}} in {{{}markEnd(){}}}.
> *Test case*
>  
> The current test cases in {{TimerGaugeTest}} do not catch this bug, because 
> the assert condition is (conveniently) {{{}isGreaterThanOrEqualTo{}}}, and 
> the code does not simulate the time passed outside the state ({{{}*(1){}}} in 
> the code above).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (FLINK-35764) TimerGauge is incorrect when update is called during a measurement

Reply via email to