[ https://issues.apache.org/jira/browse/FLINK-35764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Liu Liu updated FLINK-35764: ---------------------------- Description: *Description* Currently in {{{}TimerGauge{}}}, the timer measures the time spent in a state (marked by {{markStart()}} and {{{}markEnd(){}}}). The current logic is as follows: - {{markStart()}} bumps {{currentMeasurementStartTS}} and {{{}currentUpdateTS{}}}, while {{update()}} bumps {{currentUpdateTS}} - {{currentCount}} stores the time in the state within this update interval - When calling {{{}markEnd(){}}}, the time since {{currentMeasurementStartTS}} is added to the {{currentCount}} - When calling {{{}update(){}}}, the time since {{currentUpdateTS}} is added to the {{{}currentCount{}}}, and the {{currentCount}} is used to update the gauge value The intent is that a state can span across two update intervals (by calling {{{}markStart() -> update() -> markEnd(){}}}). However, the results will be incorrect, since the time between {{markStart()}} and {{update()}} will be counted in the first update interval by {{{}update(){}}}, and then in the second update interval by {{{}markEnd(){}}}, so the measurement will be larger than the correct value. The correct solution is to only add the time since {{currentUpdateTS}} to {{currentCount}} in {{{}markEnd(){}}}. *Test case* The current test cases in {{TimerGaugeTest}} do not catch this bug, because the assert condition is (conveniently) {{{}isGreaterThanOrEqualTo{}}}, and the code does not simulate the time passed outside the state ({{{}*(1){}}} in the code above). was: *Description* Currently in {{{}TimerGauge{}}}, the timer measures the time spent in a state (marked by {{markStart()}} and {{{}markEnd(){}}}). The current logic is as follows: - {{markStart()}} bumps {{currentMeasurementStartTS}} and {{{}currentUpdateTS{}}}, while {{update()}} bumps {{currentUpdateTS}} - {{currentCount}} stores the time in the state within this update interval - When calling {{{}markEnd(){}}}, the time since {{currentMeasurementStartTS}} is added to the {{currentCount}} - When calling {{{}update(){}}}, the time since {{currentUpdateTS}} is added to the {{{}currentCount{}}}, and the {{currentCount}} is used to update the gauge value The intent is that a state can span across two update intervals (by calling {{{}markStart() -> update() -> markEnd(){}}}). However, the results will be incorrect, since the time between {{markStart()}} and {{update()}} will be counted in the first update interval by {{{}update(){}}}, and then in the second update interval by {{{}markEnd(){}}}, so the measurement will be larger than the correct value. The correct solution is to only add the time since {{currentUpdateTS}} to {{currentCount}} in {{{}markEnd(){}}}. *Test case* {quote}{{@Test }} {{void testUpdateBeforeMarkingEnd() { }} {{ ManualClock clock = new ManualClock(42_000_000);}} {{ // this timer gauge measures 2 update intervals}} {{ TimerGauge gauge = new TimerGauge(clock, 2 * View.UPDATE_INTERVAL_SECONDS);}} {{ long UPDATE_INTERVAL_MILLIS = TimeUnit.SECONDS.toMillis(View.UPDATE_INTERVAL_SECONDS); // 5000 ms}} {{ long SLEEP = 10; // 10 ms}} {{ // interval 1}} {{ clock.advanceTime(UPDATE_INTERVAL_MILLIS - SLEEP, TimeUnit.MILLISECONDS); // *(1)}} {{ gauge.markStart(); }} {{ clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS); }} {{ gauge.update();}} {{{{ // interval 2}}}} {{{{ clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS); }}}} {{{{ gauge.markEnd();}}}} {{{{ clock.advanceTime(UPDATE_INTERVAL_MILLIS - SLEEP, TimeUnit.MILLISECONDS); // *(1)}}}} {{ gauge.update();}} {{{{ }}}} {{{{ // expected: 2, actual: 3}}}} {{{{ assertThat(gauge.getValue()).isEqualTo(SLEEP / View.UPDATE_INTERVAL_SECONDS); }}}} {{{{}}}}}{quote} The current test cases in {{TimerGaugeTest}} do not catch this bug, because the assert condition is (conveniently) {{{}isGreaterThanOrEqualTo{}}}, and the code does not simulate the time passed outside the state ({{{}*(1){}}} in the code above). > TimerGauge is incorrect when update is called during a measurement > ------------------------------------------------------------------ > > Key: FLINK-35764 > URL: https://issues.apache.org/jira/browse/FLINK-35764 > Project: Flink > Issue Type: Bug > Components: Runtime / Metrics > Affects Versions: 1.15.0, 1.16.0, 1.17.0, 1.18.0, 1.19.0 > Reporter: Liu Liu > Priority: Major > > *Description* > Currently in {{{}TimerGauge{}}}, the timer measures the time spent in a state > (marked by {{markStart()}} and {{{}markEnd(){}}}). The current logic is as > follows: > - {{markStart()}} bumps {{currentMeasurementStartTS}} and > {{{}currentUpdateTS{}}}, while {{update()}} bumps {{currentUpdateTS}} > - {{currentCount}} stores the time in the state within this update interval > - When calling {{{}markEnd(){}}}, the time since > {{currentMeasurementStartTS}} is added to the {{currentCount}} > - When calling {{{}update(){}}}, the time since {{currentUpdateTS}} is added > to the {{{}currentCount{}}}, and the {{currentCount}} is used to update the > gauge value > The intent is that a state can span across two update intervals (by calling > {{{}markStart() -> update() -> markEnd(){}}}). However, the results will be > incorrect, since the time between {{markStart()}} and {{update()}} will be > counted in the first update interval by {{{}update(){}}}, and then in the > second update interval by {{{}markEnd(){}}}, so the measurement will be > larger than the correct value. The correct solution is to only add the time > since {{currentUpdateTS}} to {{currentCount}} in {{{}markEnd(){}}}. > *Test case* > > The current test cases in {{TimerGaugeTest}} do not catch this bug, because > the assert condition is (conveniently) {{{}isGreaterThanOrEqualTo{}}}, and > the code does not simulate the time passed outside the state ({{{}*(1){}}} in > the code above). -- This message was sent by Atlassian Jira (v8.20.10#820010)