Prometheus provides avg_over_time for range vectors. That seems better suited to this use case.
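A rough sketch of what the Flink side could look like under that approach: the gauge only holds the duration of the most recent call, and the averaging is left to Prometheus, e.g. avg_over_time(flatmap_duration_seconds[5m]), where the metric name is made up for illustration. The Gauge interface below is a local stand-in for org.apache.flink.metrics.Gauge so that the snippet compiles on its own; LastDurationGauge and TimedWork are hypothetical names, not part of Flink.

```java
// Local stand-in for org.apache.flink.metrics.Gauge (assumption: in a real
// job you would implement Flink's interface and register it on the metric group).
interface Gauge<T> {
    T getValue();
}

// Hypothetical gauge that exposes the duration of the latest invocation, in seconds.
class LastDurationGauge implements Gauge<Double> {
    // volatile: written by the task thread, read by the metrics reporter thread
    private volatile double lastSeconds;

    @Override
    public Double getValue() {
        return lastSeconds;
    }

    // Store the elapsed time of one invocation, converting nanoseconds to seconds.
    void record(long elapsedNanos) {
        lastSeconds = elapsedNanos / 1_000_000_000.0;
    }
}

// Hypothetical helper that times a unit of work and reports it to the gauge.
class TimedWork {
    static void runTimed(LastDurationGauge gauge, Runnable work) {
        // nanoTime is monotonic, so it is safer for elapsed time than currentTimeMillis
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            gauge.record(System.nanoTime() - start);
        }
    }
}
```

Note that Prometheus only sees the value present at each scrape, so avg_over_time averages the scraped samples rather than every single invocation.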
On Tue, Jan 12, 2021 at 6:53 PM Chesnay Schepler <ches...@apache.org> wrote:

> The cumulative time probably isn't that useful to detect changes in the
> behavior of the application.
>
> On 1/12/2021 12:30 PM, Chesnay Schepler wrote:
>
> I mean the difference itself, not cumulative.
>
> On 1/12/2021 12:08 PM, Manish G wrote:
>
> Can you elaborate on the second approach?
> Currently I am exposing the difference itself. Or do you mean the
> cumulative difference? I.e., I maintain a member variable, say timeSoFar,
> and update it with the time consumed by each method call and then expose it.
> Something like this:
>
> timeSoFar += timeConsumedByCurrentInvocation
> this.simpleGaug.setValue( timeSoFar );
>
> On Tue, Jan 12, 2021 at 4:24 PM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> That approach will generally not work for jobs that run for a long time,
>> because it will be nigh impossible for anomalies to affect the average. You
>> want to look into exponential moving averages.
>> Alternatively, just expose the diff as an absolute value and calculate
>> the average in Prometheus.
>>
>> On 1/12/2021 11:50 AM, Manish G wrote:
>>
>> OK, got it.
>> So I would need to accumulate the time value over the calls as well as
>> the number of times it is called, then calculate the average (accumulated
>> time / number of times called) and set the calculated value into the gauge
>> as above.
>>
>> On Tue, Jan 12, 2021 at 4:12 PM Chesnay Schepler <ches...@apache.org>
>> wrote:
>>
>>> A gauge just returns a value, and Flink exposes it as is. As such you
>>> need to calculate the average over time yourself, taking 2 time
>>> measurements (before and after the processing of each).
>>>
>>> On 1/12/2021 11:31 AM, Manish G wrote:
>>>
>>> startTime is set at the start of the function:
>>>
>>> long startTime = System.currentTimeMillis();
>>>
>>> On Tue, Jan 12, 2021 at 3:59 PM Manish G <manish.c.ghildi...@gmail.com>
>>> wrote:
>>>
>>>> My code is:
>>>>
>>>> public class SimpleGauge<T> implements Gauge<T> {
>>>>
>>>>     private T mValue;
>>>>
>>>>     @Override
>>>>     public T getValue() {
>>>>         return mValue;
>>>>     }
>>>>
>>>>     public void setValue(T value){
>>>>         mValue = value;
>>>>     }
>>>> }
>>>>
>>>> And in the flatmap function:
>>>>
>>>> float endTime = (System.currentTimeMillis() - startTime) / 1000F;
>>>> this.simplegauge.setValue(endTime);
>>>>
>>>> So does it mean that Flink calls my getValue function to accumulate the
>>>> value, rather than taking it as a snapshot?
>>>>
>>>> On Tue, Jan 12, 2021 at 3:53 PM Chesnay Schepler <ches...@apache.org>
>>>> wrote:
>>>>
>>>>> Sure, that might work. Be aware though that time measurements are,
>>>>> compared to the logic within a function, usually rather expensive and
>>>>> may impact performance.
>>>>>
>>>>> On 1/12/2021 10:57 AM, Manish G wrote:
>>>>> > Hi All,
>>>>> >
>>>>> > I have implemented a flatmap function and I want to collect metrics
>>>>> > for the average time of this function, which I plan to monitor via
>>>>> > Prometheus.
>>>>> >
>>>>> > What would be a good approach for it? I have added a gauge to the
>>>>> > method (extending the Gauge interface from the Flink API). Would it
>>>>> > work for my needs?