Re: Preserve accumulators after failure in DataStream API

2019-05-02 Thread Wouter Zorgdrager
+1, especially if you don't want to rely on an external metric reporter, this is a nice feature. On Thu, 2 May 2019 at 10:29, Fabian Hueske wrote: > Hi, > > Both of you seem to have the same requirement. > This is a good indication that "fault-tolerant metrics" are a missing > feature. > It might make …

Re: Preserve accumulators after failure in DataStream API

2019-05-02 Thread Fabian Hueske
Hi, Both of you seem to have the same requirement. This is a good indication that "fault-tolerant metrics" are a missing feature. It might make sense to think about a built-in mechanism to back metrics with state. Cheers, Fabian On Thu, 2 May 2019 at 10:25, Paul Lam wrote: > Hi Wouter, …

Re: Preserve accumulators after failure in DataStream API

2019-05-02 Thread Paul Lam
Hi Wouter, I've run into the same issue and finally managed to use operator state to back the accumulators, so they can be restored after restarts. The downside is that we have to update the values in both the accumulators and the state to keep them consistent. FYI. Best, Paul Lam On 2 May 2019, Fabian Hueske wrote: …
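Paul's dual-write idea can be sketched in plain Java. Note this is a minimal stand-in, not Flink API: `BackedCounter` and its fields are hypothetical, with `accumulator` modeling a Flink accumulator (lost on restart) and `state` modeling checkpointed operator state. Every update goes to both, and after a restart the accumulator is re-seeded from the restored state.

```java
/**
 * Stand-in sketch of the dual-write pattern: the "accumulator" field models
 * a Flink accumulator (reset on failure), the "state" field models operator
 * state (checkpointed and restored). Both are updated together so they stay
 * consistent, which is the downside Paul mentions.
 */
public class BackedCounter {
    private long accumulator = 0; // like a Flink accumulator: reset on restart
    private long state = 0;       // like operator state: survives restarts

    /** Dual write: keep accumulator and state consistent. */
    public void add(long n) {
        accumulator += n;
        state += n;
    }

    /** Value captured at checkpoint time. */
    public long snapshotState() {
        return state;
    }

    /** On recovery, restore the state and re-seed the accumulator from it. */
    public void restore(long restored) {
        state = restored;
        accumulator = restored;
    }

    public long value() {
        return accumulator;
    }

    public static void main(String[] args) {
        BackedCounter before = new BackedCounter();
        before.add(5);
        long checkpoint = before.snapshotState();

        // Simulate a failure and restart: a fresh instance starts at zero,
        // then is re-seeded from the last checkpoint before resuming.
        BackedCounter after = new BackedCounter();
        after.restore(checkpoint);
        after.add(3);
        System.out.println(after.value()); // prints 8
    }
}
```

In a real job, the `restore` step would happen in `initializeState` of a `CheckpointedFunction` and `snapshotState` at checkpoint time; the sketch only shows why both values must be written on every update.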

Re: Preserve accumulators after failure in DataStream API

2019-05-02 Thread Fabian Hueske
Hi Wouter, OK, that explains it :-) Overloaded terms... The Table API / SQL documentation refers to the accumulator of an AggregateFunction [1]. The accumulators that are accessible via the RuntimeContext are a rather old part of the API that is mainly intended for batch jobs. I would not use th…

Re: Preserve accumulators after failure in DataStream API

2019-05-02 Thread Wouter Zorgdrager
Hi Fabian, Maybe I should clarify a bit: actually I'm using a (Long)Counter registered as an Accumulator in the RuntimeContext [1]. So I'm using a KeyedProcessFunction, not an AggregateFunction. This works properly, but is not retained after a job restart. I'm not entirely sure if I did this correctly.

Re: Preserve accumulators after failure in DataStream API

2019-05-02 Thread Fabian Hueske
Hi Wouter, The DataStream API accumulators of the AggregateFunction [1] are stored in state and should be recovered in case of a failure as well. If this does not work, it would be a serious bug. What's the type of your accumulator? Can you maybe share the code? How do you apply the AggregateFunction …
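Fabian's point is that an AggregateFunction's accumulator *is* the function's state, which Flink checkpoints for you. A self-contained sketch of that shape is below; the interface mirrors the four methods of Flink's `org.apache.flink.api.common.functions.AggregateFunction` (`createAccumulator` / `add` / `getResult` / `merge`) but is a local stand-in so it runs without Flink, and `AvgAccumulator` / `AverageAggregate` are hypothetical names.

```java
import java.util.List;

/** Local stand-in mirroring the shape of Flink's AggregateFunction:
 *  the accumulator (ACC) holds all intermediate state, which is what
 *  Flink checkpoints and restores on failure. */
interface AggregateFunction<IN, ACC, OUT> {
    ACC createAccumulator();
    ACC add(IN value, ACC accumulator);
    OUT getResult(ACC accumulator);
    ACC merge(ACC a, ACC b);
}

/** Running average: the accumulator holds (sum, count). */
class AvgAccumulator {
    long sum;
    long count;
}

public class AverageAggregate
        implements AggregateFunction<Long, AvgAccumulator, Double> {

    @Override
    public AvgAccumulator createAccumulator() {
        return new AvgAccumulator();
    }

    @Override
    public AvgAccumulator add(Long value, AvgAccumulator acc) {
        acc.sum += value;
        acc.count += 1;
        return acc; // this accumulator is the checkpointed state
    }

    @Override
    public Double getResult(AvgAccumulator acc) {
        return (double) acc.sum / acc.count;
    }

    @Override
    public AvgAccumulator merge(AvgAccumulator a, AvgAccumulator b) {
        a.sum += b.sum;
        a.count += b.count;
        return a;
    }

    public static void main(String[] args) {
        AverageAggregate agg = new AverageAggregate();
        AvgAccumulator acc = agg.createAccumulator();
        for (long v : List.of(1L, 2L, 3L, 4L)) {
            acc = agg.add(v, acc);
        }
        System.out.println(agg.getResult(acc)); // prints 2.5
    }
}
```

This is the key contrast with RuntimeContext accumulators: the `AvgAccumulator` instance lives in keyed/window state and survives restarts, whereas a counter registered via `addAccumulator` does not.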

Preserve accumulators after failure in DataStream API

2019-04-30 Thread Wouter Zorgdrager
Hi all, In the documentation I read about UDF accumulators [1]: "Accumulators are automatically backup-ed by Flink’s checkpointing mechanism and restored in case of a failure to ensure exactly-once semantics." So I assumed this was also the case for accumulators used in the DataStream API, but I not…