Re: Watermark Question on Failed Process

2018-04-04 Thread Chengzhi Zhao
Thanks Fabian! This is very helpful! Best, Chengzhi On Wed, Apr 4, 2018 at 9:02 AM, Fabian Hueske wrote: > Hi Chengzhi, > > You can access the current watermark from the Context object of a > ProcessFunction [1] and store it in operator state [2]. > In case of a restart, the state will be re

Re: Watermark Question on Failed Process

2018-04-04 Thread Fabian Hueske
Hi Chengzhi, You can access the current watermark from the Context object of a ProcessFunction [1] and store it in operator state [2]. In case of a restart, the state will be restored with the watermark that was active when the checkpoint (or savepoint) was taken. Note, this won't be the last wate

Re: Watermark Question on Failed Process

2018-04-03 Thread Chengzhi Zhao
Thanks Timo for your response and the references. I will try BoundedOutOfOrdernessTimestampExtractor, and if it does't work as expected, I will handle it as a separated pipeline. Also, is there a way to retrieve the last watermark before/after failure? So maybe I can persist the watermark to exter

Re: Watermark Question on Failed Process

2018-04-03 Thread Timo Walther
Hi Chengzhi, if you emit a watermark even though there is still data with a lower timestamp, you generate "late data" that either needs to be processed in a separate branch of your pipeline (see sideOutputLateData() [1]) or should force your existing operators to update their previously emitte

Watermark Question on Failed Process

2018-04-02 Thread Chengzhi Zhao
Hello, flink community, I am using period watermark and extract the event time from each records from files in S3. I am using the `TimeLagWatermarkGenerator` as it was mentioned in flink documentation. Currently, a new watermark will be generated using processing time by fixed amount override de