Re: Problem with Flink restoring from checkpoints

2017-08-25 Thread Aljoscha Krettek
Hi, sorry for getting back to this so late, but I think the underlying problem is that S3 does not behave the way the BucketingSink expects. See this message by Stephan on the topic: https://lists.apache.org/thread.html/34b8ede3affb965c7b5ec1e404918b39c282c258809dfd7a6c257a61@%3Cuser.flink.apach …

Re: Problem with Flink restoring from checkpoints

2017-07-21 Thread Francisco Blaya
Hi Aljoscha, I've tried both. When we restart from a manually created savepoint we see "restored from savepoint" in the Flink Dashboard. If we restart from an externalized checkpoint we see "restored from checkpoint". In both scenarios we lose data in S3. Cheers, Fran …

Re: Problem with Flink restoring from checkpoints

2017-07-20 Thread Aljoscha Krettek
You said you cancel and restart the job. How do you then restart it? From a savepoint or an externalized checkpoint? Do you also see missing data when using an externalized checkpoint or a savepoint? Best, Aljoscha …
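
For reference, externalized checkpoints (which Aljoscha is asking about) have to be enabled explicitly so that a checkpoint survives job cancellation — a minimal sketch of the 1.3-era configuration, not taken from Fran's job:

```java
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000);

// Retain the latest checkpoint on cancellation so the job can later be
// resumed from it (e.g. with `flink run -s <checkpoint-path>`).
env.getCheckpointConfig().enableExternalizedCheckpoints(
        ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
```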

Re: Re: Problem with Flink restoring from checkpoints

2017-07-20 Thread Francisco Blaya
Forgot to add that when a job gets cancelled via the UI (this is not the case when the YARN session is killed), a part file ending in ".pending" does appear in S3, but it never seems to be promoted to a finished file when the job restarts. …
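
For anyone debugging the same symptom: BucketingSink part files move through in-progress → pending → finished, and the pending-to-finished promotion only happens when the checkpoint covering that data completes. The suffixes are configurable, which makes the states visible in S3 listings — a sketch with a placeholder path:

```java
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;

BucketingSink<String> sink = new BucketingSink<>("s3://my-bucket/events"); // hypothetical path

// A part file is renamed from in-progress to pending when its bucket is
// closed, and from pending to finished only in notifyCheckpointComplete();
// a ".pending" file that never loses its suffix means no checkpoint
// covering it completed after the restore.
sink.setInProgressSuffix(".in-progress");
sink.setPendingSuffix(".pending");
```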

Re: Re: Problem with Flink restoring from checkpoints

2017-07-20 Thread Francisco Blaya
Hi, thanks for your answers. @Fabian: I can see that Flink's consumer commits the offsets to Kafka, no problem there. However, I'd expect everything that gets read from Kafka to appear in S3 at some point, even if the job gets stopped or killed before flushing and then restarted. And that's what is not …

Re: Re: Problem with Flink restoring from checkpoints

2017-07-19 Thread Tzu-Li (Gordon) Tai
Hi, what Fabian mentioned is true. The Flink Kafka consumer's exactly-once guarantee relies on checkpointing offsets as Flink state, and doesn't rely on the offsets committed to Kafka. "What we found is that Flink acks Kafka immediately before even writing to S3." What you mean by "ack" here is the offsets …
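
The behaviour Gordon describes is configurable: committing offsets back to Kafka on checkpoints is purely informational. A sketch — topic name, schema, and properties are placeholders:

```java
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

import java.util.Properties;

Properties props = new Properties(); // bootstrap.servers, group.id, etc.

FlinkKafkaConsumer010<String> consumer =
        new FlinkKafkaConsumer010<>("events", new SimpleStringSchema(), props);

// The offsets committed to Kafka on checkpoint completion only serve
// monitoring tools; recovery always uses the offsets stored in the
// checkpoint itself, so this flag has no effect on exactly-once.
consumer.setCommitOffsetsOnCheckpoints(true);
```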

Re: Re: Problem with Flink restoring from checkpoints

2017-07-19 Thread 周思华
Hi Fran, does the DateTimeBucketer act like a memory buffer that isn't managed by Flink's state? If so, then I think the problem is not about Kafka but about the DateTimeBucketer. Flink won't take a snapshot of the DateTimeBucketer if it isn't part of any state. Best, Sihua Zhou …
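
For context on what "managed by Flink's state" means here: Flink only snapshots state an operator explicitly registers; plain Java fields are lost on failure. A minimal, hypothetical CheckpointedFunction sketch (not the actual BucketingSink code) showing the difference:

```java
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

import java.util.ArrayList;
import java.util.List;

// The `buffer` field alone would be lost on failure; copying it into the
// registered ListState during snapshotState() is what puts it into a
// checkpoint and makes it restorable.
public class BufferingSink extends RichSinkFunction<String> implements CheckpointedFunction {

    private transient ListState<String> checkpointedBuffer;
    private final List<String> buffer = new ArrayList<>();

    @Override
    public void invoke(String value) {
        buffer.add(value); // held only in memory until the next checkpoint
    }

    @Override
    public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
        checkpointedBuffer.clear();
        for (String element : buffer) {
            checkpointedBuffer.add(element);
        }
    }

    @Override
    public void initializeState(FunctionInitializationContext ctx) throws Exception {
        checkpointedBuffer = ctx.getOperatorStateStore()
                .getListState(new ListStateDescriptor<>("buffer", String.class));
        if (ctx.isRestored()) {
            for (String element : checkpointedBuffer.get()) {
                buffer.add(element);
            }
        }
    }
}
```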

Re: Problem with Flink restoring from checkpoints

2017-07-19 Thread Fabian Hueske
Hi Fran, did you observe actual data loss due to the problem you are describing, or are you discussing a possible issue based on your observations? AFAIK, Flink's Kafka consumer keeps track of the offsets itself and includes these in the checkpoints. In case of a recovery, it does not rely on the offsets committed to Kafka. …
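
A one-line way to see Fabian's point in code: with checkpointing enabled, the consumer's partition offsets become part of Flink's checkpointed state. A sketch, not taken from Fran's job:

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// With checkpointing on, the FlinkKafkaConsumer snapshots its partition
// offsets into each checkpoint; on recovery the job resumes from those
// snapshotted offsets, regardless of what was last committed to Kafka.
env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);
```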

Problem with Flink restoring from checkpoints

2017-07-19 Thread Francisco Blaya
Hi, we have a Flink job running on AWS EMR sourcing a Kafka topic and persisting the events to S3 through a DateTimeBucketer. We configured the bucketer to flush to S3 with an inactivity period of 5 minutes. The rate at which events are written to Kafka in the first place is very low, so it is easy for …
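
For readers following along, here is a minimal sketch of the kind of job described above — a Kafka source into a BucketingSink with a DateTimeBucketer and a 5-minute inactivity threshold. Topic name, bucket path, and Kafka properties are made up for illustration, and the APIs shown are the Flink 1.3-era connectors:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

import java.util.Properties;

public class KafkaToS3Job {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // checkpoint every minute

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // hypothetical
        props.setProperty("group.id", "s3-archiver");          // hypothetical

        // Events arrive at a very low rate, so buckets are mostly idle and
        // get flushed by the inactivity threshold rather than by size.
        BucketingSink<String> sink = new BucketingSink<>("s3://my-bucket/events"); // hypothetical
        sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HH"));
        sink.setInactiveBucketThreshold(5 * 60 * 1000L);   // flush buckets idle for 5 minutes
        sink.setInactiveBucketCheckInterval(60 * 1000L);   // how often to check for idle buckets

        env.addSource(new FlinkKafkaConsumer010<>("events", new SimpleStringSchema(), props))
           .addSink(sink);

        env.execute("kafka-to-s3");
    }
}
```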