Re: Problem with Flink restoring from checkpoints

2017-08-25 Thread Aljoscha Krettek
Hi, sorry for getting back to this so late, but I think the underlying problem is that S3 does not behave the way the BucketingSink expects. See this message by Stephan on the topic: https://lists.apache.org/thread.html/34b8ede3affb965c7b5ec1e404918b39c282c258809dfd7a6c257a61@%3Cuser.flink.apach
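
[Editor's note] For context, a minimal sketch of the kind of pipeline discussed in this thread, assuming the flink-connector-filesystem BucketingSink as of Flink 1.3. The class name, bucket path, bucketing format, and batch size are illustrative, not taken from the thread:

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.fs.StringWriter;
    import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
    import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

    public class S3BucketingSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(60_000); // part files are only finalized when a checkpoint completes

            // Hypothetical S3 path; the sink relies on rename and "valid-length"
            // semantics that S3 does not fully provide, which is the issue referenced above.
            BucketingSink<String> sink = new BucketingSink<>("s3://example-bucket/flink-output");
            sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HH"));
            sink.setWriter(new StringWriter<String>());
            sink.setBatchSize(64 * 1024 * 1024); // roll a new part file at ~64 MB

            DataStream<String> events = env.fromElements("a", "b", "c"); // placeholder source
            events.addSink(sink);

            env.execute("bucketing-sink-sketch");
        }
    }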

Re: Problem with Flink restoring from checkpoints

2017-07-21 Thread Francisco Blaya
Hi Aljoscha, I've tried both. When we restart from a manually created savepoint, we see "restored from savepoint" in the Flink Dashboard. If we restart from an externalized checkpoint, we see "restored from checkpoint". In both scenarios we lose data in S3. Cheers, Fran

Re: Problem with Flink restoring from checkpoints

2017-07-20 Thread Aljoscha Krettek
You said you cancel and restart the job. How do you then restart the job: from a savepoint or from an externalised checkpoint? Do you also see missing data when using an externalised checkpoint or a savepoint? Best, Aljoscha
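
[Editor's note] For reference, externalized checkpoints have to be enabled explicitly in the job. A fragment of the job setup, assuming Flink 1.3's CheckpointConfig API; the 60-second interval is arbitrary:

    import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(60_000);

    // Retain the latest checkpoint after cancellation so the job can later be
    // resumed from it (e.g. bin/flink run -s <checkpoint-path> ...).
    env.getCheckpointConfig().enableExternalizedCheckpoints(
            ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);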

Re: Problem with Flink restoring from checkpoints

2017-07-20 Thread Francisco Blaya
Forgot to add that when a job gets cancelled via the UI (this is not the case when the YARN session is killed), a part file ending in ".pending" does appear in S3, but it never seems to be promoted to a finished file upon restart of the job.
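
[Editor's note] For readers unfamiliar with the sink's file lifecycle, the suffix setters below (available on the BucketingSink from the sketch above) make the three states visible in S3; the values shown are illustrative:

    // A part file moves through three states:
    //   in-progress -> written to while the bucket is active
    //   pending     -> renamed when a checkpoint is triggered
    //   finished    -> renamed when that checkpoint completes (notifyCheckpointComplete)
    // A ".pending" file that is never promoted means the completion step never ran,
    // or its effect was lost, which matches the behaviour described here.
    sink.setInProgressSuffix(".in-progress");
    sink.setPendingSuffix(".pending");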

Re: Problem with Flink restoring from checkpoints

2017-07-20 Thread Francisco Blaya
Hi, thanks for your answers. @Fabian: I can see that Flink's consumer commits the offsets to Kafka, no problem there. However, I'd expect everything that gets read from Kafka to appear in S3 at some point, even if the job gets stopped/killed before flushing and then restarted. And that's what is not happening.
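
[Editor's note] To make the expectation concrete: with checkpointing enabled, anything read from Kafka should eventually land in a finished S3 file once a later checkpoint completes. A sketch of the source side, assuming a Kafka 0.10 consumer; the broker address, group id, and topic name are hypothetical:

    import java.util.Properties;

    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "kafka:9092"); // hypothetical broker
    props.setProperty("group.id", "flink-s3-job");        // hypothetical group id

    FlinkKafkaConsumer010<String> consumer =
            new FlinkKafkaConsumer010<>("events", new SimpleStringSchema(), props); // hypothetical topic

    // Plugs into the sketch above: env.addSource(consumer).addSink(sink);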

Re: Problem with Flink restoring from checkpoints

2017-07-19 Thread Fabian Hueske
Hi Fran, did you observe actual data loss due to the problem you are describing, or are you discussing a possible issue based on your observations? AFAIK, Flink's Kafka consumer keeps track of the offsets itself and includes these in the checkpoints. In case of a recovery, it does not rely on the offsets committed to Kafka.
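
[Editor's note] For completeness, a sketch of the offset-handling knobs on the consumer, assuming the Flink 1.3 Kafka consumer API; the Kafka-committed offsets are only used when there is no checkpoint or savepoint to restore from:

    // Offsets are snapshotted into Flink's checkpoint state; committing them back to
    // Kafka is for monitoring/bookkeeping only and is not what recovery is based on.
    consumer.setCommitOffsetsOnCheckpoints(true);
    consumer.setStartFromGroupOffsets(); // start position used only when no restored state exists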