Hi,
Sorry for getting back to this so late, but I think the underlying problem is
that S3 does not behave the way the BucketingSink expects. See this message from
Stephan on the topic:
https://lists.apache.org/thread.html/34b8ede3affb965c7b5ec1e404918b39c282c258809dfd7a6c257a61@%3Cuser.flink.apach
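For anyone skimming the thread, this is roughly the shape of the job being
discussed. It is only a sketch: the stand-in source, bucket path and batch size
below are made up for illustration, not Fran's actual setup.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

public class S3SinkSketch {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(60_000); // part files are only finalised when a checkpoint completes

    DataStream<String> stream = env.socketTextStream("localhost", 9999); // stand-in for the Kafka source

    // The BucketingSink assumes file-system semantics (atomic rename, consistent
    // listing) that S3, as an object store, does not fully provide. That mismatch
    // is what the thread linked above discusses.
    BucketingSink<String> sink = new BucketingSink<>("s3://my-bucket/flink-output"); // made-up path
    sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HH"));
    sink.setBatchSize(128L * 1024L * 1024L); // roll part files at roughly 128 MB (example value)

    stream.addSink(sink);
    env.execute("s3-bucketing-sink-sketch");
  }
}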
Hi Aljoscha,
I've tried both. When we restart from a manually created savepoint we see
"restored from savepoint" in the Flink Dashboard. If we restart from
externalized checkpoint we see "restored from checkpoint". In both
scenarios we lose data in S3.
Cheers,
Fran
On 20 July 2017 at 17:54, Aljoscha wrote:
You said you cancel and restart the job. How do you then restart the job? From
a savepoint or an externalised checkpoint? Do you also see missing data when
using an externalised checkpoint or a savepoint?
Best,
Aljoscha
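In case it helps others reading this archive: externalised checkpoints have to
be enabled explicitly, and the job is then resumed by pointing the client at the
retained checkpoint, much like a savepoint. A rough sketch, with a made-up
interval:

import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000); // example interval: 60 s
// Keep the last checkpoint around when the job is cancelled, so it can be
// used for the restart in the same way as a savepoint:
env.getCheckpointConfig().enableExternalizedCheckpoints(
    ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
// The job is then resumed along the lines of:
//   flink run -s <path-to-checkpoint-or-savepoint> <job-jar> ...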
On 20. Jul 2017, at 16:15, Francisco Blaya wrote:
Forgot to add that when a job gets cancelled via the UI (this is not the case
when the YARN session is killed), a part file ending in ".pending" does appear
in S3, but it never seems to be promoted to finished upon restart of the job.
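For reference, my understanding of the part-file lifecycle that the ".pending"
suffix belongs to. The suffixes set below are just the defaults, shown
explicitly for illustration, and the path is made up:

import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;

BucketingSink<String> sink = new BucketingSink<>("s3://my-bucket/flink-output"); // made-up path
// Lifecycle of a part file, as far as I understand it:
//   part-0-0.in-progress  while the sink is still writing to it
//   part-0-0.pending      once the file has been rolled, or the sink was closed (e.g. job cancelled)
//   part-0-0              only after a checkpoint covering it completes, or after a restore
// So a ".pending" file that never loses its suffix suggests the finalisation
// step, which runs on checkpoint completion and on restore, never ran for it.
sink.setInProgressSuffix(".in-progress"); // default value
sink.setPendingSuffix(".pending");        // default value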
On 20 July 2017 at 11:41, Francisco Blaya wrote:
Hi,
Thanks for your answers.
@Fabian: I can see that Flink's consumer commits the offsets to Kafka, no
problem there. However, I'd expect everything that gets read from Kafka to
appear in S3 at some point, even if the job gets stopped or killed before
flushing and then restarted. And that's what is not happening.
Hi Fran,
did you observe actual data loss due to the problem you are describing or
are you discussing a possible issue based on your observations?
AFAIK, Flink's Kafka consumer keeps track of the offsets itself and
includes these in its checkpoints. In case of a recovery, it does not rely
on the offsets committed back to Kafka.
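A small sketch of what that means in practice. The class names below are for the
Kafka 0.10 connector and the broker, group and topic values are placeholders;
adjust to whichever connector version you actually use:

import java.util.Properties;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "broker:9092"); // placeholder
props.setProperty("group.id", "my-group");             // placeholder

FlinkKafkaConsumer010<String> consumer =
    new FlinkKafkaConsumer010<>("my-topic", new SimpleStringSchema(), props);

// The offsets that matter for recovery are the ones stored in Flink's own
// checkpoints and savepoints. Committing offsets back to Kafka is optional
// and only useful for external monitoring tools; a restore does not read them.
consumer.setCommitOffsetsOnCheckpoints(true);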