Thanks. I am still in theory/evaluation mode. I will try to code this up to
see whether checkpointing becomes an issue. I do have a high rate of ingest
and lots of in-flight data; hopefully Flink's back pressure keeps this
nicely bounded.
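
Roughly what I plan to code up first, just to watch checkpoint sizes and
durations under load (a minimal sketch; the 60s interval, the sequence
source, and the print sink are placeholders for my real pipeline):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointProbe {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Snapshot every 60s with exactly-once barriers; anything in flight
        // after a barrier is not part of the snapshot, so back pressure is
        // what has to keep the in-flight volume bounded.
        env.enableCheckpointing(60_000L, CheckpointingMode.EXACTLY_ONCE);

        env.generateSequence(0, Long.MAX_VALUE)   // stand-in for my high-rate source
           .map(new MapFunction<Long, Long>() {
               @Override
               public Long map(Long value) {
                   return value * 2;              // stand-in transformation
               }
           })
           .print();                              // stand-in sink

        env.execute("checkpointing probe");
    }
}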

I doubt it will be a problem for me, because even Spark writes all
in-flight data to disk: all of its partitioning goes through disk and is
inline, i.e. synchronous. Flink's disk usage is write-only and for the
failure case only. Looks pretty compelling so far.
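
If state size does become an issue, switching to the RocksDB backend that
Ufuk recommends below looks like a one-liner. A minimal sketch of what I
understand the setup to be (assuming the flink-statebackend-rocksdb
dependency is on the classpath; the HDFS path is a placeholder):

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDBBackendProbe {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Keep live operator/keyed state in local RocksDB instances and copy
        // snapshots to the checkpoint URI below (placeholder path).
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));
        env.enableCheckpointing(60_000L);

        env.generateSequence(0, 1_000_000).print();   // trivial placeholder pipeline

        env.execute("rocksdb backend probe");
    }
}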

On Friday, May 20, 2016, Ufuk Celebi <u...@apache.org> wrote:

> On Thu, May 19, 2016 at 7:48 PM, Abhishek R. Singh
> <abhis...@tetrationanalytics.com> wrote:
> > There seems to be some relationship between watermarks, triggers, and
> > checkpoints that is somehow not being leveraged.
>
> Checkpointing is independent of this, yes. Did the state size become a
> problem for your use case? There are various users running Flink with
> very large state sizes without any issues. The recommended state
> backend for these use cases is the RocksDB backend.
>
> The barriers are triggered at the sources and flow with the data
> (
> https://ci.apache.org/projects/flink/flink-docs-release-1.0/internals/stream_checkpointing.html
> ).
> Everything in-flight after the barrier is not relevant for the
> checkpoint. We are only interested in a consistent state snapshot.
>
