Thanks. I am still in theory/evaluation mode. Will try to code this up to see whether checkpointing becomes an issue. I do have a high rate of ingest and lots of in-flight data. Hopefully Flink backpressure keeps this nicely bounded.
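For reference, here is the kind of minimal setup I plan to try, going by the 1.0 docs. The checkpoint interval, the checkpoint URI, and the placeholder pipeline are just stand-ins on my side, not anything prescribed:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Snapshot every 60s; barriers are injected at the sources and flow
        // with the data, so in-flight records behind a barrier are not part
        // of the snapshot.
        env.enableCheckpointing(60000);

        // Keep state in RocksDB on local disk and write snapshots to a
        // durable filesystem; the URI below is a placeholder.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));

        // Placeholder pipeline; the real job would be my high-ingest stream.
        env.fromElements("a", "b", "c").print();

        env.execute("checkpoint sketch");
    }
}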
I doubt it will be a problem for me, because even Spark writes all in-flight data to disk: all partitioning goes through disk and is inline, i.e. synchronous. Flink's disk usage is write-only and only for the failure case. Looks pretty compelling so far.

On Friday, May 20, 2016, Ufuk Celebi <u...@apache.org> wrote:

> On Thu, May 19, 2016 at 7:48 PM, Abhishek R. Singh
> <abhis...@tetrationanalytics.com> wrote:
> > There seems to be some relationship between watermarks, triggers and
> > checkpoint that is somehow not being leveraged.
>
> Checkpointing is independent of this, yes. Did the state size become a
> problem for your use case? There are various users running Flink with
> very large state sizes without any issues. The recommended state
> backend for these use cases is the RocksDB backend.
>
> The barriers are triggered at the sources and flow with the data
> (https://ci.apache.org/projects/flink/flink-docs-release-1.0/internals/stream_checkpointing.html).
> Everything in-flight after the barrier is not relevant for the
> checkpoint. We are only interested in a consistent state snapshot.