Could someone comment on this question, in either direction, please?
Thanks,
Eduardo

On Sat, 8 Jun 2019, 14:10 Eduardo Winpenny Tejedor, <eduardo.winpe...@gmail.com> wrote:

> Hi,
>
> In generic terms: if a keyed operator outputs its state into a sink, and
> the state of that operator after a restart can be derived from the sink's
> contents, can we restore from the sink instead of using the checkpointing
> mechanics? Would that reduce latency (even if only theoretically)?
>
> An example: a word-counting app with a Kafka source and a Kafka sink
> (compacted topic). The input is an unbounded stream of single words and
> the output is a stream of <word, word_count> tuples that goes into a
> Kafka compacted topic.
>
> AFAIK Flink can guarantee exactly-once semantics by updating the keyed
> state of its counting operator on _every_ event - after a restart it can
> then resume from the last checkpoint. But in this case the sink contains
> exactly the same relevant information as a checkpoint (namely the
> <word, word_count> tuples), so could we load the state of the operator
> from our sink and avoid the latency added by our state backend? If so,
> how can this be achieved?
>
> If we replaced the Kafka sink with a database sink, could we, on startup,
> determine which keys a Flink task has been allocated and perform a
> _single_ query to the database to load those word_counts into the
> operator? How can this be achieved? Instead of a single query you may
> want a batched query - the point is to avoid querying the database once
> per key.
>
> Thanks,
> Eduardo
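For what it's worth, here is a minimal Python sketch of the "single batched read on restart" idea from the last paragraph. This is not Flink code: the md5-based `key_group` function is only a stand-in for Flink's MurmurHash-based key-group assignment, and `store` is a plain dict standing in for the compacted topic or database table. The key-group-to-subtask formula mirrors the one Flink uses (contiguous key-group ranges per subtask), but everything else is a simplified model under those assumptions.

```python
import hashlib

MAX_PARALLELISM = 128  # Flink's default max parallelism

def key_group(key: str, max_parallelism: int = MAX_PARALLELISM) -> int:
    # Stand-in for Flink's MurmurHash key-group assignment: any stable
    # hash into [0, max_parallelism) gives the same ownership structure.
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")
    return h % max_parallelism

def operator_index(kg: int, parallelism: int,
                   max_parallelism: int = MAX_PARALLELISM) -> int:
    # Same range formula Flink uses: each subtask owns a contiguous
    # range of key groups.
    return kg * parallelism // max_parallelism

def restore_subtask_state(subtask: int, parallelism: int,
                          store: dict) -> dict:
    # One batched pass over the external store (compacted topic / table),
    # keeping only the keys this subtask owns - a single read, never one
    # query per key.
    return {word: count for word, count in store.items()
            if operator_index(key_group(word), parallelism) == subtask}

# "Sink" contents: the latest <word, word_count> per key, as retained
# by a compacted topic.
store = {"flink": 3, "kafka": 5, "state": 2, "stream": 7}
parallelism = 2
restored = [restore_subtask_state(i, parallelism, store)
            for i in range(parallelism)]
```

The part I am unsure about in real Flink is the first step: whether an operator can learn its owned key range early enough (e.g. in `initializeState`/`open`) to issue that one batched query before processing resumes - which is essentially the question above.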