Re: State Storage Questions

2020-09-08 Thread Rex Fenley
Thanks a bunch!
> For example, the Flink Kafka source operator's parallel instances maintain as operator state a mapping of partitions to offsets for the partitions that it is assigned to.
This I think clarifies things. This is literally state for the operator to do its job, not really row data. T

Re: State Storage Questions

2020-09-07 Thread Tzu-Li (Gordon) Tai
Hi! Operator state is bound to a single parallel operator instance; there is no partitioning happening here. It is typically used in Flink source and sink operators. For example, the Flink Kafka source operator's parallel instances maintain as operator state a mapping of partitions to offsets for
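A minimal sketch of the idea described in this reply, written as a plain-Python toy model rather than against the Flink API. The class `SourceInstance`, the helper `assign_partitions`, and the round-robin assignment are all hypothetical and chosen only for illustration. The point it tries to show is that each parallel instance holds its own partition-to-offset map, and nothing is partitioned by record key.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class SourceInstance:
    """One parallel instance of a hypothetical partitioned source (not a Flink class)."""
    index: int
    # "Operator state" in this toy model: partition id -> next offset to read.
    offsets: Dict[int, int] = field(default_factory=dict)

    def assign(self, partition: int) -> None:
        # Start tracking a partition at offset 0 if it is not tracked yet.
        self.offsets.setdefault(partition, 0)

    def read(self, partition: int, n_records: int) -> None:
        # Only partitions assigned to this instance may be advanced.
        if partition not in self.offsets:
            raise KeyError(f"partition {partition} is not assigned to instance {self.index}")
        self.offsets[partition] += n_records

    def snapshot(self) -> Dict[int, int]:
        # Copy of this instance's state, as it might be captured at a checkpoint.
        return dict(self.offsets)


def assign_partitions(partitions: Iterable[int], parallelism: int) -> List[SourceInstance]:
    """Spread partitions over instances round-robin (an assumption made for this sketch)."""
    instances = [SourceInstance(i) for i in range(parallelism)]
    for p in partitions:
        instances[p % parallelism].assign(p)
    return instances


# Usage: four partitions spread over two parallel instances.
instances = assign_partitions(range(4), parallelism=2)
instances[0].read(0, 10)  # instance 0 reads 10 records from partition 0
instances[1].read(1, 5)   # instance 1 reads 5 records from partition 1
snapshots = [inst.snapshot() for inst in instances]
```

Each snapshot contains only the partitions owned by that instance, which matches the reply's description of this state as bookkeeping for the operator rather than row data.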

Re: State Storage Questions

2020-09-04 Thread Rex Fenley
This is so helpful, thank you! So just to clarify (3): operator state has a partitioning scheme, but it's simply not by key; it's something else that's special under the hood? In which case, what data is stored in an Operator? I assumed it must be the input data for e.g. a join, so that it can rea

Re: State Storage Questions

2020-09-04 Thread Tzu-Li (Gordon) Tai
Hi,
On Fri, Sep 4, 2020 at 1:37 PM Rex Fenley wrote:
> Hello!
>
> I've been digging into State Storage documentation, but it's left me scratching my head with a few questions. Any help will be much appreciated.
>
> Qs:
> 1. Is there a way to use RocksDB state backend for Flink on AWS EMR?
> Po

State Storage Questions

2020-09-03 Thread Rex Fenley
Hello!
I've been digging into State Storage documentation, but it's left me scratching my head with a few questions. Any help will be much appreciated.
Qs:
1. Is there a way to use the RocksDB state backend for Flink on AWS EMR? Possibly with S3-backed savepoints for recovery (or maybe HDFS for savep