State Storage Questions

Rex Fenley Thu, 03 Sep 2020 22:37:23 -0700

Hello!

I've been digging into the state storage documentation, but it has left me
scratching my head with a few questions. Any help would be much appreciated.

Qs:
1. Is there a way to use the RocksDB state backend for Flink on AWS EMR,
possibly with S3-backed savepoints for recovery (or maybe HDFS for
savepoints)? The only AWS-related documentation I can find makes it look
like AWS must use the S3 file system state backend and not RocksDB at all.
https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/s3.html

2. Does the FS state backend not compact? I thought everything in Flink was
stored as key/value. If so, why would the last n values for a key need to
stick around, and how would they?
> An incremental checkpoint builds upon (typically multiple) previous
> checkpoints. Flink leverages RocksDB’s internal compaction mechanism in a
> way that is self-consolidating over time. As a result, the incremental
> checkpoint history in Flink does not grow indefinitely, and old checkpoints
> are eventually subsumed and pruned automatically.

3. In the docs, operator state is referred to as non-keyed state, yet
operators have IDs that they are keyed by. Why is it referred to as
non-keyed state?
https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/state/savepoints.html#assigning-operator-ids

4. For the Table API / SQL, are primary keys and join keys automatically
used as the keys for state under the hood?

5. Lastly, is there a way to estimate roughly how much disk space state
storage will take per operation?

Thanks again!

Rex Fenley | Software Engineer - Mobile and Backend

Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/>
| FOLLOW US <https://twitter.com/remindhq>
| LIKE US <https://www.facebook.com/remindhq>
