Hello! I've been digging into State Storage documentation, but it's left me scratching my head with a few questions. Any help will be much appreciated.
Questions:

1. Is there a way to use the RocksDB state backend for Flink on AWS EMR, possibly with S3-backed savepoints for recovery (or maybe HDFS for savepoints)? The only AWS-related documentation I can find makes it look like AWS must use the S3 File System state backend and not RocksDB at all.
https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/s3.html

2. Does the FS state backend not compact? I thought everything in Flink was stored as key/value. In that case, why would the last n values for a key need to stick around, and how would they?

> An incremental checkpoint builds upon (typically multiple) previous checkpoints. Flink leverages RocksDB’s internal compaction mechanism in a way that is self-consolidating over time. As a result, the incremental checkpoint history in Flink does not grow indefinitely, and old checkpoints are eventually subsumed and pruned automatically.

3. The docs refer to Operators as non-keyed state, yet Operators have IDs that they are keyed by. Why are they referred to as non-keyed state?
https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/state/savepoints.html#assigning-operator-ids

4. For the Table API / SQL, are primary keys and join keys automatically used as the keys for state under the hood?

5. Lastly, is there a way to estimate roughly how much disk space state storage will take per operation?

Thanks again!

--
Rex Fenley | Software Engineer - Mobile and Backend
Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> | FOLLOW US <https://twitter.com/remindhq> | LIKE US <https://www.facebook.com/remindhq>