Hi, Is anybody currently running flink streaming with north of a terabyte (TB) of managed state? If you are, can you share your experiences wrt hardware, tuning, recovery situations, etc?
I'm evaluating flink for a use case I estimate will take around 5TB of state in total, but looking at the actual implementation of the rocksDB state and current lack of incremental checkpointing or recovery, it doesn't seem feasible. I have successfully tested flink up to roughly 90GB of managed state in rocksDB, but that's taking 5 minutes to checkpoint or recover (on a pretty beefy YARN cluster). For most cases, my state updates are idempotent and can be moved to something external. However, it'd be nice to know of any current of future plans for running flink at the terabyte scale. --Steven -- *CONFIDENTIALITY NOTICE: This email message, and any documents, files or previous e-mail messages attached to it is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.*