Hi,
I have a stateful Flink job handling 500k QPS. The job basically counts the number of messages per composite key in a 10-minute tumbling window. With the memory state backend, the job runs without lag but periodically fails with OOM. When I switch to the RocksDB state backend, the job builds up a high Kafka lag, even after memory tuning. The QPS is also growing very fast. I am wondering whether there is good guidance on performance tuning of the RocksDB state backend for this kind of high-QPS job.

Best Regards
Peter Huang