Hi all,

We're interested in analyzing how the size of our savepoints and state affects the time it takes to restore from a savepoint. We're running Flink 1.12 on Kubernetes, with RocksDB as the state backend.
What is the best way to measure the size of a Flink application's state? Is state.backend.rocksdb.metrics.total-sst-files-size <https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/config.html#state-backend-rocksdb-metrics-total-sst-files-size> the right thing to look at?

After restoring from a savepoint, we looked at state.backend.rocksdb.metrics.total-sst-files-size for all of our operators and noticed that the sum of the SST file sizes is far smaller than the total size of the savepoint (7GB vs 10TB). Where does that discrepancy come from?

Do you have any general advice on correlating savepoint size with restore times?

Thanks in advance!
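One way to put a number on the savepoint side of the comparison is to sum the file sizes under the savepoint directory and compare that against the summed SST sizes. Below is a minimal sketch in Python, assuming the savepoint is reachable as a local or mounted path (savepoints on S3 or HDFS would need that storage's own client instead). The function names and the per-operator dictionary of SST sizes are hypothetical; the SST values would have to be collected from the total-sst-files-size metric through whatever metrics pipeline is already in place.

```python
import os
from typing import Dict


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under `root`, recursively.

    Assumes `root` is a local or mounted savepoint directory.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so no file is counted twice.
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def human_readable(num_bytes: int) -> str:
    """Format a byte count using binary units (B, KiB, MiB, GiB, TiB)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        # TiB is the largest unit, so always stop there.
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"  # unreachable; keeps the return type explicit


def sst_to_savepoint_ratio(savepoint_bytes: int,
                           sst_bytes_per_operator: Dict[str, int]) -> float:
    """Ratio of summed per-operator SST sizes to the total savepoint size.

    `sst_bytes_per_operator` is a hypothetical mapping of operator name to
    the value reported by the total-sst-files-size metric.
    """
    if savepoint_bytes <= 0:
        raise ValueError("savepoint_bytes must be positive")
    return sum(sst_bytes_per_operator.values()) / savepoint_bytes
```

For example, `human_readable(directory_size_bytes("/mnt/savepoints/savepoint-xyz"))` (a hypothetical path) gives a readable total, and tracking `sst_to_savepoint_ratio` across several savepoints would show whether the gap is constant or grows with state size.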