Hi Yun,

> From your description, I think you actually concern more about the overall
> performance instead of the high disk IOPs. Maybe you should first ensure
> whether the job performance degradation is related to RocksDB's performance.
You are right that my main concern is the overall performance, not that it's reading a lot. I connected the two because read IOPS seem to jump to a very high number about 30 minutes into the benchmark, which correlates with the timing of the overall performance degradation.

> Then I would share some experience about tuning RocksDB performance. Since
> you did not cache index and filter in block cache, it's no worry about the
> competition between data blocks and index&filter blocks[1]. And to improve
> the read performance, you should increase your block cache size to 256MB or
> even 512MB. What's more, writer buffer in rocksDB also acts as a role for
> reading, from our experience, we use 4 max write buffers and 32MB each, e.g.
> setMaxWriteBufferNumber(4) and setWriteBufferSize(32*1024*1024)

This is very helpful. I did try increasing the block cache to 256MB and 512MB, and it quickly used up the 30GB of memory on the EC2 instances. I find it a little hard to estimate the actual memory usage of RocksDB, since there can be multiple RocksDB instances on the same TM depending on the number of slots and the job. In this case, each EC2 instance has 16 cores and 30GB of memory, each TM has 8 slots, and the job parallelism equals the total number of slots across all TMs, so all slots are in use. Heap is set to 5GB. With a 64MB block cache, memory usage seems to hover around 20GB to 25GB, but it creeps up very slowly over time.

Do you have a good strategy for estimating memory usage, or a recommendation for how much memory each instance should have?

Thanks,
Ning
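
P.S. In case it helps to be concrete, here is a minimal sketch of how I understand these settings would be applied through a custom OptionsFactory on the RocksDBStateBackend (the class name is just illustrative; the write buffer values are the ones you suggested, and the block cache is my current 64MB):

import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

// Illustrative name; 4 x 32MB write buffers as suggested, 64MB block cache as currently configured.
public class TunedRocksDBOptions implements OptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        // no DB-level changes in this sketch
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnFamilyOptions(ColumnFamilyOptions currentOptions) {
        return currentOptions
            .setMaxWriteBufferNumber(4)                        // 4 write buffers per column family
            .setWriteBufferSize(32 * 1024 * 1024)              // 32MB each
            .setTableFormatConfig(
                new BlockBasedTableConfig()
                    .setBlockCacheSize(64 * 1024 * 1024));     // 64MB block cache
    }
}

// Registered on the state backend, e.g.:
//   RocksDBStateBackend backend = new RocksDBStateBackend(checkpointPath);
//   backend.setOptions(new TunedRocksDBOptions());
//   env.setStateBackend(backend);

By my rough arithmetic that would be about 64MB + 4 * 32MB = 192MB per state/column family before index and filter blocks, multiplied by the number of stateful operators and the 8 slots per TM, which is part of why I am struggling to get a tight estimate.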