We have a cluster experiencing very high disk read I/O, in the 20-40 MB/sec range, on m5.2xlarge nodes with gp2 EBS volumes. This is verified via both the VM metrics and iotop.
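For reference, the iotop side of that check was essentially batch-mode sampling of per-thread I/O (the exact flags and interval here are approximate, not a verbatim transcript):

    # batch mode, accumulate per-thread totals, only show threads actually doing I/O
    iotop -o -b -a -d 5 -n 12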
When we switch to m5.4xlarge it drops to around 60 KB/sec, with no difference in network send/recv or in read/write request counts. The graph of read KB/sec mirrors cpu.iowait.

I have tried to rule out the usual suspects. Compaction would produce a comparable volume of writes alongside the reads as new sstables are written, and flushing would be almost all writes, so neither explains heavy reads with no matching writes. Swappiness is zero. I ran inotifywait to compare read volume on the data and log dirs; they are roughly equivalent. File caching could be a candidate, so I used tobert's pcstat (https://github.com/tobert/pcstat) to see which files are in the page cache, and it listed every file at 100% cached. I would expect an overloaded page cache to show different files rotating in and out, with only partial residency on the data files (data density for the node is about 30 GB). iotop indicates all the read traffic comes from Cassandra threads.

Anyone have similar experiences?
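In case it helps anyone reproduce the other two checks, the inotifywait and pcstat invocations look roughly like this (flags and paths here are illustrative defaults, adjust for your layout):

    # log every read access under the data and log directories
    inotifywait -m -r -e access --timefmt '%T' --format '%T %w%f %e' \
        /var/lib/cassandra/data /var/log/cassandra

    # report page-cache residency for every sstable component
    pcstat /var/lib/cassandra/data/*/*/*.db

pcstat prints a per-file table with the number of pages cached and the percentage, which is what showed everything at 100% here.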