> I have a few sstables with around 500 million keys, and memory usage has
> grown a lot, I suppose because of the indexes. These sstables are
> composed of skinny rows, but a lot of them. Would tuning index_interval
> make the memory usage go down? And what would the performance hit be?
Assuming no row caching, and assuming you're talking about heap usage and not the virtual size of the process in top, the two things that grow primarily with row count are (1) the bloom filters for sstables and (2) the sampled index keys.

Bloom filters are sized to achieve a sufficiently small false positive rate. That target rate could be raised to allow smaller bloom filters, but it is not exposed as a configuration option and would require code changes.

For key sampling, the primary performance penalty should be CPU and maybe some disk. On average, when looking up a key in an sstable index file, you'll read and deserialize index_interval/2 entries before finding the one you're after. Increasing the sampling interval thus increases the amount of deserialization taking place, and also makes the average scan span additional pages on disk. The impact on disk is difficult to judge and likely depends a lot on I/O scheduling and other details.
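To make the bloom filter part concrete, here is the standard sizing math as a back-of-the-envelope sketch in Java. This is not Cassandra's actual implementation, and the ~1% target rate is just an illustrative assumption; the point is that for a fixed false positive rate the filter grows linearly with key count:

    // Standard bloom filter sizing math (illustrative only).
    public class BloomMath {
        // Bits required for n keys at target false positive rate p:
        // m = -n * ln(p) / (ln 2)^2
        static long bitsRequired(long n, double p) {
            return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
        }

        public static void main(String[] args) {
            long n = 500_000_000L;   // ~500M keys, as in your case
            double p = 0.01;         // hypothetical ~1% target rate
            long bits = bitsRequired(n, p);
            System.out.printf("%d keys @ %.2f fp rate: ~%d MiB of filter%n",
                              n, p, bits / 8 / 1024 / 1024);
        }
    }

At 500 million keys and ~9.6 bits per key, that already works out to several hundred MiB of heap per such table, which is consistent with the growth you're seeing.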
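And here is a toy model of the sampled index lookup, to show where the interval/2 figure comes from. The in-memory array stands in for the on-disk index file, and none of these names are Cassandra's actual API; it's just the shape of the algorithm:

    // Toy model of sampled-index lookup (not Cassandra's real code).
    import java.util.Arrays;

    public class SampledIndexDemo {
        public static void main(String[] args) {
            int indexInterval = 128;                  // the index_interval knob
            String[] fullIndex = new String[100_000]; // stands in for the index file
            for (int i = 0; i < fullIndex.length; i++)
                fullIndex[i] = String.format("key%08d", i);

            // In-memory sample: every index_interval-th entry.
            int nSamples = (fullIndex.length + indexInterval - 1) / indexInterval;
            String[] sample = new String[nSamples];
            int[] samplePos = new int[nSamples];
            for (int i = 0; i < nSamples; i++) {
                sample[i] = fullIndex[i * indexInterval];
                samplePos[i] = i * indexInterval;
            }

            // Lookup: binary search the sample, then scan ("deserialize")
            // forward through the full index from the nearest sample.
            String target = "key00054321";
            int s = Arrays.binarySearch(sample, target);
            if (s < 0) s = -s - 2;                    // sample at or before target
            int reads = 0;
            for (int pos = samplePos[s]; pos < fullIndex.length; pos++) {
                reads++;                              // one entry read + deserialized
                if (fullIndex[pos].equals(target)) break;
            }
            // On average the scan touches indexInterval/2 entries: doubling
            // the interval halves the sample's heap use but doubles the scan.
            System.out.println("entries read: " + reads);
        }
    }

So the trade-off is direct: raising index_interval shrinks the in-memory sample proportionally, at the cost of proportionally more deserialization per index lookup.

--
/ Peter Schuller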