I was looking closer at sliced_buffer_size_in_kb and column_index_size_in_kb and reached the conclusion that, for the purpose of I/O, these are irrelevant when using the mmap'ed I/O mode (which makes sense, since a "buffer size" has no meaning when all you're doing is touching memory). The only remaining effect is that column_index_size_in_kb still affects the size at which indexing triggers, which is as advertised.
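To illustrate what I mean by a buffer size having nowhere to go, here is a minimal Python sketch. It is not Cassandra code: the helper names (read_buffered, read_mmapped, make_test_file) are made up for the example, and it only assumes a platform where Python's mmap module can map a regular file. The buffered path takes an explicit buffer size; the mmap'ed path has no such knob, and how much is actually fetched from disk on a page fault is up to the kernel.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def make_test_file(num_pages=8):
    """Create a temp file where page i is filled with byte value i (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(num_pages):
            f.write(bytes([i]) * PAGE)
    return path


def read_buffered(path, offset, length, buffer_size):
    """Buffered read: buffer_size sets the size of the file object's user-space buffer."""
    with open(path, "rb", buffering=buffer_size) as f:
        f.seek(offset)
        return f.read(length)


def read_mmapped(path, offset, length):
    """mmap'ed read: no buffer size parameter; slicing just touches the mapped pages."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + length]


if __name__ == "__main__":
    path = make_test_file()
    try:
        # Read 8 bytes straddling the boundary between page 0 and page 1.
        a = read_buffered(path, PAGE - 4, 8, buffer_size=64 * 1024)
        b = read_mmapped(path, PAGE - 4, 8)
        print(a == b)
    finally:
        os.remove(path)
```

Both functions return the same bytes; the difference is only in who decides how much is read at a time. In the mmap'ed case that decision is made by the kernel's paging and read-ahead logic, not by the caller.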
Firstly, can anyone confirm or deny my interpretation?

Secondly, has anyone tested the effect of mmap()'ed I/O on the efficiency (in terms of disk seeks) of reads on large data sets? The CPU benefits of mmap() may be negated when disk-bound if the kernel's read-ahead logic interacts sub-optimally with Cassandra's use case. Potentially, even reading more than a single page could imply multiple seeks (assuming a loaded system with other I/O in the queue) if there is no read-ahead until the first successive access. I have not checked what actually happens, nor have I benchmarked for comparison, but I'd be interested in hearing whether people have already addressed this in the past.

--
/ Peter Schuller