I took a closer look at sliced_buffer_size_in_kb and
column_index_size_in_kb and concluded that, for the purpose of I/O,
both are irrelevant in mmap:ed I/O mode (which makes sense, since
there is no way to apply a "buffer size" when all you're doing is
touching memory). The only remaining effect is that
column_index_size_in_kb still determines the size at which indexing
triggers, which is as advertised.

Firstly, can anyone confirm/deny my interpretation?

Secondly, has anyone tested the effects of mmap():ed I/O on the
efficiency (in terms of disk seeks) of reads on large data
sets? The CPU benefits of mmap() may be negated when disk-bound if the
read-ahead logic of the kernel interacts sub-optimally with
Cassandra's use-case. Potentially, even a read spanning more than a
single page could cost multiple seeks (assuming a loaded system with
other I/O in the queue) if the kernel does not start reading ahead
until the first sequential access.
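
For anyone who wants to experiment with this outside Cassandra, here
is a minimal Python sketch (not Cassandra code; the function names
read_range_mmap and make_test_file are made up for illustration). It
reads a byte range through a read-only mapping and optionally passes
an madvise() hint such as MADV_WILLNEED or MADV_RANDOM for the pages
that are about to be touched. It assumes Python 3.8+ on a platform
that exposes madvise; the hint is skipped elsewhere. Whether a hint
actually changes the number of seeks is exactly the open question, so
treat this as a starting point for measurement, not an answer.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def read_range_mmap(path, offset, length, advice=None):
    """Read `length` bytes at `offset` through a read-only mapping.

    `advice` is the name of an madvise constant (for example
    "MADV_WILLNEED" or "MADV_RANDOM"). It is only a hint to the kernel
    and is skipped if this platform/Python does not provide it.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            if (advice is not None and hasattr(m, "madvise")
                    and hasattr(mmap, advice)):
                # madvise wants a page-aligned start address, so round
                # the start of the range down to a page boundary.
                start = (offset // PAGE) * PAGE
                end = min(len(m), offset + length)
                m.madvise(getattr(mmap, advice), start, end - start)
            # Slicing copies the bytes out, faulting in each page touched.
            return m[offset:offset + length]


def make_test_file(size):
    """Create a throwaway file of `size` random bytes; return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size))
    return path


if __name__ == "__main__":
    path = make_test_file(64 * PAGE)
    try:
        plain = read_range_mmap(path, 3 * PAGE + 100, 8 * PAGE)
        hinted = read_range_mmap(path, 3 * PAGE + 100, 8 * PAGE,
                                 "MADV_WILLNEED")
        # A hint may change the I/O pattern, never the data returned.
        assert plain == hinted
    finally:
        os.remove(path)
```

To see any difference in I/O behaviour, one would need a file much
larger than RAM (or a dropped page cache) and something like iostat
running alongside; the small file above only shows the mechanics.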

I have not checked what actually does happen, nor have I benchmarked
for comparison. But I'd be interested in hearing if people have
already addressed this in the past.

-- 
/ Peter Schuller
