> I tried setting the IO mode to standard, but it seemed to be a little slower
> and couldn't get the machine to come back online with adequate read
> performance, so I set it back. I'll have to write a solid cache warming
> script if I'm going to try that again.

What cache are you talking about? Did you turn on row caching?

When we turned on row caching, repeat hits to the same rows were fast, of course, but we didn't (given our data access patterns) see significant differences compared to mmap-ing the data. And once we hit the limit of our row cache, out-of-cache hits were pretty costly (I don't have hard numbers, but I recall it being worse than having mmap page in/out).

Is your client making random reads of more rows than will fit in RAM on your box? We found that in that scenario, after Cassandra has used up all of the free memory on the box, using mmap was slightly worse than using standard data access. We happened to be lucky that our real-world data access is limited to a small subset of rows in any given time period, so mmap works great for us.

I guess the best thing to do is to try to figure out how to make a Cassandra node only need to service requests for data that can fit into memory in a given time period. More nodes, a lower replication factor, more memory, I guess...

I'm definitely waiting to hear how things change with 0.6.2.

Kyusik Chung
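
P.S. To make the "fit into memory" point concrete, here is a rough back-of-the-envelope sketch of how nodes, replication factor, and RAM trade off. It is only an approximation: it assumes the hot rows are spread evenly across the ring, and the function names and all of the numbers are made up for illustration (nothing here is a Cassandra API or a measurement from our cluster).

```python
GIB = 1024 ** 3


def per_node_hot_bytes(hot_bytes: int, replication_factor: int, nodes: int) -> float:
    """Approximate hot data each node must serve.

    Assumes an even distribution: every row lives on `replication_factor`
    nodes, so each node holds roughly RF / N of the hot set. A node can
    never hold more than one full copy, hence the min().
    """
    if nodes <= 0 or replication_factor <= 0:
        raise ValueError("nodes and replication_factor must be positive")
    return hot_bytes * min(replication_factor, nodes) / nodes


def fits_in_memory(hot_bytes: int, replication_factor: int, nodes: int,
                   cache_ram_bytes: int) -> bool:
    """True if a node's share of the hot set fits in the RAM left for caching."""
    return per_node_hot_bytes(hot_bytes, replication_factor, nodes) <= cache_ram_bytes


if __name__ == "__main__":
    hot = 60 * GIB   # hypothetical hot set for the whole cluster
    ram = 24 * GIB   # hypothetical RAM left for page cache on each node

    # The three knobs from the paragraph above: baseline, more nodes, lower RF.
    for rf, n in [(3, 6), (3, 12), (2, 6)]:
        share = per_node_hot_bytes(hot, rf, n) / GIB
        print(f"RF={rf} nodes={n}: {share:.0f} GiB per node, "
              f"fits={fits_in_memory(hot, rf, n, ram)}")
```

With those made-up numbers, the baseline (RF=3, 6 nodes) needs about 30 GiB of hot data per node and does not fit in 24 GiB, while either doubling the node count or dropping to RF=2 brings each node's share under the limit.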