> My question is: what are the points in the system that you guys test? What
> are the metrics for the test-points? Any flags that you guys use to see if
> more capacity / nodes are needed?
>
> Thanks in advance. Trying to figure this out and figured I'd ask the
> community with more experience than I have.

I would say the most important thing is to know what your access pattern will be like (how many reads and writes, big or small values, total number of values, total data size and replication factor relative to memory size, etc.) and then combine that with actual metrics such as CPU usage and request rates.

It is probably difficult to come up with the one true way to monitor and test (this is often the case, and particularly so with storage systems).

So: look at things like CPU usage, I/O utilization (iostat -x -k 1 or equivalent graphs), cfstats output, etc. But how to interpret them will depend heavily on what you're doing.

How's that for a non-answer? :)

--
/ Peter Schuller
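
As a rough illustration of the "data size relative to memory size" point in the reply above, here is a minimal Python sketch. It is only back-of-envelope arithmetic, not a sizing method from the original post: the Workload class, its field names, and the example numbers are all hypothetical, and it assumes data is spread evenly across nodes and ignores compaction and other on-disk overhead.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a cluster; all fields are assumptions."""
    total_data_gb: float     # unique data size, before replication
    replication_factor: int  # number of copies kept of each value
    node_count: int          # nodes in the cluster
    node_memory_gb: float    # RAM per node


def per_node_data_gb(w: Workload) -> float:
    # Replicated data divided evenly over the nodes (assumes even distribution).
    return w.total_data_gb * w.replication_factor / w.node_count


def data_to_memory_ratio(w: Workload) -> float:
    # How many times larger the per-node data set is than per-node RAM.
    return per_node_data_gb(w) / w.node_memory_gb


if __name__ == "__main__":
    # Made-up example: 600 GB of data, 3 replicas, 6 nodes with 16 GB RAM each.
    w = Workload(total_data_gb=600, replication_factor=3,
                 node_count=6, node_memory_gb=16)
    print(per_node_data_gb(w))      # 300.0
    print(data_to_memory_ratio(w))  # 18.75
```

A ratio well above 1 suggests that most reads cannot be served from memory, so I/O utilization would be the metric to watch first. What ratio is acceptable still depends on the access pattern, as noted above.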