Hi all, I am having trouble reconciling various metrics regarding reads so I'm hoping someone here can help me understand what's going on.
I am running tests on a single-node cluster with 16GB of RAM. I'm testing against the following column family:

Column Family: PUBLIC_MONTHLY
SSTable count: 1
Space used (live): 28468417160
Space used (total): 28468417160
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 2669019991
Read Latency: 0.846 ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 20000
Key cache size: 20000
Key cache hit rate: 0.33393368358762754
Row cache capacity: 50000
Row cache size: 50000
Row cache hit rate: 0.15195090894076155
Compacted row minimum size: 216
Compacted row maximum size: 88148
Compacted row mean size: 483

The keys represent grid cells (65 million of them), the columns store monthly increments (total & sum, to produce averages), and the super columns tag the data source. The mean row length is 483 bytes. The key cache and row cache are enabled but kept very small just to test going through the disk, since I expect very random reads in production.

I've done everything I can to optimize reads:
- Cassandra is set up to use only 4GB because my dataset is 28GB
- I've compacted the data into a single SSTable
- I'm hitting Cassandra with only 1 read request at a time and no writes; the request is a multislice across hundreds or thousands of keys

The problem: vmstat shows that Cassandra is doing about 200MB/s of IO, and since there are no writes on the system, I know it can only be reading (RAID-0 SSD drives). I know that Cassandra is reading about 1/3 of the super columns, so to be safe let's assume it is deserializing 1/2 of each row, and for simplicity that the row size is 512 bytes. So it looks to me as if Cassandra is deserializing 200MB/s / (512 bytes / 2) = 200MB/s / 0.25KB ≈ 800K rows per second. That's 800 keys per millisecond.
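The back-of-the-envelope estimate above can be checked in a few lines (using the same assumptions as in the text: ~200MB/s of read IO, 512-byte rows, half of each row deserialized):

```python
# Estimate how many rows/s Cassandra would have to deserialize to
# account for the observed disk read throughput.
io_bytes_per_sec = 200 * 1024 * 1024   # ~200 MB/s of reads seen in vmstat
row_size = 512                          # simplified mean row size in bytes
fraction_read = 0.5                     # assume half of each row is deserialized

rows_per_sec = io_bytes_per_sec / (row_size * fraction_read)
rows_per_ms = rows_per_sec / 1000
print(f"{rows_per_sec:.0f} rows/s = {rows_per_ms:.0f} rows/ms")
# -> 819200 rows/s = 819 rows/ms, i.e. roughly 800 keys per millisecond
```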
And yet, my app is being throttled by Cassandra during its MultigetSuperSliceCounterQuery: measuring the time spent in Hector shows that I'm getting at most 20-30 rows per ms, and sometimes even less.

My questions:
1) Any idea where the discrepancy can come from? I'd like to believe there is some magic setting that will 10x my read performance...
2) How do you recommend allocating memory? Should I give the OS cache as much as possible, or should I max out Cassandra's caches?
3) Does anyone have numbers on the performance of range queries compared to multiget queries? I can probably take SimpleGeo's idea of a Z-order code to map the 2D grid to 1D ranges, but I wonder if that will give me the 10x performance I'm looking for.

PS: nodetool indicates that the read latency is 0.846ms, so that's 1.18 keys/ms?! Let's just leave this aside; the process has been running for 12 hours and maybe the numbers are very different from what we're seeing now.

Thanks,
PG

vmstat (SSD not maxed out here, but it does max out at other times):

 r  b   swpd   free  buff    cache  si so     bi  bo   in   cs us sy id wa
 0  0  78184  89252 10764 11254784   0  0 186448  18 8002 2352  7  4 50 39
 0  9  78184  88880 10764 11249900   0  0 176602  78 8046 2957  7  3 64 26
 0 16  78184  88260 10764 11246824   0  0 195726   0 9090 2718  8  4 52 36
 0 14  78184  89376 10764 11242496   0  0 227858   0 9533 2444  7  4 45 44
 0  0  78184  88260 10764 11254336   0  0 203374   1 9144 2567  7  4 59 30
 0  4  78184  90368 10764 11251856   0  0 235394   0 9732 1827  6  4 52 38
 0 23  78184  92352 10756 11238000   0  0 203140  98 9007 2835  7  4 59 29
 0  0  78184  91608 10756 11250952   0  0 176348   0 8354 3535  7  3 64 26
 1  0  78184  92352 10756 11250228   0  0 163952   0 7475 3243  9  3 57 31

iostat -dmx 2 (filtered):

Device:  rrqm/s  wrqm/s      r/s   w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb       80.00    0.00  4061.50  0.00   94.34   0.00     47.57     80.18  19.49    19.49     0.00   0.16  63.00
sda       78.50    0.00  3934.50  0.00   94.72   0.00     49.31     76.87  19.27    19.27     0.00   0.16  62.80
dm-0       0.00    0.00  8310.50  0.00  192.47   0.00     47.43    169.89  20.15    20.15     0.00   0.08  63.80

Device:  rrqm/s  wrqm/s      r/s   w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb      101.50    0.00  5141.00  0.00  121.16   0.00     48.27    103.29  20.03    20.03     0.00   0.16  80.60
sda      100.00    0.00  5190.50  0.00  121.59   0.00     47.97    100.74  19.24    19.24     0.00   0.15  79.80
dm-0       0.00    0.00 10552.50  0.00  242.85   0.00     47.13    219.09  20.57    20.57     0.00   0.08  81.80

Device:  rrqm/s  wrqm/s      r/s   w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb       67.50    0.00  3692.50  0.00   86.23   0.00     47.83     64.89  17.92    17.92     0.00   0.15  57.00
sda       90.00    0.00  3680.00  0.00   87.22   0.00     48.54     70.86  19.77    19.77     0.00   0.16  57.40
dm-0       0.00    0.00  7364.00  0.00  170.29   0.00     47.36    145.79  20.39    20.39     0.00   0.08  58.20

Timing examples from my app (keys/ms = numRollupKeys / getdata_ms):

numRollupKeys=13312, getdata_ms=617 => 21.57 keys/ms
numRollupKeys=6144,  getdata_ms=224 => 27.43
numRollupKeys=14080, getdata_ms=793 => 17.76
numRollupKeys=8448,  getdata_ms=157 => 53.81
numRollupKeys=6400,  getdata_ms=601 => 10.65
numRollupKeys=7680,  getdata_ms=550 => 13.96
numRollupKeys=12800, getdata_ms=720 => 17.78
numRollupKeys=6912,  getdata_ms=275 => 25.13
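On question 3, here is a minimal sketch of the Z-order (Morton) encoding idea: interleave the bits of the two grid coordinates so that nearby 2D cells tend to land on nearby 1D keys, which can then be fetched with range scans instead of large multigets. The bit width and functions here are my own illustration, not SimpleGeo's actual scheme:

```python
def morton_encode(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return code

def morton_decode(code: int, bits: int = 16) -> tuple:
    """Inverse of morton_encode: recover (x, y) from a Z-order code."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return (x, y)

# x=3 (011) and y=5 (101) interleave to 100111 = 39, so a small 2D
# block of cells maps to a handful of contiguous 1D key ranges.
print(morton_encode(3, 5))   # -> 39
print(morton_decode(39))     # -> (3, 5)
```

Whether this actually buys a 10x improvement depends on how well a Z-order range scan's sequential reads beat the random reads of a multiget on your SSDs, so it's worth benchmarking both against the same grid blocks.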