Hi all,
I am having trouble reconciling various metrics regarding reads so I'm
hoping someone here can help me understand what's going on.

I am running tests on a single-node cluster with 16GB of RAM. I'm testing
against the following column family:
               Column Family: PUBLIC_MONTHLY
                SSTable count: 1
                Space used (live): 28468417160
                Space used (total): 28468417160
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 0
                Read Count: 2669019991
                Read Latency: 0.846 ms.
                Write Count: 0
                Write Latency: NaN ms.
                Pending Tasks: 0
                Key cache capacity: 20000
                Key cache size: 20000
                Key cache hit rate: 0.33393368358762754
                Row cache capacity: 50000
                Row cache size: 50000
                Row cache hit rate: 0.15195090894076155
                Compacted row minimum size: 216
                Compacted row maximum size: 88148
                Compacted row mean size: 483
The keys represent grid cells (65 million of them); the columns store
monthly increments (total & sum, to produce averages); super columns tag the
data source. The mean row length is 483 bytes.

The key cache & row cache are enabled but kept very small, just to test going
through the disk, since I expect very random reads in production.

I've done everything I can to optimize reads:
 - Cassandra is set up to use only 4GB because my dataset is 28GB
 - I've compacted the data into a single SSTable
 - I'm hitting Cassandra with only 1 read request at a time & no writes. The
request is a multiget slice across hundreds or thousands of keys

The problem:
vmstat shows that Cassandra is doing about 200MB/s of IO, and since there are
no writes on the system, I know it can only be reading (RAID-0 SSD drives).
I know that Cassandra is reading about 1/3 of the super columns. To be safe,
let's assume Cassandra is deserializing 1/2 of each row, and for simplicity
that the row size is 512 bytes.

So it looks to me as if Cassandra is deserializing
200MB/s / (512 bytes / 2) = 200MB/s / 256 bytes ≈ 800K rows per second.
That's roughly 800 keys per millisecond.
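The back-of-the-envelope calculation above can be written out explicitly (this just restates the assumptions from the post: the 200MB/s vmstat figure, a 512-byte row, and half of each row deserialized):

```python
# Implied deserialization rate from the raw disk throughput.
io_rate = 200 * 1024 * 1024       # bytes read per second, from vmstat
row_size = 512                    # assumed mean row size in bytes
useful_fraction = 0.5             # assume half of each row is deserialized

rows_per_second = io_rate / (row_size * useful_fraction)
rows_per_ms = rows_per_second / 1000
print(f"{rows_per_second:,.0f} rows/s = {rows_per_ms:.0f} rows/ms")
# -> 819,200 rows/s = 819 rows/ms
```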

And yet, my app is being throttled by Cassandra during its
MultigetSuperSliceCounterQuery: measuring the time spent in Hector shows that
I'm getting at most 20-30 rows per ms, and sometimes far less (see the timing
examples at the end).

My questions:
1) Any idea where the discrepancy can come from?
I'd like to believe there is some magic setting that will 10x my read
performance...

2) How do you recommend allocating memory ? Should I give the OS cache as
much as possible or should I max out Cassandra's cache ?

3) Does anyone have numbers on the performance of range queries compared
to multiget queries? I can probably take SimpleGeo's idea of a Z-order code
to map the 2D grid to 1D ranges, but I wonder if I will get the 10x
performance I'm looking for.
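For reference, a Z-order (Morton) code just interleaves the bits of the two grid coordinates, so that cells close in 2D tend to land on nearby 1D keys. A minimal sketch (the function name and 16-bit width are my own assumptions, not from any library):

```python
def morton_encode(x, y, bits=16):
    """Interleave the bits of x and y into a single Z-order code.

    x occupies the even bit positions and y the odd ones, so cells
    that are close in 2D tend to be close in the 1D key space.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

# Neighbouring cells map to nearby codes:
print(morton_encode(2, 3))  # -> 14
```

Rows keyed this way would let a single range query cover a 2D neighbourhood instead of thousands of individual multiget keys.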

PS: Nodetool indicates that the read latency is 0.846ms, so that's
1/0.846 ≈ 1.18 keys/ms?! Let's just leave this aside; the process has been
running for 12 hours and maybe the numbers are very different from what we're
seeing here.

Thanks
PG

vmstat (SSD not maxed out in this sample, but it is at other times)
 r  b   swpd   free   buff    cache   si   so     bi    bo   in   cs us sy id wa
 0  0  78184  89252  10764 11254784    0    0 186448    18 8002 2352  7  4 50 39
 0  9  78184  88880  10764 11249900    0    0 176602    78 8046 2957  7  3 64 26
 0 16  78184  88260  10764 11246824    0    0 195726     0 9090 2718  8  4 52 36
 0 14  78184  89376  10764 11242496    0    0 227858     0 9533 2444  7  4 45 44
 0  0  78184  88260  10764 11254336    0    0 203374     1 9144 2567  7  4 59 30
 0  4  78184  90368  10764 11251856    0    0 235394     0 9732 1827  6  4 52 38
 0 23  78184  92352  10756 11238000    0    0 203140    98 9007 2835  7  4 59 29
 0  0  78184  91608  10756 11250952    0    0 176348     0 8354 3535  7  3 64 26
 1  0  78184  92352  10756 11250228    0    0 163952     0 7475 3243  9  3 57 31

iostat -dmx 2 (filtered)
Device:         rrqm/s   wrqm/s      r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb              80.00     0.00  4061.50    0.00    94.34     0.00    47.57    80.18   19.49   19.49    0.00   0.16  63.00
sda              78.50     0.00  3934.50    0.00    94.72     0.00    49.31    76.87   19.27   19.27    0.00   0.16  62.80
dm-0              0.00     0.00  8310.50    0.00   192.47     0.00    47.43   169.89   20.15   20.15    0.00   0.08  63.80

Device:         rrqm/s   wrqm/s      r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb             101.50     0.00  5141.00    0.00   121.16     0.00    48.27   103.29   20.03   20.03    0.00   0.16  80.60
sda             100.00     0.00  5190.50    0.00   121.59     0.00    47.97   100.74   19.24   19.24    0.00   0.15  79.80
dm-0              0.00     0.00 10552.50    0.00   242.85     0.00    47.13   219.09   20.57   20.57    0.00   0.08  81.80

Device:         rrqm/s   wrqm/s      r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb              67.50     0.00  3692.50    0.00    86.23     0.00    47.83    64.89   17.92   17.92    0.00   0.15  57.00
sda              90.00     0.00  3680.00    0.00    87.22     0.00    48.54    70.86   19.77   19.77    0.00   0.16  57.40
dm-0              0.00     0.00  7364.00    0.00   170.29     0.00    47.36   145.79   20.39   20.39    0.00   0.08  58.20

Timing examples from my app
numRollupKeys=13312,getdata_ms=617 => 21.57 keys/ms
numRollupKeys=6144,getdata_ms=224  => 27.42
numRollupKeys=14080,getdata_ms=793 => 17.75
numRollupKeys=8448,getdata_ms=157  => 53.81
numRollupKeys=6400,getdata_ms=601  => 10.64
numRollupKeys=7680,getdata_ms=550  => 13.96
numRollupKeys=12800,getdata_ms=720 => 17.77
numRollupKeys=6912,getdata_ms=275  => 25.14
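The per-call rates above can be summarized quickly; this little script just recomputes keys/ms from the (numRollupKeys, getdata_ms) pairs listed above:

```python
# (numRollupKeys, getdata_ms) pairs copied from the timing examples
samples = [
    (13312, 617), (6144, 224), (14080, 793), (8448, 157),
    (6400, 601), (7680, 550), (12800, 720), (6912, 275),
]
rates = [keys / ms for keys, ms in samples]
print(f"min {min(rates):.1f}, max {max(rates):.1f}, "
      f"mean {sum(rates) / len(rates):.1f} keys/ms")
# -> min 10.6, max 53.8, mean 23.5 keys/ms
```

Even the best sample (~54 keys/ms) is more than an order of magnitude below the ~800 keys/ms implied by the raw disk throughput.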
