Re: Cassandra benchmarking on Rackspace Cloud

Peter Schuller Mon, 19 Jul 2010 13:15:23 -0700

The following is completely irrelevant if you are indeed using the
default storage-conf.xml as you said. However, since I wrote it and it
remains relevant for anyone testing with the order-preserving
partitioner, I might as well post it rather than discard it...


Begin probably irrelevant post:

Another stab in the dark:

You do specifically mention that you distributed tokens evenly across
the cluster and independently for each cluster size. However, were the
tokens distributed evenly *within the range used by the stress test*?

This is the random key generator in stress.py:

def key_generator_random():
    fmt = '%0' + str(len(str(total_keys))) + 'd'
    return fmt % randint(0, total_keys - 1)

Unless I am misreading or mis-testing, this generates keys that are
equal-length strings of ASCII decimal digits, with numerical values
distributed in the range [0, total_keys - 1]. However, keys whose
prefixes fall in the range '0'-'9' make up a very small subset of the
token space into which cluster nodes are placed, for both byte strings
and UTF-8 strings.
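
To make that concrete, here is a minimal sketch that runs the generator
quoted above and inspects the first byte of each key. The value
total_keys = 1000000 and the sample size are assumptions picked for
illustration, not settings taken from the benchmark in question.

```python
from random import randint

total_keys = 1000000  # assumed value, for illustration only


def key_generator_random():
    # Same logic as the stress.py generator quoted above:
    # zero-padded decimal string, as wide as str(total_keys).
    fmt = '%0' + str(len(str(total_keys))) + 'd'
    return fmt % randint(0, total_keys - 1)


sample = [key_generator_random() for _ in range(10000)]

# Every key has the same length...
lengths = {len(k) for k in sample}

# ...and every key starts with a byte in 0x30..0x39 ('0'..'9').
first_bytes = {k.encode('ascii')[0] for k in sample}
digit_bytes = set(range(ord('0'), ord('9') + 1))

print('key lengths:', lengths)
print('distinct first bytes:', sorted(first_bytes))
print('all first bytes are ASCII digits:', first_bytes <= digit_bytes)
print('share of the 256 possible first-byte values:', len(digit_bytes) / 256)
```

At most 10 of the 256 possible first-byte values can ever occur, so
tokens spread evenly over the whole byte range would leave most nodes
outside the range these keys occupy.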

Did you see roughly equal CPU usage on the Cassandra nodes during the
test? Is it possible that most or all of the keys generated by
stress.py simply fall on a single node?
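
A rough way to check this offline is sketched below. It assumes the
order-preserving partitioner compares keys as plain strings and that
each node owns the range from the previous node's token (exclusive) up
to its own token (inclusive), wrapping around at the end of the ring.
The four tokens, total_keys and the owner() helper are hypothetical;
substitute the tokens actually assigned to your nodes.

```python
from bisect import bisect_left
from collections import Counter
from random import randint

total_keys = 1000000  # assumed value, for illustration only

# Hypothetical tokens for a four-node ring, sorted, spread over
# printable ASCII. Replace with the tokens from your own cluster.
tokens = ['@', 'P', 'a', 'q']


def key_generator_random():
    # Same logic as the stress.py generator quoted above.
    fmt = '%0' + str(len(str(total_keys))) + 'd'
    return fmt % randint(0, total_keys - 1)


def owner(key):
    # Assumed ownership rule: the first node whose token is >= key,
    # wrapping to the first node when the key is past the last token.
    i = bisect_left(tokens, key)
    return tokens[i % len(tokens)]


counts = Counter(owner(key_generator_random()) for _ in range(10000))

for token in tokens:
    print('node with token %r: %d keys' % (token, counts[token]))
```

With these example tokens every generated key sorts below '@', so all
10000 keys land on the first node and the other three receive nothing.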

-- 
/ Peter Schuller
