> Did you see about equal CPU usage on the cassandra nodes during the
> test? Is it possible that most or all of the keys generated by
> stress.py simply fall on a single node?

CPU usage was approximately equal across the cluster, at around 50% on
each node. stress.py generates keys either uniformly at random or from a
gaussian distribution, and both methods showed the same results. Finally,
we're using the random partitioner, so Cassandra hashes each key with md5
to map it to a position on the ring.

--
David Schoonover

On Jul 19, 2010, at 4:14 PM, Peter Schuller wrote:

> The following is completely irrelevant if you are indeed using the
> default storage-conf.xml as you said. However since I wrote it and it
> remains relevant for anyone testing with the order preserving
> partitioner, I might as well post it rather than discard it...
>
> Begin probably irrelevant post:
>
> Another stab in the dark:
>
> You do specifically mention that you distributed tokens evenly across
> the cluster and independently for each cluster size. However, were the
> tokens distributed evenly *within the range used by the stress test*?
>
> This is the random key generator in stress.py:
>
> def key_generator_random():
>     fmt = '%0' + str(len(str(total_keys))) + 'd'
>     return fmt % randint(0, total_keys - 1)
>
> Unless I am misreading/mis-testing, this will generate keys that are
> essentially ASCII decimal characters in strings of equal length, with
> numerical values distributed in the range [0, total_keys - 1]. However,
> the key prefixes covered by the range '0-9' make up a very limited
> subset of the token spaces into which cluster nodes are placed, for
> both byte strings and UTF-8 strings.
>
> Did you see about equal CPU usage on the cassandra nodes during the
> test? Is it possible that most or all of the keys generated by
> stress.py simply fall on a single node?
>
> --
> / Peter Schuller
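
For anyone who wants to check the partitioner point for themselves, here is a minimal sketch. It generates keys the same way as the key_generator_random quoted above and counts where they land on a hypothetical 4-node ring. The helpers md5_token, node_for_token and node_for_key_ordered are illustrative assumptions, not Cassandra code: the first treats the MD5 digest as an integer in [0, 2**128), which simplifies what the random partitioner does, and the last splits the ring by the first byte of the key as a crude stand-in for an order-preserving layout with evenly spaced tokens.

```python
import hashlib
from random import Random


def key_generator_random(total_keys, rng):
    # Same formatting as the stress.py generator quoted above:
    # zero-padded decimal strings of equal length.
    fmt = '%0' + str(len(str(total_keys))) + 'd'
    return fmt % rng.randint(0, total_keys - 1)


def md5_token(key):
    # Assumption: model the random partitioner by reading the MD5 digest
    # as an unsigned integer in [0, 2**128). Cassandra's real token
    # computation differs in detail.
    return int.from_bytes(hashlib.md5(key.encode('ascii')).digest(), 'big')


def node_for_token(token, num_nodes):
    # Hypothetical ring: each node owns an equal slice of the token space.
    return token * num_nodes // 2**128


def node_for_key_ordered(key, num_nodes):
    # Hypothetical order-preserving layout: nodes split the range of
    # possible first-byte values (0..255) evenly.
    return ord(key[0]) * num_nodes // 256


def count_per_node(total_keys=100000, samples=20000, num_nodes=4, seed=0):
    # Count how many sampled keys land on each node under both layouts.
    rng = Random(seed)
    hashed = [0] * num_nodes
    ordered = [0] * num_nodes
    for _ in range(samples):
        key = key_generator_random(total_keys, rng)
        hashed[node_for_token(md5_token(key), num_nodes)] += 1
        ordered[node_for_key_ordered(key, num_nodes)] += 1
    return hashed, ordered


if __name__ == '__main__':
    hashed, ordered = count_per_node()
    print('md5-hashed placement:      ', hashed)
    print('order-preserving placement:', ordered)
```

Under these assumptions the hashed counts should come out at roughly a quarter of the samples per node, while the order-preserving stand-in puts every key on the first node, because all keys begin with an ASCII digit.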