We are experiencing very slow performance on Amazon EC2 after a cold boot: 10-20 tps. After the cache is primed things are much better, but it would be nice if users who aren't in the cache didn't experience such slow performance.
Before dumping a bunch of config, I just had some general questions.

- We are using uuid keys, 40m of them, with the random partitioner. The typical access pattern is reading 200-300 keys in a single web request. Are uuid keys going to be painful b/c they are so random? Should we be using less random keys, maybe with a shard prefix (01-80), and make sure that our tokens group user data together on the cluster (via the order preserving partitioner)?
- Would the order preserving partitioner be a better option in the sense that it would group a single user's data onto a single set of machines (if we added a prefix to the uuid)?
- Is there any benefit to doing sharding of our own via keyspaces, 01-80 keyspaces to split up the data files? (We already have 80 mysql shards we are migrating from, so doing this wouldn't be terrible implementation-wise.)
- Should a goal be to get the data/index files as small as possible? Is there a size at which they become problematic? (Amazon EC2/EBS fyi)
  - Via more servers
  - Via more cassandra instances on the same server
  - Via manual sharding by keyspace
  - Via manual sharding by columnfamily

Thanks,
--
-jason horman
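For concreteness, the shard-prefix idea above could be sketched as follows. This is just an illustration under my own assumptions (a two-digit prefix derived by hashing the uuid, a `:` separator, and 80 shards to mirror the existing mysql layout), not an established scheme:

```python
import hashlib
import uuid

NUM_SHARDS = 80  # mirrors the existing 01-80 mysql shard layout (assumption)

def shard_prefixed_key(user_id: uuid.UUID) -> str:
    """Derive a row key with a stable two-digit shard prefix (01-80).

    Hashing the uuid keeps the shard assignment deterministic, so with an
    order preserving partitioner all keys sharing a prefix fall into one
    contiguous token range and can be pinned to a set of machines.
    """
    digest = hashlib.md5(user_id.bytes).digest()
    shard = int.from_bytes(digest[:4], "big") % NUM_SHARDS + 1
    return f"{shard:02d}:{user_id}"

key = shard_prefixed_key(uuid.UUID("12345678-1234-5678-1234-567812345678"))
```

With tokens chosen at the prefix boundaries ("01", "02", ...), a single user's rows would then group together on the cluster, which is what the question is getting at.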