> Q. If we're using read consistency ONE does the read request get sent to all
> nodes in the replica set and the first to reply is returned (i.e. all replica
> nodes will then have that row in their cache), OR does the request only get
> sent to a single node in the replica set? If it's the latter would the same
> node generally be used for all requests to the same key or would it always be
> a random node in the replica set? (i.e. if we have multiple reads for one key
> in quick succession would this entail potentially multiple disk lookups until
> all nodes in the set have been hit?).

At CL ONE all replicas will be involved in the request if Read Repair is
active for that request (this is true for all CLs). If RR is not active for
the request, only one node will be involved. See read_repair_chance in the
yaml file.

Under the Simple placement strategy that one node will be the first node in
the replica set, unless the proximity of the nodes has been modified by the
dynamic snitch based on recent latency. See dynamic_snitch_badness_threshold
in the yaml file for info on how to stick requests to a node and improve
cache utilization.
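For what it's worth, here is a rough Hector sketch of pinning reads to CL ONE
from the client side. The cluster, host, keyspace, column family, key and
column names are made up for illustration, adjust them to your own schema:

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.ColumnQuery;
import me.prettyprint.hector.api.query.QueryResult;

public class ReadAtOne {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "node1:9160");

        // Default reads in this Keyspace to CL ONE, writes to QUORUM.
        ConfigurableConsistencyLevel policy = new ConfigurableConsistencyLevel();
        policy.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
        policy.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);
        Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster, policy);

        // Single column read: only one replica is asked for the data unless
        // Read Repair fires for this request (see read_repair_chance).
        StringSerializer ser = StringSerializer.get();
        ColumnQuery<String, String, String> query =
                HFactory.createColumnQuery(keyspace, ser, ser, ser)
                        .setColumnFamily("MyCF")
                        .setKey("some-key")
                        .setName("some-column");
        QueryResult<HColumn<String, String>> result = query.execute();
        HColumn<String, String> column = result.get();
        System.out.println(column == null ? "not found" : column.getValue());
    }
}

The client still talks to whichever node it has pooled, and that coordinator
picks which replica to read from, which is where the snitch settings above
come in.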
> Q. Related to the above, if only one node receives the request would the
> client (hector in this case) know which node to send the request to directly
> or would there be potentially one extra network hop involved (client ->
> random node -> node with key).

It's possible, by adding "fat client" nodes to the cluster which do not
participate in storage but can work out where things are. I would try several
other optimizations before this one though.

> Q. Is it possible to do a warm cache load of the most recently accessed keys
> on node startup or would we have to do this with a client app?

See the cache save period settings for a CF, described in the help for create
column family in the CLI.

> Q. With write consistency ANY is it correct that following a write request
> all nodes in the replica set will end up with that row in their cache, as
> well as on disk, once they receive the write? i.e. total cache size is
> (cache_memory_per_node * num_nodes) / num_replicas.

ANY means that if all of the natural replicas for a row are unavailable, the
coordinator node can store the row itself with a hint to send it on later.
When using ANY you will not know if the row ended up on one of the natural
endpoints or on the coordinator. I'd stay away from it until you know you can
function with a low level of consistency.

> Q. If the cluster only has a single column family, random partitioning and no
> secondary indexes, is there a good metric for estimating how much heap space
> we would need to leave aside for everything that isn't the row-cache? Would
> it be proportional to the row-cache size or fairly constant?

You can use the old school pre-0.8 calculations (a rough sketch of the
arithmetic is at the end of this reply):
http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/

FYI Netflix run big (96GB?) memory machines and use a custom cache provider
to store the rows in a node-local memcached. This avoids the problems with
GC'ing a very big heap. There is also a pre-built native memory row cache
provider that stores data off the JVM heap; see row_cache_provider in the CLI
help for create column family. See the talk from Adrian Cockcroft here (you
may need to watch the video for the part about using memcached):
http://www.datastax.com/events/cassandrasf2011/presentations

If you are sensitive to read latency, also take care with the data model to
reduce row fragmentation across SSTables (not an issue for a full row cache).
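To put some very rough numbers on the "everything that isn't the row-cache"
question, here is a sketch of the old rule of thumb those calculations are
built around (roughly memtable throughput in MB * 3 * number of hot CFs, plus
about 1GB of internal overhead, plus whatever the caches get). Every figure
below is an assumption for illustration, not a recommendation:

public class HeapBudget {
    public static void main(String[] args) {
        int memtableThroughputMb = 128; // per-CF memtable throughput setting (assumed)
        int hotColumnFamilies = 1;      // single hot CF in the scenario described
        int internalOverheadMb = 1024;  // roughly 1GB for Cassandra internals
        int keyCacheMb = 50;            // key cache, usually small next to the row cache
        int rowCacheMb = 4096;          // whatever you decide to give the row cache

        int nonRowCacheMb = memtableThroughputMb * 3 * hotColumnFamilies
                + internalOverheadMb + keyCacheMb;
        int totalHeapMb = nonRowCacheMb + rowCacheMb;

        System.out.printf("leave aside ~%d MB for everything that isn't the row cache, "
                + "~%d MB heap in total%n", nonRowCacheMb, totalHeapMb);
    }
}

On that basis the non-row-cache part is driven mostly by the memtable
settings, so it is closer to fairly constant than proportional to the row
cache (ignoring the JVM overhead of the cache entries themselves, which is
part of what the off-heap provider mentioned above avoids).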
Hope that helps. If you can provide some numbers on your scale and latency
requirements it would be handy.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 18/08/2011, at 9:01 PM, Stephen Henderson wrote:

> Hi,
>
> We're currently in the planning stage of a new project which needs a low
> latency, persistent key/value store with a roughly 60:40 read/write split.
> We're trying to establish if Cassandra is a good fit for this and in
> particular what the hardware requirements would be to have the majority of
> rows cached in memory (other nosql platforms like Couchbase/Membase seem like
> a more natural fit but we're already reasonably familiar with cassandra and
> would rather stick with what we know if it can work).
>
> If anyone could help answer/clarify the following questions it would be a
> great help (all assume that row-caching is enabled for the column family).
>
> Q. If we're using read consistency ONE does the read request get sent to all
> nodes in the replica set and the first to reply is returned (i.e. all replica
> nodes will then have that row in their cache), OR does the request only get
> sent to a single node in the replica set? If it's the latter would the same
> node generally be used for all requests to the same key or would it always be
> a random node in the replica set? (i.e. if we have multiple reads for one key
> in quick succession would this entail potentially multiple disk lookups until
> all nodes in the set have been hit?).
>
> Q. Related to the above, if only one node receives the request would the
> client (hector in this case) know which node to send the request to directly
> or would there be potentially one extra network hop involved (client ->
> random node -> node with key).
>
> Q. Is it possible to do a warm cache load of the most recently accessed keys
> on node startup or would we have to do this with a client app?
>
> Q. With write consistency ANY is it correct that following a write request
> all nodes in the replica set will end up with that row in their cache, as
> well as on disk, once they receive the write? i.e. total cache size is
> (cache_memory_per_node * num_nodes) / num_replicas.
>
> Q. If the cluster only has a single column family, random partitioning and no
> secondary indexes, is there a good metric for estimating how much heap space
> we would need to leave aside for everything that isn't the row-cache? Would
> it be proportional to the row-cache size or fairly constant?
>
>
> Thanks,
> Stephen
>
>
> Stephen Henderson - Lead Developer (Onsite), Cognitive Match
> stephen.hender...@cognitivematch.com | http://www.cognitivematch.com
>