On Dec 16, 2010, at 11:35 PM, Wayne wrote:

> I have read that read latency goes up with the total data size, but to what
> degree should we expect a degradation in performance? What is the "normal"
> read latency range, if there is such a thing, for a small slice of scol/cols?
> Can we really put 2TB of data on a node and get good read latency querying
> data off of a handful of CFs? Any experience or explanations would be greatly
> appreciated.
If you really mean 2TB per node, I strongly advise you to perform thorough testing with real-world column sizes and the read/write load you expect. Try to load test with at least a test cluster / data set that represents one replication group, i.e. RF=3 -> 3 nodes, and test with the consistency level you want to use. Also test ring operations (repair, adding nodes, moving nodes) while under the expected load.

Combined with "a handful of CFs", I would assume that you are expecting a considerable write load. You will get massive compaction load, and with that data size the file system cache will suffer big time. You'll need loads of RAM, and still ... I can only speak about 0.6, but ring management operations will become a nightmare and you will have very long running repairs.

The cluster behavior changes massively with different access patterns (cold vs. warm data) and data sizes, so you have to understand yours and test it. I think most generic load tests are mainly marketing instruments, and I believe this is especially true for Cassandra.

I don't want to sound negative (I am a believer and don't regret our investment), but Cassandra is no silver bullet. You really need to know what you are doing.

Cheers,
Daniel
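P.S. To make the "test with your own data" point concrete, here is a minimal sketch of the kind of crude read-latency probe I mean, assuming the pycassa Python client; the keyspace ('MyKS'), column family ('Data') and key format are placeholders, and exact imports and arguments vary by client version. The idea is to time small slice reads of real keys at your real consistency level and look at the tail percentiles, not just the average:

import random
import time

import pycassa

# Placeholders: keyspace 'MyKS', column family 'Data', one node of the test ring.
pool = pycassa.ConnectionPool('MyKS', server_list=['node1:9160'])
cf = pycassa.ColumnFamily(pool, 'Data')
# Read at the consistency level you actually plan to use in production.
cf.read_consistency_level = pycassa.ConsistencyLevel.QUORUM

# Placeholder key format: sample keys that actually exist in your loaded data
# set. Reading only hot, cached keys will make latency look far better than it is.
keys = ['row%08d' % random.randrange(10000000) for _ in range(5000)]

latencies = []
for key in keys:
    start = time.time()
    try:
        cf.get(key, column_count=20)   # a small slice, as in your question
    except pycassa.NotFoundException:
        pass                           # a miss still costs the disk seeks
    latencies.append(time.time() - start)

latencies.sort()
for p in (50, 95, 99):
    idx = min(len(latencies) * p // 100, len(latencies) - 1)
    print('p%d: %.1f ms' % (p, latencies[idx] * 1000))

Run something like this against a ring holding a realistic amount of data per node, while your expected write load and compactions (and ideally a repair) are going on, because that is exactly when the 2TB-per-node pain shows up.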