On Thu, Dec 16, 2010 at 2:35 PM, Wayne <wav...@gmail.com> wrote:

> I have read that read latency goes up with the total data size, but to
> what degree should we expect a degradation in performance?
I'm not sure this is generally answerable, given variability in data models and workloads, but there are some known performance-impacting issues with very large data files. One example is this error:

  WARN [COMPACTION-POOL:1] 2010-09-28 12:17:11,932 BloomFilter.java (line 82)
  Cannot provide an optimal BloomFilter for 245256960 elements
  (8/15 buckets per element).

which I saw on an SSTable that was 90GB, around the size of one of your files.

https://issues.apache.org/jira/browse/CASSANDRA-1555 is open, with some great work from the Twitter guys dealing with this particular problem.

Generally, I'm sure there are other similar issues, because the simple fact is that the set of people running very large datasets with Apache Cassandra in production is still relatively small, and non-squeaking wheels usually get less grease. ;D

=Rob
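To give a rough sense of why that warning matters: a Bloom filter with m/n bits ("buckets") per element, using the optimal number of hash functions, has a false-positive rate of about 0.6185^(m/n). This is just the standard Bloom filter formula, not Cassandra's actual sizing code, but it sketches the cost of getting 8 buckets per element when 15 were wanted:

```python
import math

def bloom_fp_rate(buckets_per_element: float) -> float:
    """Approximate false-positive rate of a Bloom filter with the
    optimal hash count k = (m/n) * ln 2, where m/n is bits per
    element. At optimal k the rate works out to ~0.6185^(m/n)."""
    k = buckets_per_element * math.log(2)  # optimal number of hashes
    return (1 - math.exp(-k / buckets_per_element)) ** k

# The warning above means only 8 buckets/element could be provided
# where 15 were requested; false positives go up ~30x as a result.
print(f"15 buckets/element: {bloom_fp_rate(15):.5f}")
print(f" 8 buckets/element: {bloom_fp_rate(8):.5f}")
```

Every false positive is a wasted disk read during a lookup, which is one concrete way read latency degrades as SSTables get very large.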