On Thu, Dec 16, 2010 at 2:35 PM, Wayne <wav...@gmail.com> wrote:

> I have read that read latency goes up with the total data size, but to what
> degree should we expect a degradation in performance?

I'm not sure this is generally answerable because of data modelling
and workload variability, but there are some known
performance-impacting issues with very large data files.

For one example, consider this warning:

"
WARN [COMPACTION-POOL:1] 2010-09-28 12:17:11,932 BloomFilter.java
(line 82) Cannot provide an optimal BloomFilter for 245256960 elements
(8/15 buckets per element).
"

I saw this on an SSTable that was 90 GB, around the size of one of your files.

https://issues.apache.org/jira/browse/CASSANDRA-1555

is open, with some great work from the Twitter guys to deal with this
particular problem.
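To give a rough sense of what that warning costs: with only 8 buckets (bits) per element instead of the desired 15, the Bloom filter's false-positive rate climbs, which means more wasted disk seeks on reads. A quick sketch using the standard textbook approximation (not Cassandra's exact implementation, which may round the hash count differently):

```python
import math

def bloom_fp_rate(bits_per_element, num_hashes=None):
    """Approximate Bloom filter false-positive rate.

    bits_per_element: m/n, the "buckets per element" from the warning.
    num_hashes: k; defaults to the optimal k = (m/n) * ln 2.
    """
    k = num_hashes if num_hashes is not None else bits_per_element * math.log(2)
    # Standard approximation: p ~= (1 - e^(-k * n/m))^k
    return (1 - math.exp(-k / bits_per_element)) ** k

# Degraded 8 buckets/element vs. the desired 15
fp_8 = bloom_fp_rate(8)    # roughly 2%
fp_15 = bloom_fp_rate(15)  # well under 0.1%
```

So dropping from 15 to 8 buckets per element makes false positives roughly an order of magnitude more likely, each one costing an unnecessary SSTable read.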

Generally, I'm sure that there are other similar issues, because the
simple fact is that the set of people running very large datasets with
Apache Cassandra in production is still relatively small, and
non-squeaking wheels usually get less grease... ;D

=Rob
