After I put my Cassandra cluster under heavy load (1k/s writes + 1k/s reads) for one day, I accumulated about 30GB of data in sstables. I think the caches have warmed up to their stable state.

When I started the test, I manually cat'ed all the sstables to /dev/null so that they would be loaded into memory (the box has 32GB of RAM, so there was a lot of extra space). At that time, "sar -B" showed about 100 page-in requests per second.
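The warm-up step was essentially the following (the data directory shown is the stock /var/lib/cassandra/data; substitute your own layout):

    # read every sstable file once so the kernel pulls it into the page cache
    find /var/lib/cassandra/data -type f -exec cat {} + > /dev/null

    # then watch paging activity in one-second samples
    sar -B 1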
After a day of load, however, I consistently see around 2000 page-in requests per second, and the end-to-end response latency seen by the application is also higher. So I was curious how often my Cassandra server resorts to reading the sstables, and I looked at the JMX attributes on the ColumnFamilyStore MBean: BloomFilterFalseRatio is 1.0, BloomFilterFalsePositives is 2810, and ReadCount is about 1 million (these are numbers taken after a restart, so they are smaller). That gives about 0.28% of reads going to disk.

I am wondering what ballpark number you see with your production clusters. Is 0.28% a good number? Besides the bloom filters, what other approaches do we have to avoid disk reads? As the data grows, we apparently can't fit all of it in memory, so do we just increase the machine count until the per-box data volume fits in memory again?

Thanks
Yang
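P.S. For anyone reproducing this: the same counters show up in "nodetool cfstats" output, so a JMX browser isn't strictly needed (the host flag below is just my setup), and the 0.28% figure is plain division:

    # per-column-family stats, including read count and bloom filter counters
    nodetool -h localhost cfstats

    # bloom filter false positives / total reads
    echo 'scale=4; 2810 / 1000000' | bc    # -> .0028, i.e. about 0.28%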