Re: random keys and overlapping key ranges in SSTables

2011-12-31 Thread Kent Tong
> bloom filters can guess right sstables to be read with high
> probability < 0.1%. In reality even if you are using size based
> compaction and have about 300 sstables, reading is fast unless
> there is row fragmentation and you are reading entire row.
Right, that's it. Thanks!

random keys and overlapping key ranges in SSTables

2011-12-31 Thread Kent Tong
Hi, if the rows are updated by random keys, then most of the SSTables will have overlapping key ranges, right? Then for each read, will Cassandra go through all the SSTables (or one SSTable in each level for the leveled compaction strategy)? How can this problem be dealt with? Thanks! -- Author o
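The reply in this thread points to bloom filters as the reason a read does not have to search every SSTable whose key range overlaps the key. Below is a minimal Python sketch of that idea, not Cassandra's actual implementation. `BloomFilter`, `ToySSTable` and `read` are hypothetical names made up for illustration, and the filter size and hash count are arbitrary assumptions.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting an MD5 digest per index.
        for i in range(self.num_hashes):
            digest = hashlib.md5(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToySSTable:
    """Hypothetical stand-in for an SSTable: a dict plus its bloom filter."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


def read(sstables, key):
    """Return (values found, number of tables actually searched)."""
    values, searched = [], 0
    for table in sstables:
        if not table.bloom.might_contain(key):
            continue  # filter says "definitely absent": skip this table
        searched += 1
        if key in table.rows:
            values.append(table.rows[key])
    return values, searched
```

With randomly distributed keys every table's key range overlaps, but only tables whose filter answers "maybe" are searched, so the number of tables read per lookup stays close to the number that really hold the row.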

Re: memory estimate for each key in the key cache

2011-12-20 Thread Kent Tong
> It is not telling you to multiply your key size by 10-12, it is telling you to
> multiply the output of the nodetool cfstats reported "key cache size" by
> 10-12.
The "key cache size" reported is actually the number of keys in the key cache. So, it is the same thing as suggesting each key ta

memory estimate for each key in the key cache

2011-12-16 Thread Kent Tong
Hi, From the source code I can see that for each key, the hash (token), the key itself (ByteBuffer) and the position (a long: the offset in the sstable) are stored in the key cache. The hash is an MD5 hash, so it is 16 bytes. So, the total size required is at least 16+size-of(key)+4 which is > 20 b
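The lower bound described in this post can be written out as a small calculation. The Python sketch below is an illustration only: the function names are hypothetical, and it counts just the raw payload (token, key bytes, position), not JVM object or map overhead. The post adds 4 bytes for the position; since a Java long is 8 bytes, the width is left as a parameter so both figures can be compared.

```python
MD5_TOKEN_BYTES = 16  # an MD5 digest is 128 bits
JAVA_LONG_BYTES = 8   # a Java long is 64 bits


def min_entry_bytes(key_bytes, position_bytes=JAVA_LONG_BYTES):
    """Lower bound on raw payload per cached key: token + key + position."""
    return MD5_TOKEN_BYTES + key_bytes + position_bytes


def min_cache_bytes(num_keys, key_bytes, position_bytes=JAVA_LONG_BYTES):
    """Lower bound on raw payload for num_keys cached keys of equal size."""
    return num_keys * min_entry_bytes(key_bytes, position_bytes)
```

For example, `min_entry_bytes(10, position_bytes=4)` reproduces the post's arithmetic for a 10-byte key, while the default uses the 8-byte width of a long.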

Re: performance reaching plateau while the hardware is still idle

2011-12-15 Thread Kent Tong
From: Peter Tillotson
To: "user@cassandra.apache.org"; Kent Tong
Sent: Friday, December 16, 2011 1:45 AM
Subject: Re: performance reaching plateau while the hardware is still idle
May I suggest dstat, does cpu, memory, and io on one console dstat -vn 3 S

performance reaching plateau while the hardware is still idle

2011-12-15 Thread Kent Tong
Hi, I am running a performance test for Cassandra 1.0.5. It can perform about 1500 business operations (one read + one write to the same row) per second. However, the CPU is still 85% idle (as shown by vmstat) and the IO utilization is less than a few percent (as shown by iostat). "nodetool tpstat

key and storage proximity

2011-11-21 Thread Kent Tong
Hi, I was wondering whether records with "adjacent" keys are stored near each other, or records with "adjacent" key hashes, or neither? Thanks!

Re: read performance problem

2011-11-21 Thread Kent Tong
Sent: Monday, November 21, 2011 5:22 AM
Subject: Re: read performance problem
There is something wrong with the system. Your benchmarks are way off. How are you benchmarking? Are you using the stress lib included?
On Nov 19, 2011 8:58 PM, "Kent Tong" wrote:
> Hi,

read performance problem

2011-11-19 Thread Kent Tong
Hi, On my computer with 2 GB RAM and a Core 2 Duo E4600 CPU @ 2.40GHz, I am testing the performance of Cassandra. The write performance is good: it can write a million records in 10 minutes. However, the query performance is poor and it takes 10 minutes to read 10K records with sequential keys
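To put the two figures in this post on the same scale, they can be converted to operations per second. The short Python sketch below only restates the numbers quoted above; the helper name `ops_per_second` is made up for illustration.

```python
def ops_per_second(operations, minutes):
    """Average throughput over a run of the given length."""
    return operations / (minutes * 60)


write_rate = ops_per_second(1_000_000, 10)  # a million writes in 10 minutes
read_rate = ops_per_second(10_000, 10)      # 10K reads in 10 minutes

print(f"writes: {write_rate:.0f}/s, reads: {read_rate:.1f}/s")
```

By these figures the reads run at roughly one hundredth of the write rate, which is the gap the post is asking about.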