But I think it's a bad idea, since hot data will be evenly distributed across multiple sstables and filesystem pages.
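A rough back-of-the-envelope illustration of that concern, assuming a 4 KB filesystem page size (the 1 million hot rows and ~400-byte average row size are the figures discussed below, the page size and the "one page per hot row" worst case are assumptions): if the hot rows are scattered, the page cache has to hold far more than the raw 400 MB of hot data.

# Sketch only: assumes 4 KB pages and hot rows scattered so each one
# touches its own page; row count and size come from the thread below.
hot_rows = 1_000_000
row_size_bytes = 400
page_size_bytes = 4096

packed_hot_data = hot_rows * row_size_bytes        # if hot rows were packed together
scattered_cache = hot_rows * page_size_bytes       # one page per scattered hot row

print(f"packed hot data:      {packed_hot_data / 2**20:.0f} MB")   # ~381 MB
print(f"one page per hot row: {scattered_cache / 2**30:.1f} GB")   # ~3.8 GB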
On Thu, May 31, 2012 at 1:08 PM, crypto five <cryptof...@gmail.com> wrote:

> You may also consider disabling the key/row cache entirely.
> 1 million rows * 400 bytes = 400 MB of data, which can easily sit in the
> filesystem cache, and you will be able to access your hot keys at thousands
> of qps without hitting disk at all. Enabling compression can make the
> situation even better.
>
> On Thu, May 31, 2012 at 12:01 PM, Gurpreet Singh <gurpreet.si...@gmail.com> wrote:
>
>> Aaron,
>> Thanks for your email. The test kinda resembles how the actual
>> application will be.
>> It is going to be a simple key-value store with 500 million keys per
>> node. The traffic will be read heavy in steady state, and there will be
>> some keys that will have a lot more traffic than others. The expected hot
>> rows are estimated to be anywhere between 500,000 and 1 million keys.
>>
>> I have already populated this test system with 500 million keys and
>> compacted it all to 1 file to check the size of the bloom filter and the
>> index.
>>
>> This is how I am estimating my memory for 500 million keys. Please correct
>> me if I am wrong or if I am missing any step.
>>
>> bloom filter: 1 GB
>> index samples: the index file is 8.5 GB. I believe this index file covers
>> all keys. Index interval is 128. Hence in RAM this would be (8.5 GB /
>> 128) * 10 (factor for data structure overhead) = 664 MB (let's say 1 GB)
>> key cache size (3 million): 3 GB
>> memtable_total_space_mb: 2 GB
>>
>> This totals 7 GB.
>> My heap size is 8 GB.
>> Is there anything else that I am missing here?
>> When I run top right now, it shows java at 96% memory. That's a concern,
>> because there is no write load. Should I be looking at any other number
>> here?
>>
>> Off-heap row cache: 500,000 - 750,000 rows ~ between 3 and 5 GB (avg row
>> size = 250-500 bytes)
>>
>> My test system has 16 GB RAM; the production system will mostly have 32
>> GB RAM and 12 spindles instead of the 6 that I am testing with.
>>
>> I changed the underlying filesystem from xfs to ext2, and I am seeing
>> better results, though not the best.
>> The cfstats latency is down to 20 ms for a 35 qps read load. Row cache hit
>> rate is 0.21, key cache = 0.75.
>> Measuring from the client side, I am seeing roughly 10-15 ms per key. I
>> would want even less though; any tips would greatly help.
>> In production, I am hoping the row cache hit rate will be higher.
>>
>> The biggest thing affecting my system right now is the "Invalid
>> frame size of 0" error that the cassandra server seems to be printing. It's
>> causing read timeouts every minute or two. I haven't been able to
>> figure out a way to fix this one. I see someone else also reported seeing
>> this, but I'm not sure whether the problem is in hector, cassandra or thrift.
>>
>> Thanks
>> Gurpreet
>>
>> On Wed, May 30, 2012 at 4:38 PM, aaron morton <aa...@thelastpickle.com> wrote:
>>
>>> 80 ms per request
>>>
>>> sounds high.
>>>
>>> I'm doing some guessing here; I am guessing memory usage is the problem.
>>>
>>> * I assume you are no longer seeing excessive GC activity.
>>> * The key cache will not get used when you hit the row cache. I would
>>> disable the row cache if you have a random workload, which it looks like
>>> you do.
>>> * 500 million is a lot of keys to have on a single node. At the default
>>> index sample of every 128 keys it will have about 4 million samples, which
>>> is probably taking up a lot of memory.
>>>
>>> Is this testing a real world scenario or an abstract benchmark?
>>> IMHO you will get more insight from testing something that resembles your
>>> application.
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
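Gurpreet's heap estimate quoted above, together with Aaron's index-sample count, can be reproduced with a quick sketch. The 10x overhead factor, the 1 KB-per-entry key cache cost and the 8.5 GB on-disk index are the figures stated in the thread, not measurements:

# Rough heap-budget sketch of the numbers quoted above (assumptions taken
# from the thread: 10x in-memory overhead on index samples, 3 GB for a
# 3-million-entry key cache, 8.5 GB on-disk index).
GB = 2**30

total_keys      = 500_000_000
index_interval  = 128

bloom_filter    = 1.0 * GB
index_on_disk   = 8.5 * GB
index_samples   = total_keys / index_interval            # ~3.9 million samples
index_in_memory = index_on_disk / index_interval * 10    # ~0.66 GB (rounded up to 1 GB in the thread)
key_cache       = 3.0 * GB                                # 3 million entries at ~1 KB each
memtable_space  = 2.0 * GB                                # memtable_total_space_mb

heap_needed = bloom_filter + index_in_memory + key_cache + memtable_space

print(f"index samples: {index_samples / 1e6:.1f} million")
print(f"index memory:  {index_in_memory / GB:.2f} GB")
print(f"heap estimate: {heap_needed / GB:.1f} GB of an 8 GB heap")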
>>> On 26/05/2012, at 8:48 PM, Gurpreet Singh wrote:
>>>
>>> Hi Aaron,
>>> Here is the latest on this. I switched to a node with 6 disks and am
>>> running some read tests, and I am seeing something weird.
>>>
>>> setup:
>>> 1 node, cassandra 1.0.9, 8 CPUs, 16 GB RAM, 6 7200 rpm SATA data disks
>>> striped at 512 KB, commitlog mirrored
>>> 1 keyspace with just 1 column family
>>> random partitioner
>>> total number of keys: 500 million (the keys are just longs from 1 to 500 million)
>>> avg key size: 8 bytes
>>> bloom filter size: 1 GB
>>> total disk usage: 70 GB, compacted to 1 sstable
>>> mean compacted row size: 149 bytes
>>> heap size: 8 GB
>>> keycache size: 2 million (takes around 2 GB in RAM)
>>> rowcache size: 1 million (off-heap)
>>> memtable_total_space_mb: 2 GB
>>>
>>> test:
>>> Trying to do 5 reads per second. Each read is a multigetslice query for
>>> just 1 key, 2 columns.
>>>
>>> observations:
>>> row cache hit rate: 0.4
>>> key cache hit rate: 0.0 (this will increase later on as the system moves
>>> to steady state)
>>> cfstats - 80 ms
>>>
>>> iostat (every 5 seconds):
>>> r/s: 400
>>> %util: 20% (all disks at equal utilization)
>>> await: 65-70 ms (for each disk)
>>> svctm: 2.11 ms (for each disk)
>>> r-kB/s: 35000
>>>
>>> Why this is weird:
>>> 5 reads per second is causing a latency of 80 ms per request (according
>>> to cfstats). Isn't this too high?
>>> 35 MB/s is being read from disk. That is again very weird. This number is
>>> way too high; the avg row size is just 149 bytes. Even index reads should
>>> not cause this much data to be read from disk.
>>>
>>> What I understand is that each read request translates to 2 disk accesses
>>> (because there is only 1 sstable): 1 for the index, 1 for the data. At
>>> such a low reads/second, why is the latency so high?
>>>
>>> Would appreciate help debugging this issue.
>>> Thanks
>>> Gurpreet
>>>
>>> On Tue, May 22, 2012 at 2:46 AM, aaron morton <aa...@thelastpickle.com> wrote:
>>>
>>>> With
>>>>
>>>> heap size = 4 gigs
>>>>
>>>> I would check for GC activity in the logs and consider setting it to 8,
>>>> given you have 16 GB. You can also check whether the IO system is saturated
>>>> (http://spyced.blogspot.co.nz/2010/01/linux-performance-basics.html).
>>>> Also take a look at nodetool cfhistograms perhaps, to see how many sstables
>>>> are involved.
>>>>
>>>> I would start by looking at the latency reported on the server, then
>>>> work back to the client…
>>>>
>>>> I may have missed it in the email, but what recent latency for the CF is
>>>> reported by nodetool cfstats? That's the latency for a single request on a
>>>> single read thread. The default settings give you 32 read threads.
>>>>
>>>> If you know the latency for a single request, and you know you have 32
>>>> concurrent read threads, you can get an idea of the max throughput for a
>>>> single node. Once you get above that throughput, the latency for a request
>>>> will start to include wait time.
>>>>
>>>> It's a bit more complicated, because when you request 40 rows, that turns
>>>> into 40 read tasks. So if two clients send a request for 40 rows at the
>>>> same time, there will be 80 read tasks to be processed by 32 threads.
>>>>
>>>> Hope that helps.
>>>>
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
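Aaron's point about single-request latency and the 32 read threads can be turned into a quick capacity estimate. A minimal sketch, assuming the 80 ms cfstats latency and the default 32-thread read stage mentioned above (real throughput will differ once queueing and multiget fan-out kick in):

# Capacity sketch of the reasoning above: with N concurrent read threads
# and a per-request service time of L seconds, a single node tops out at
# roughly N / L reads per second before requests start queueing.
read_threads       = 32       # default number of concurrent read threads
latency_per_read_s = 0.080    # ~80 ms reported by nodetool cfstats

max_reads_per_sec = read_threads / latency_per_read_s
print(f"rough ceiling: {max_reads_per_sec:.0f} reads/sec per node")  # ~400

# A multiget for 40 keys becomes 40 read tasks, so two such requests
# arriving together put 80 tasks in front of the 32 threads.
keys_per_multiget = 40
queued_tasks = 2 * keys_per_multiget
print(f"{queued_tasks} read tasks queued across {read_threads} threads")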
>>>>
>>>> On 20/05/2012, at 4:10 PM, Radim Kolar wrote:
>>>>
>>>> On 19.5.2012 0:09, Gurpreet Singh wrote:
>>>>
>>>> Thanks Radim.
>>>>
>>>> Radim, actually 100 reads per second is achievable even with 2 disks.
>>>>
>>>> it will become worse as rows get fragmented.
>>>>
>>>> But achieving them with a really low avg latency per key is the issue.
>>>>
>>>> I am wondering if anyone has played with index_interval, and how much of
>>>> a difference it would make to reads to reduce the index_interval.
>>>>
>>>> close to zero. but try it yourself too and post your findings.
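For anyone weighing the index_interval question above, a rough sketch of the tradeoff, assuming ~18 bytes of on-disk index per key (8.5 GB / 500M keys, derived from the earlier figures) and the same 10x in-memory overhead factor used earlier in the thread (both assumptions, not measurements): a larger interval shrinks the sample memory roughly linearly, while each read only scans at most that many extra index entries sequentially, which is why the latency effect is expected to be close to zero.

# Sketch of the index_interval tradeoff discussed above. Assumptions:
# ~18 bytes of on-disk index per key (8.5 GB / 500M) and a 10x in-memory
# overhead factor, both taken from earlier figures in this thread.
MB = 2**20
GB = 2**30
total_keys       = 500_000_000
index_file_bytes = 8.5 * GB
bytes_per_entry  = index_file_bytes / total_keys   # ~18 bytes of index per key

for interval in (128, 256, 512):
    samples       = total_keys / interval
    sample_memory = samples * bytes_per_entry * 10  # 10x in-memory overhead
    print(f"index_interval={interval:3d}: {samples / 1e6:4.1f}M samples, "
          f"~{sample_memory / MB:3.0f} MB in heap, "
          f"up to {interval} index entries scanned per read")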