When you use compression, you should play with your block size. I believe the default may be 32K, but I had more success with 8K: nearly the same compression ratio, with less young-gen memory pressure.
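For example, a minimal sketch of what that change might look like in 1.2-era CQL3, using the global_user table from the schema quoted below (chunk_length_kb is the block-size knob; I'm keeping the LZ4 and crc_check_chance settings already in that schema, and the 8K value is just the one that worked for me, something to experiment with):

    -- lower the compression block size from the default to 8K
    ALTER TABLE global_user WITH compression =
        {'sstable_compression': 'LZ4Compressor',
         'chunk_length_kb': 8,           -- smaller blocks to decompress per read
         'crc_check_chance': 0.1};       -- preserving the existing setting

Note the ALTER replaces the whole compression map, so any option you want to keep has to be restated.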
On Thu, May 16, 2013 at 10:42 AM, Keith Wright <kwri...@nanigans.com> wrote:

> The biggest reason I'm using compression here is that my data lends itself
> well to it due to the composite columns. My current compression ratio is
> 30.5%. Not sure it matters, but my BF false positive ratio is 0.048.
>
> From: Edward Capriolo <edlinuxg...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Thursday, May 16, 2013 10:23 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: SSTable size versus read performance
>
> I am not sure if the new default is to use compression, but I do not
> believe compression is a good default. I find compression is better for
> larger column families that are sparsely read. For high-throughput CFs I
> feel that decompressing larger blocks hurts performance more than
> compression helps.
>
>
> On Thu, May 16, 2013 at 10:14 AM, Keith Wright <kwri...@nanigans.com> wrote:
>
>> Hi all,
>>
>> I currently have 2 clusters, one running on 1.1.10 using CQL2 and one
>> running on 1.2.4 using CQL3 and vnodes. The machines in the 1.2.4 cluster
>> are expected to have better IO performance, as we are going from 1 SSD data
>> disk per node in the 1.1 cluster to 3 SSD data disks per node in the 1.2
>> cluster with higher-end drives (commit logs are on their own disk, shared
>> with the OS). I am doing some stress testing on the 1.2 cluster and have
>> found that although the reads/sec as seen from iostat are approximately
>> the same (3K/sec) in both clusters, the MB/s read in the new cluster is
>> MUCH higher (7 MB/s in 1.1 as compared to 30-50 MB/s in 1.2). As a result,
>> I am seeing excessive iowait in the 1.2 cluster, causing high average read
>> times of 30 ms under the same load (the 1.1 cluster sees around 5 ms). They
>> are both using leveled compaction, but one thing I did change in the new
>> cluster was to increase the sstable size from the OOTB setting to 32 MB.
>> Note that my reads are by definition highly random, as we are running
>> memcached in front for various reasons. Does Cassandra need to read the
>> entire SSTable when fetching a row, or only the relevant chunk (I have the
>> OOTB chunk size and BF settings)? I just decreased the sstable size to 5
>> MB and am waiting for compactions to complete to see if that makes a
>> difference.
>>
>> Thanks!
>>
>> Relevant table definition if helpful (note that I also changed to the LZ4
>> compressor expecting better read performance, and I decreased the crc check
>> chance again to minimize read latency):
>>
>> CREATE TABLE global_user (
>>     user_id BIGINT,
>>     app_id INT,
>>     type TEXT,
>>     name TEXT,
>>     last TIMESTAMP,
>>     paid BOOLEAN,
>>     values map<TIMESTAMP,FLOAT>,
>>     sku_time map<TEXT,TIMESTAMP>,
>>     extra_param map<TEXT,TEXT>,
>>     PRIMARY KEY (user_id, app_id, type, name)
>> ) WITH
>>     compression = {'crc_check_chance': 0.1, 'sstable_compression': 'LZ4Compressor'} AND
>>     compaction = {'class': 'LeveledCompactionStrategy'} AND
>>     compaction_strategy_options = {'sstable_size_in_mb': 5} AND
>>     gc_grace_seconds = 86400;
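For reference, a minimal sketch of the sstable-size change Keith describes, assuming 1.2-era CQL3 where LCS options can be passed inside the compaction map (his schema above uses a separate compaction_strategy_options property; adjust to whatever form your version accepts):

    -- shrink the target SSTable size for leveled compaction to 5 MB
    ALTER TABLE global_user WITH compaction =
        {'class': 'LeveledCompactionStrategy',
         'sstable_size_in_mb': 5};

Existing SSTables are only rewritten to the new size as they get recompacted, which is why he is waiting for compactions to complete before measuring.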