When you use compression, you should play with your block size. I believe the default may be 32K, but I had more success with 8K: nearly the same compression ratio, with less young-gen memory pressure.
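For example, a minimal sketch of what that change might look like in 1.2-era CQL3, using the global_user table from the schema quoted below (chunk_length_kb is the block-size knob; I'm keeping the LZ4 and crc_check_chance settings already in that schema, and the 8K value is just the one that worked for me, something to experiment with):

    -- lower the compression block size from the default to 8K
    ALTER TABLE global_user WITH compression =
        {'sstable_compression': 'LZ4Compressor',
         'chunk_length_kb': 8,           -- smaller blocks to decompress per read
         'crc_check_chance': 0.1};       -- preserving the existing setting

Note the ALTER replaces the whole compression map, so any option you want to keep has to be restated.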
On Thu, May 16, 2013 at 10:42 AM, Keith Wright <kwri...@nanigans.com> wrote:

> The biggest reason I'm using compression here is that my data lends itself
> well to it due to the composite columns. My current compression ratio is
> 30.5%. Not sure it matters, but my BF false positive ratio is 0.048.
>
> From: Edward Capriolo <edlinuxg...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Thursday, May 16, 2013 10:23 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: SSTable size versus read performance
>
> I am not sure if the new default is to use compression, but I do not
> believe compression is a good default. I find compression is better for
> larger column families that are sparsely read. For high-throughput CFs I
> feel that decompressing larger blocks hurts performance more than
> compression helps.
>
>
> On Thu, May 16, 2013 at 10:14 AM, Keith Wright <kwri...@nanigans.com> wrote:
>
>> Hi all,
>>
>> I currently have 2 clusters, one running on 1.1.10 using CQL2 and one
>> running on 1.2.4 using CQL3 and vnodes. The machines in the 1.2.4 cluster
>> are expected to have better IO performance, as we are going from 1 SSD data
>> disk per node in the 1.1 cluster to 3 SSD data disks per node in the 1.2
>> cluster with higher-end drives (commit logs are on their own disk, shared
>> with the OS). I am doing some stress testing on the 1.2 cluster and have
>> found that although the reads/sec as seen from iostat are approximately
>> the same (3K/sec) in both clusters, the MB/s read in the new cluster is
>> MUCH higher (7 MB/s in 1.1 as compared to 30-50 MB/s in 1.2). As a result,
>> I am seeing excessive iowait in the 1.2 cluster, causing high average read
>> times of 30 ms under the same load (the 1.1 cluster sees around 5 ms). They
>> are both using leveled compaction, but one thing I did change in the new
>> cluster was to increase the sstable size from the OOTB setting to 32 MB.
>> Note that my reads are by definition highly random, as we are running
>> memcached in front for various reasons. Does Cassandra need to read the
>> entire SSTable when fetching a row, or only the relevant chunk (I have the
>> OOTB chunk size and BF settings)? I just decreased the sstable size to 5
>> MB and am waiting for compactions to complete to see if that makes a
>> difference.
>>
>> Thanks!
>>
>> Relevant table definition if helpful (note that I also changed to the LZ4
>> compressor expecting better read performance, and I decreased the crc check
>> chance again to minimize read latency):
>>
>> CREATE TABLE global_user (
>>     user_id BIGINT,
>>     app_id INT,
>>     type TEXT,
>>     name TEXT,
>>     last TIMESTAMP,
>>     paid BOOLEAN,
>>     values map<TIMESTAMP,FLOAT>,
>>     sku_time map<TEXT,TIMESTAMP>,
>>     extra_param map<TEXT,TEXT>,
>>     PRIMARY KEY (user_id, app_id, type, name)
>> ) WITH
>>     compression = {'crc_check_chance': 0.1, 'sstable_compression': 'LZ4Compressor'} AND
>>     compaction = {'class': 'LeveledCompactionStrategy'} AND
>>     compaction_strategy_options = {'sstable_size_in_mb': 5} AND
>>     gc_grace_seconds = 86400;
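For reference, a minimal sketch of the sstable-size change Keith describes, assuming 1.2-era CQL3 where LCS options can be passed inside the compaction map (his schema above uses a separate compaction_strategy_options property; adjust to whatever form your version accepts):

    -- shrink the target SSTable size for leveled compaction to 5 MB
    ALTER TABLE global_user WITH compaction =
        {'class': 'LeveledCompactionStrategy',
         'sstable_size_in_mb': 5};

Existing SSTables are only rewritten to the new size as they get recompacted, which is why he is waiting for compactions to complete before measuring.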