> INFO [OptionalTasks:1] 2012-11-19 13:08:58,868 ColumnFamilyStore.java (line
> 451) completed loading (5175655 ms; 13259976 keys) row cache

So it was reading 2,562 rows per second during startup. I'd say that's not
unreasonable performance for 13 million rows. It will get faster in 1.2, but
for now just have the cache save fewer keys perhaps.
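For anyone checking the arithmetic, that rate falls straight out of the two numbers in the log line quoted above (a quick sketch; only the figures come from the log, the variable names are mine):

```python
# Figures from the "completed loading (5175655 ms; 13259976 keys)" log line.
elapsed_ms = 5175655    # load time in milliseconds
keys_loaded = 13259976  # rows restored to the row cache

rate = keys_loaded / (elapsed_ms / 1000.0)  # keys per second
print("%.0f rows/sec" % rate)  # -> 2562 rows/sec
```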
> Would something like iterating over SSTables instead, and throwing rows at
> the cache that need to be in there feasible ?

During start up we do not read the -Data.db component of the SSTable, only
the -Index.db (and -Filter.db) components. Also the SSTables are opened in
parallel.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/11/2012, at 10:39 AM, Andras Szerdahelyi
<andras.szerdahe...@ignitionone.com> wrote:

> Aaron,
>
>> What version are you on ?
>
> 1.1.5
>
>> Do you know how many rows were loaded ?
>
> INFO [OptionalTasks:1] 2012-11-19 13:08:58,868 ColumnFamilyStore.java (line
> 451) completed loading (5175655 ms; 13259976 keys) row cache
>
>> In both cases I do not believe the cache is stored in token (or key) order.
>
> Am I getting this right: the row keys are read, and the rows are retrieved
> from SSTables in the order their keys appear in the cache file?
> Would something like iterating over the SSTables instead, and throwing the
> rows at the cache that need to be in there, be feasible? If the SSTables
> themselves are written sequentially at compaction time, which is how I
> remember they are written, SSTable-sized sequential reads with a filter
> ( a bloom filter for the row cache? :-) ) must be faster than reading from
> all across the column family ( I have HDDs and about 1k SSTables ).
>
>> row_cache_keys_to_save in yaml may help you find a happy half way point.
>
> If I can keep that high enough, with my data retention requirements, then
> save for the absolute first get on a row, I can operate entirely out of
> memory.
>
> thanks!
> Andras
>
> Andras Szerdahelyi
> Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
> M: +32 493 05 50 88 | Skype: sandrew84
>
>
> On 19 Nov 2012, at 22:00, aaron morton <aa...@thelastpickle.com> wrote:
>
>>> i was just wondering if anyone else is experiencing very slow ( ~ 3.5
>>> MB/sec ) re-fill of the row cache at start up.
>>
>> It was mentioned the other day.
>>
>> What version are you on ?
>> Do you know how many rows were loaded ? When complete it will log a
>> message with the pattern
>>
>> "completed loading (%d ms; %d keys) row cache for %s.%s"
>>
>>> How is the "saved row cache file" processed?
>>
>> In version 1.1, after the SSTables have been opened, the keys in the saved
>> row cache are read one at a time and the whole row is read into memory.
>> This is a single-threaded operation.
>>
>> In 1.2 reading the saved cache is still single threaded, but reading the
>> rows goes through the read thread pool, so it happens in parallel.
>>
>> In both cases I do not believe the cache is stored in token (or key) order.
>>
>>> ( Admittedly whatever is going on is still much more preferable to
>>> starting with a cold row cache )
>>
>> row_cache_keys_to_save in the yaml may help you find a happy halfway point.
>>
>> Cheers
>>
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 20/11/2012, at 3:17 AM, Andras Szerdahelyi
>> <andras.szerdahe...@ignitionone.com> wrote:
>>
>>> Hey list,
>>>
>>> I was just wondering if anyone else is experiencing very slow ( ~ 3.5
>>> MB/sec ) re-fill of the row cache at start up. We operate with a large
>>> row cache ( 10-15GB currently ) and we already measure startup times in
>>> hours :-)
>>>
>>> How is the "saved row cache file" processed?
>>> Are the cached row keys simply iterated over and their respective rows
>>> read from SSTables - possibly creating random reads with small enough
>>> sstable files, if the keys were not stored in a manner optimised for a
>>> quick re-fill? - or is there a smarter algorithm at work ( i.e. scan
>>> through one sstable at a time, filter for rows that should be in the row
>>> cache ), making this operation purely disk-i/o bound?
>>>
>>> ( Admittedly whatever is going on is still much more preferable to
>>> starting with a cold row cache )
>>>
>>> thanks!
>>> Andras
>>>
>>> Andras Szerdahelyi
>>> Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
>>> M: +32 493 05 50 88 | Skype: sandrew84
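For readers following the thread, the difference Aaron describes between 1.1 and 1.2 can be sketched roughly as below: in 1.1 each saved key triggers a full (likely random) row read before the next key is touched, while in 1.2 the saved-cache file is still read serially but the row reads go through a read thread pool and overlap. This is an illustrative sketch, not Cassandra's actual code; `saved_keys`, `read_row`, and the pool size are all hypothetical names, and `saved_keys` would already be capped by row_cache_keys_to_save.

```python
from concurrent.futures import ThreadPoolExecutor

def warm_row_cache_1_1(saved_keys, read_row, cache):
    # 1.1-style: single-threaded - every key blocks on its row read,
    # so throughput is bounded by one disk seek at a time.
    for key in saved_keys:
        cache[key] = read_row(key)

def warm_row_cache_1_2(saved_keys, read_row, cache, pool_size=32):
    # 1.2-style: keys are still consumed serially from the saved cache,
    # but the row reads are dispatched to a thread pool and run in parallel.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = {key: pool.submit(read_row, key) for key in saved_keys}
        for key, fut in futures.items():
            cache[key] = fut.result()
```

With spinning disks and ~1k SSTables, overlapping the reads is what hides the per-row seek latency; the resulting cache contents are identical either way, only the wall-clock time differs.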