Aaron,

> What version are you on?

1.1.5

> Do you know how many rows were loaded?

INFO [OptionalTasks:1] 2012-11-19 13:08:58,868 ColumnFamilyStore.java (line 451) completed loading (5175655 ms; 13259976 keys) row cache

> In both cases I do not believe the cache is stored in token (or key) order.

Am I getting this right: the row keys are read, and the rows are retrieved from the SSTables in whatever order their keys appear in the saved cache file?

Would something like iterating over the SSTables instead, and throwing the rows that need to be cached at the cache, be feasible? If the SSTables themselves are written sequentially at compaction time (which is how I remember they are written), SSTable-sized sequential reads with a filter (a bloom filter for the row cache? :-) ) must be faster than reading from all across the column family (I have HDDs and about 1k SSTables).

> row_cache_keys_to_save in yaml may help you find a happy halfway point.

If I can keep that high enough, then given my data retention requirements I can operate entirely out of memory, save for the absolute first get on a row.

thanks!
Andras

Andras Szerdahelyi
Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
M: +32 493 05 50 88 | Skype: sandrew84

On 19 Nov 2012, at 22:00, aaron morton <aa...@thelastpickle.com> wrote:

> I was just wondering if anyone else is experiencing very slow (~3.5 MB/sec) re-fill of the row cache at start up.

It was mentioned the other day.

What version are you on? Do you know how many rows were loaded? When complete it will log a message with the pattern "completed loading (%d ms; %d keys) row cache for %s.%s".

> How is the "saved row cache file" processed?

In version 1.1, after the SSTables have been opened, the keys in the saved row cache are read one at a time and the whole row is read into memory. This is a single-threaded operation.
In 1.2, reading the saved cache is still single-threaded, but reading the rows goes through the read thread pool, so it happens in parallel.

In both cases I do not believe the cache is stored in token (or key) order.

> ( Admittedly whatever is going on is still much more preferable to starting with a cold row cache )

row_cache_keys_to_save in the yaml may help you find a happy halfway point.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/11/2012, at 3:17 AM, Andras Szerdahelyi <andras.szerdahe...@ignitionone.com> wrote:

Hey list,

I was just wondering if anyone else is experiencing very slow (~3.5 MB/sec) re-fill of the row cache at start up. We operate with a large row cache (10-15GB currently) and we already measure startup times in hours :-)

How is the "saved row cache file" processed? Are the cached row keys simply iterated over and their respective rows read from the SSTables - possibly creating random reads with small enough SSTable files, if the keys were not stored in a manner optimised for a quick re-fill? Or is there a smarter algorithm at work (i.e. scan through one SSTable at a time and filter for the rows that should be in the row cache), making the operation purely disk-I/O bound?

(Admittedly, whatever is going on is still much more preferable to starting with a cold row cache.)

thanks!
Andras

Andras Szerdahelyi
Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
M: +32 493 05 50 88 | Skype: sandrew84
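The per-key refill path described in the thread (1.1 style: read each saved key, then fetch that whole row) can be sketched roughly as below. All names here (refill_row_cache, read_row, the dict standing in for SSTable lookups) are hypothetical illustrations, not Cassandra's actual internals:

```python
# Minimal sketch of the 1.1-style row cache refill discussed above.
# Names are hypothetical, not Cassandra's real classes or methods.

def refill_row_cache(saved_keys, read_row, cache):
    """Single-threaded refill: for each saved key, read the whole row.

    Keys are processed in the order they appear in the saved cache
    file; since that order is unrelated to on-disk (token) order,
    each read_row() call is effectively a random read somewhere in
    the column family's SSTables.
    """
    for key in saved_keys:
        row = read_row(key)      # roughly one random read per key
        if row is not None:
            cache[key] = row
    return cache

# Tiny in-memory stand-in for the on-disk row lookup:
table = {b"k1": "row1", b"k2": "row2"}
cache = refill_row_cache([b"k2", b"k1"], table.get, {})
```

With HDDs, the seek per key is what keeps the observed refill rate low regardless of sequential disk bandwidth; 1.2's change parallelises the reads but does not reorder them.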
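The alternative Andras proposes (scan each SSTable sequentially and filter against the set of saved keys) could look roughly like this sketch; again the names and the list-of-pairs SSTable representation are assumptions for illustration, not Cassandra code:

```python
# Sketch of the proposed sequential refill: instead of one random
# read per cached key, scan every SSTable front to back and keep
# only rows whose keys are in the saved-key set. Hypothetical names.

def refill_by_scanning(sstables, saved_keys, cache):
    """Sequential refill: iterate SSTables in on-disk order.

    saved_keys acts as the filter (an exact set here; a bloom filter
    would trade exactness for memory). Each SSTable is read once,
    sequentially, so the disk sees large streaming reads instead of
    roughly one seek per cached row.
    """
    wanted = set(saved_keys)
    for sstable in sstables:        # iterate oldest to newest
        for key, row in sstable:    # sequential scan of one table
            if key in wanted:
                cache[key] = row    # newer tables overwrite older
    return cache

# Two toy "SSTables", oldest first; key b"a" was rewritten later:
sstables = [[(b"a", "old-a"), (b"b", "row-b")],
            [(b"a", "new-a"), (b"c", "row-c")]]
cache = refill_by_scanning(sstables, [b"a", b"c"], {})
```

The trade-off is scanning every row on disk even when only a small fraction is cached, which is why it only wins when the cache covers enough of the data set and the media penalises seeks, as with the ~1k SSTables on HDDs mentioned above.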