Re: single row key continues to grow, should I be concerned?

aaron morton Tue, 20 Mar 2012 10:38:16 -0700

> The reads are only fetching slices of 20 to 100 columns max at a time from 
> the row but if the key is planted on one node in the cluster I am concerned 
> about that node getting the brunt of traffic.
What RF are you using, how many nodes are in the cluster, and what CL do you 
read at?


If you have lots of nodes in different racks, the NetworkTopologyStrategy 
will do a better job of distributing read load than the SimpleStrategy. The 
DynamicSnitch can also help distribute load; see cassandra.yaml for its 
configuration. 

> I thought about breaking the column data into multiple different row keys to 
> help distribute throughout the cluster but its so darn handy having all the 
> columns in one key!!
If you have a row that will continually grow it is a good idea to partition it 
in some way. Large rows can slow down things like compaction and repair; once 
a row is above 60MB it starts to have an effect. Can you partition by a date 
range such as month?
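
As a rough illustration of partitioning by month, here is a minimal sketch of 
how the client could derive the row key. The "base:YYYY-MM" key format, the 
function names and the "popular_area" key are all made up for the example, not 
anything Cassandra requires. Writes go to the bucket for the column's date; 
a read over a date range asks for each bucket it touches.

```python
from datetime import date
from typing import List


def bucketed_key(base_key: str, day: date) -> str:
    """Row key of the month bucket that holds `day` (hypothetical format)."""
    return f"{base_key}:{day.year:04d}-{day.month:02d}"


def keys_for_range(base_key: str, start: date, end: date) -> List[str]:
    """Row keys of every month bucket touched by the inclusive range
    [start, end], oldest first. Returns an empty list if start > end."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"{base_key}:{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            # Roll over into January of the next year.
            year, month = year + 1, 1
    return keys


if __name__ == "__main__":
    # "popular_area" is an invented base key for illustration only.
    print(bucketed_key("popular_area", date(2012, 3, 20)))
    print(keys_for_range("popular_area", date(2011, 11, 15), date(2012, 2, 1)))
```

Each bucket is a separate row key, so the buckets can land on different 
replicas and no single row grows without bound. The cost is that a slice that 
spans a month boundary needs more than one read.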

Large rows are also a little slower to query from, see
http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/

If most reads are only pulling 20 to 100 columns at a time, are there two 
workloads? Is it possible to store just these columns in a separate row? If 
you understand how big a row may get, you may be able to use the row cache to 
improve performance.

Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 20/03/2012, at 2:05 PM, Blake Starkenburg wrote:

> I have a row key which is now up to 125,000 columns (and anticipated to 
> grow), I know this is a far-cry from the 2-billion columns a single row key 
> can store in Cassandra but my concern is the amount of reads that this 
> specific row key may get compared to other row keys. This particular row key 
> houses column data associated with one of the more popular areas of the site. 
> The reads are only fetching slices of 20 to 100 columns max at a time from 
> the row but if the key is planted on one node in the cluster I am concerned 
> about that node getting the brunt of traffic.
> 
> I thought about breaking the column data into multiple different row keys to 
> help distribute throughout the cluster but its so darn handy having all the 
> columns in one key!!
> 
> key_cache is enabled but row cache is disabled on the column family.
> 
> Should I be concerned going forward? Any particular advice on large wide rows?
> 
> Thanks!
