Agree. Cassandra generally assumes a reasonably static cluster membership.
There are some tricks that can be done with copying SSTables, but they will
only reduce the need to stream data around, not eliminate it.

This may not suit your problem domain but, since you are on AWS infrastructure,
how about using the SQS messaging service (or something similar, e.g. RabbitMQ)
to smooth out your throughput? You could then throttle the inserts into the
Cassandra cluster to a maximum level and spec your hardware against that.
During peaks the message queue can soak up the overflow.

Hope that helps. 
Aaron

On 4/03/2011, at 2:07 PM, Dan Hendry wrote:

> To some extent, the bootstrapping problem will be an issue with most
> solutions: the data has to be duplicated from somewhere. Bootstrapping
> should not cause much performance degradation unless you are already pushing
> capacity limits. It's the decommissioning problem which makes Cassandra
> somewhat problematic in your case. You grow your cluster 5x, then write to
> it. You have to perform a proper decommission when shrinking the cluster
> again, which involves validating and streaming data to the remaining
> replicas: a fairly serious operation with TBs of data. For most realistic
> situations, unless the cluster is completely read-only, you can't just kill
> most of the nodes in the cluster.
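> 
> Just to make the shrink step concrete, the decommission has to be run node by
> node, roughly like this (the host list is hypothetical, and nodetool needs
> JMX access to each node):
> 
> import subprocess
> 
> # Nodes added for the peak; decommissioning each one streams its replicas to
> # the remaining nodes before it leaves the ring, which is the expensive part
> # with TBs of data.
> NODES_TO_REMOVE = ["10.0.1.11", "10.0.1.12", "10.0.1.13"]
> 
> for host in NODES_TO_REMOVE:
>     subprocess.check_call(["nodetool", "-h", host, "decommission"])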
> 
> I can't really think of a good, general way to do this with just Cassandra,
> although there may be some hacktastical possibilities. I think a more
> statically sized Cassandra cluster with a variable cache layer (memcached or
> similar) in front of it is probably a better solution. This option kind of
> falls apart at the terabytes-of-data range. 
> 
> Have you considered using S3, Amazon CloudFront or some other CDN instead
> of rolling your own solution? For immutable data, it's what they excel at.
> Cassandra has amazing write capacity and its design focus is on scaling
> writes. I would not really consider it a good tool for the job of serving
> massive amounts of static content.
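> 
> For the immutable objects, publishing would be little more than something
> like this (boto3, with a made-up bucket name; CloudFront then just fronts the
> bucket):
> 
> import boto3
> 
> s3 = boto3.client("s3")
> BUCKET = "my-static-content"  # made-up bucket name
> 
> def publish(key, path):
>     # Immutable objects can be cached aggressively by the CDN and browsers.
>     s3.upload_file(path, BUCKET, key,
>                    ExtraArgs={"CacheControl": "public, max-age=31536000"})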
> 
> Dan
> 
> -----Original Message-----
> From: Shaun Cutts [mailto:sh...@cuttshome.net] 
> Sent: March-03-11 13:00
> To: user@cassandra.apache.org
> Subject: question about replicas & dynamic response to load
> 
> Hello,
> 
> In our project, our usage pattern is likely to be quite variable -- high for
> a few days, then lower, and so on; it could vary by as much as 10x (or more)
> from peak to "non-peak". Also, much of our data is immutable -- but there is
> a considerable amount of it -- perhaps in the single-digit TBs. Finally, we
> are hosting with Amazon.
> 
> I'm looking for advice on how to vary the number of nodes dynamically, in
> order to reduce our hosting costs at non-peak times. I worry that just
> adding "new" nodes in response to demand will make things worse -- at least
> temporarily -- as the new node copies data to itself; then bringing it down
> again will also cause degradation.
> 
> I'm wondering if it is possible to bring up exact copies of other nodes? Or,
> alternatively, to take down a populated node containing (only?) immutable data,
> then bring it up again when the need arises?
> 
> Are there reference/reading materials(/blogs) concerning dynamically varying
> the number of nodes in response to demand?
> 
> Thanks!
> 
> -- Shaun
> 
