Using the Ec2Snitch you could have one AZ in us-east-1 and one AZ in us-west-1, treating each AZ as a single rack and each region as a DC. The network topology is rack aware, so it will prefer requests that go to the same rack (not much of an issue when you have only one rack).
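The region/AZ to DC/rack mapping Aaron describes can be sketched in a few lines of Python. This is an illustration of the idea, not the snitch's actual Java code; the string parsing is an assumption based on the AZ naming convention.

```python
def ec2_dc_and_rack(availability_zone):
    """Illustrate the Ec2Snitch idea: the EC2 region becomes the
    Cassandra DC and the zone suffix becomes the rack, so two nodes
    in "us-east-1a" share a rack, while "us-west-1" is a separate DC."""
    # "us-east-1a" -> DC "us-east", rack "1a"
    dc, rack = availability_zone.rsplit("-", 1)
    return dc, rack

print(ec2_dc_and_rack("us-east-1a"))  # ('us-east', '1a')
print(ec2_dc_and_rack("us-west-1b"))  # ('us-west', '1b')
```

With one AZ per region, every node in a region lands in the same rack, which is why the rack-aware routing is "not much of an issue" in that layout.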
If possible I would use the same RF in each DC if you want failover to be as clean as possible (see the earlier comments about the number of failed nodes in a DC), i.e. 3 replicas in each DC/region.

Until you find a reason otherwise, use LOCAL_QUORUM. If that proves to be too slow, or you get more experience and feel comfortable with the trade-offs, then change to a lower CL. Dropping the CL for write bursts does not make the cluster run any faster; it lets the client think the cluster is running faster, and can result in the client overloading (in a good "this is what it should do" way) the cluster. This can result in more "eventual consistency" work to be done later, during maintenance or read requests. If that is a reasonable trade-off, you can write at CL ONE and read at CL ALL to ensure you get consistent reads (QUORUM is not good enough in that case).

Jump in and test it at QUORUM; you may find the write performance is good enough.

There are lots of dials to play with: http://wiki.apache.org/cassandra/MemtableThresholds

Hope that helps.
Aaron

On 27 Apr 2011, at 09:31, William Oberman wrote:

> I see what you're saying. I was able to control write latency on MySQL using
> insert vs insert delayed (what I feel is MySQL's poor man's eventual
> consistency option) + the fact that replication was a background asynchronous
> process. In terms of read latency, I was able to do up to a few hundred well
> indexed MySQL queries (across AZs) on a view while keeping the overall
> latency of the page around or less than a second.
>
> I basically am replacing two use cases, the cases with difficult-to-scale
> anticipated write volumes. The first case was previously using insert
> delayed (which I'm doing in Cassandra as ONE), as I wasn't getting consistent
> write/read operations before anyway. The second case was using traditional
> insert (which I was going to replace with some QUORUM-like level; I was
> assuming LOCAL_QUORUM).
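Aaron's consistency-level advice rests on the overlap rule: a read is guaranteed to see the most recent write whenever the write CL and read CL together touch more than RF replicas (W + R > RF). A minimal sketch of that arithmetic (the function name is mine, for illustration only):

```python
def overlapping(write_replicas, read_replicas, rf):
    """A read is guaranteed to intersect the latest write's replicas
    when W + R > RF (two subsets of RF replicas must then overlap)."""
    return write_replicas + read_replicas > rf

RF = 3
QUORUM = RF // 2 + 1  # 2 for RF=3

print(overlapping(1, RF, RF))           # True:  write ONE, read ALL
print(overlapping(QUORUM, QUORUM, RF))  # True:  QUORUM both ways
print(overlapping(1, QUORUM, RF))       # False: write ONE, read QUORUM
```

This is exactly why writing at ONE forces reads up to ALL if you still want guaranteed consistency: QUORUM reads no longer overlap a single-replica write.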
> But, the latter case uses a write-through memory
> cache (memcache), so I don't know how often it really reads data from the
> persistent store. But I definitely need to make sure it is consistent.
>
> In any case, it sounds like I'd be best served treating AZs as DCs, but then
> I don't know what to make the racks. Or do racks not matter in a single AZ?
> That way I can get an ack from a LOCAL_QUORUM read/write before the
> (slightly) slower read/write to/from the other AZ (for redundancy). Then I'm
> only screwed if Amazon has a multi-AZ failure (so far, they've kept it to
> "only" one!) :-)
>
> will
>
> On Tue, Apr 26, 2011 at 5:01 PM, aaron morton <aa...@thelastpickle.com> wrote:
> One difference between Cassandra and MySQL replication may be when the
> network IO happens. Was the MySQL replication synchronous on transaction
> commit? I was only aware that it had async replication, which means the
> client is not exposed to the network latency. In Cassandra the network
> latency is exposed to the client, as it needs to wait for the CL number of
> nodes to respond.
>
> If you use the PropertyFileSnitch with the NetworkTopologyStrategy you can
> manually assign machines to racks / DCs based on IP.
> See the conf/cassandra-topology.properties file. There is also an Ec2Snitch which
> (from the code):
> /**
>  * A snitch that assumes an EC2 region is a DC and an EC2 availability_zone
>  * is a rack. This information is available in the config for the node.
>
> Recent discussion on DC-aware CL levels:
> http://www.mail-archive.com/user@cassandra.apache.org/msg11414.html
>
> Hope that helps.
> Aaron
>
>
> On 27 Apr 2011, at 01:18, William Oberman wrote:
>
>> Thanks Aaron!
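On the "what do I make the racks" question running through the thread: with NetworkTopologyStrategy the quorum is computed per DC from that DC's own RF; racks only influence where replicas are placed, not the quorum size, so there is no separate "rack quorum". A rough sketch of the per-DC arithmetic (the RF map below is an assumed example using the 3/3 layout discussed later):

```python
def local_quorum(rf_per_dc, dc):
    """LOCAL_QUORUM counts only replicas in the coordinator's DC:
    quorum = (local RF // 2) + 1. Racks affect replica placement,
    not this number."""
    return rf_per_dc[dc] // 2 + 1

rf = {"us-east": 3, "us-west": 3}
print(local_quorum(rf, "us-east"))  # 2: ack after 2 us-east replicas
print(local_quorum(rf, "us-west"))  # 2
```

So a LOCAL_QUORUM write in us-east returns after 2 of the 3 local replicas ack, regardless of how those replicas are spread across AZs/racks.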
>>
>> Unless no one on this list uses EC2, there were a few minor troubles end of
>> last week through the weekend which taught me a lot about obscure failure
>> modes in various applications I use :-) My original post was trying to be
>> more redundant than fast, which has been my overall goal from even before
>> moving to Cassandra (my downtime from the EC2 madness was minimal, and due
>> to only having one single point of failure == the Amazon load balancer). My
>> secondary goal was trying to make moving to a second region easier, but if
>> that is causing problems I can drop the idea.
>>
>> I might be downplaying the cost of inter-AZ communication, but I've lived
>> with that for quite some time; for example, my current setup of MySQL in
>> master-master replication is split over zones, and my webservers live in yet
>> different zones. Maybe Cassandra is "chattier" than I'm used to? (Again,
>> I'm fairly new to Cassandra.)
>>
>> Based on that article, the discussion, and the recent EC2 issues, it sounds
>> like it would be better to start with:
>> - 6 nodes split in two AZs 3/3
>> - Configure replication to do 2 in one AZ and one in the other
>>   (NetworkTopologyStrategy treats AZs as racks, so does RF=3, us-east=3 make
>>   this happen naturally?)
>> - What does LOCAL_QUORUM do in this case? Is there a "rack quorum"? Or do
>>   the natural latencies of AZs make LOCAL_QUORUM behave like a rack quorum?
>>
>> will
>>
>> On Tue, Apr 26, 2011 at 1:14 AM, aaron morton <aa...@thelastpickle.com>
>> wrote:
>> For background see this article:
>> http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers
>>
>> And this recent discussion:
>> http://www.mail-archive.com/user@cassandra.apache.org/msg12502.html
>>
>> Issues that may be a concern:
>> - Lots of cross-AZ latency in us-east, e.g. LOCAL_QUORUM ops must wait cross
>>   AZ. Also consider it during maintenance tasks: how much of a pain is it
>>   going to be to have latency between every node?
>> - IMHO not having sufficient (by that I mean 3) replicas in a Cassandra DC
>>   to handle a single node failure when working at QUORUM reduces the utility
>>   of the DC. E.g. with a local RF of 2 in the west, the quorum is 2, and if
>>   you lose one node from the replica set you will not be able to use
>>   LOCAL_QUORUM for keys in that range. Or consider a failure mode where the
>>   west is disconnected from the east.
>>
>> Could you start simple with 3 replicas in one AZ in us-east and 3 replicas
>> in an AZ+Region? Then work through some failure scenarios.
>>
>> Hope that helps.
>> Aaron
>>
>>
>> On 22 Apr 2011, at 03:28, William Oberman wrote:
>>
>>> Hi,
>>>
>>> My service is not yet ready to be fully multi-DC, due to how some of my
>>> legacy MySQL stuff works. But I wanted to get Cassandra going ASAP and
>>> work towards multi-DC. I have two main Cassandra use cases: one where I
>>> can handle eventual consistency (and all of the writes/reads are currently
>>> ONE), and one where I can't (writes/reads are currently QUORUM). My test
>>> cluster is currently 4 smalls all in us-east with RF=3 (more to prove I can
>>> do clustering than to have an exact production replica). All of my unit
>>> tests and "load tests" (again, not to prove true max load, but more to
>>> tease out concurrency issues) are passing now.
>>>
>>> For production, I was thinking of doing:
>>> - 4 Cassandra larges in us-east (where I am now), one in each AZ
>>> - 1 Cassandra large in us-west (where I have nothing)
>>> For now, my data can fit into a single large's 2-disk ephemeral using
>>> RAID0, and I was then thinking of doing RF=3 with us-east=2 and
>>> us-west=1. If I do eventual consistency at ONE, and consistency at
>>> LOCAL_QUORUM, I was hoping:
>>> - eventual consistency ops would be really fast
>>> - consistent ops would be pretty fast (what does LOCAL_QUORUM do in this
>>>   case? Return after 1 or 2 us-east nodes ack?)
>>> - us-west would contain a complete copy of my data, so it's a good
>>>   eventually consistent "close to real time" backup (assuming it can keep up
>>>   over long periods of time, but I think it should)
>>> - eventually, when I'm ready to roll out in us-west, I'll be able to change
>>>   the replication settings, and that server in us-west could help seed new
>>>   Cassandra instances faster than the ones in us-east
>>>
>>> Or am I missing something really fundamental about how Cassandra works,
>>> making this a terrible plan? I should have plenty of time to get my
>>> multi-DC working before the instance in us-west fills up (but even then, I
>>> should be able to add instances over there to stall that fairly trivially,
>>> right?).
>>>
>>> Thanks!
>>>
>>> will
>>
>>
>>
>>
>> --
>> Will Oberman
>> Civic Science, Inc.
>> 3030 Penn Avenue, First Floor
>> Pittsburgh, PA 15201
>> (M) 412-480-7835
>> (E) ober...@civicscience.com
>
>
>
>
> --
> Will Oberman
> Civic Science, Inc.
> 3030 Penn Avenue, First Floor
> Pittsburgh, PA 15201
> (M) 412-480-7835
> (E) ober...@civicscience.com
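Aaron's warning earlier in the thread about a two-replica DC can be made concrete with a little arithmetic: at RF=2 the quorum is also 2, so one down replica blocks LOCAL_QUORUM for the affected key range, while RF=3 tolerates a single failure. A numbers-only sketch (not driver code):

```python
def quorum_available(rf, live_replicas):
    """True if enough replicas are up to satisfy a quorum for this RF
    (quorum = RF // 2 + 1)."""
    return live_replicas >= rf // 2 + 1

# RF=2: quorum is 2, so a single failed node breaks LOCAL_QUORUM
print(quorum_available(2, 1))  # False
# RF=3: quorum is 2, so one failed node is tolerated
print(quorum_available(3, 2))  # True
```

This is the arithmetic behind the suggestion to run 3 replicas per DC if the DC is meant to serve quorum operations on its own.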