We use a replication factor such that the cluster stays up if any one instance dies. If a node dies, we simply replace it and move on. As for disaster recovery, it's easy to store snapshots in S3, although Glacier is looking interesting.

Jared Biel
System Administrator
Bolder Thinking
www.bolderthinking.com
Office: 701.205.3153
jared.b...@bolderthinking.com
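
To make the replication-factor point above concrete, here is a minimal sketch of how many replica failures a single read or write can survive. The function names are made up for illustration and nothing here comes from the poster's actual setup; the only rule it relies on is that a quorum is floor(RF / 2) + 1 replicas.

```python
def replicas_required(replication_factor: int, consistency_level: str) -> int:
    """Number of replicas that must respond for a request to succeed.

    Only three consistency levels are modelled here (an illustrative subset).
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    level = consistency_level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def tolerated_failures(replication_factor: int, consistency_level: str) -> int:
    """How many replicas of a row can be down while requests still succeed."""
    return replication_factor - replicas_required(replication_factor, consistency_level)


if __name__ == "__main__":
    for rf in (2, 3):
        for cl in ("ONE", "QUORUM", "ALL"):
            print(f"RF={rf} CL={cl}: tolerates {tolerated_failures(rf, cl)} replica(s) down")
```

For example, with a replication factor of 2, QUORUM needs both replicas, so only consistency level ONE keeps working with one node down; with a replication factor of 3, QUORUM survives the loss of one replica.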
On 17 January 2013 13:44, Adam Venturella <aventure...@gmail.com> wrote:
> Jared, how do you guys handle data backups for your ephemeral-based cluster?
>
> I'm trying to move to ephemeral drives myself, and that was my last sticking
> point; asking how others in the community deal with backup in case the VM
> explodes.
>
> On Wed, Jan 16, 2013 at 1:21 PM, Jared Biel <jared.b...@bolderthinking.com>
> wrote:
>>
>> We're currently using Cassandra on EC2 at very low scale (a 2 node
>> cluster on m1.large instances in two regions.) I believe EBS is not
>> recommended, for performance reasons. Also, it's proven to be
>> very unreliable in the past (most of the big/notable AWS outages were
>> due to EBS issues.) We've moved 99% of our instances off of EBS.
>>
>> As others have said, if you require more space in the future it's easy
>> to add more nodes to the cluster. I've found this page
>> (http://www.ec2instances.info/) very useful in determining the amount
>> of space each instance type has. Note that by default only one
>> ephemeral drive is attached and you must specify all ephemeral drives
>> that you want to use at launch time. Also, you can create a RAID 0 of
>> all local disks to provide maximum speed and space.
>>
>> On 16 January 2013 20:42, Marcelo Elias Del Valle <mvall...@gmail.com>
>> wrote:
>> > Hello,
>> >
>> > I am currently using Hadoop + Cassandra on Amazon AWS. Cassandra runs on
>> > EC2 and my Hadoop process runs on EMR. For Cassandra storage, I am using
>> > local EC2 EBS disks.
>> > My system is running fine for my tests, but to me it's not a good setup
>> > for production. I need my system to perform well, especially for writes
>> > on Cassandra, but the amount of data could grow really big, taking
>> > several TB of total storage.
>> > My first guess was using S3 as storage, and I saw this can be done by
>> > using the Cloudian package, but I wouldn't like to become dependent on a
>> > pre-packaged solution, and I found it's kind of expensive for more than
>> > 100 TB:
>> > http://www.cloudian.com/pricing.html
>> > I saw some discussion on the internet about using EBS or ephemeral disks
>> > for storage at Amazon too.
>> >
>> > My question is: does someone on this list have the same problem as me?
>> > What are you using as a solution for Cassandra's storage when running it
>> > on Amazon AWS?
>> >
>> > Any thoughts would be highly appreciated.
>> >
>> > Best regards,
>> > --
>> > Marcelo Elias Del Valle
>> > http://mvalle.com - @mvallebr
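
On the point in the quoted message that every ephemeral drive has to be requested at launch time: a minimal sketch of building that request is below. The helper name is hypothetical, and the device names (/dev/sdb onward) and the drive count are assumptions that depend on the instance type and AMI. The resulting list follows the DeviceName / VirtualName shape of the EC2 RunInstances BlockDeviceMappings parameter; no AWS call is made here.

```python
from typing import Dict, List


def ephemeral_block_device_mappings(drive_count: int) -> List[Dict[str, str]]:
    """Build one block device mapping entry per instance-store (ephemeral) drive.

    drive_count is an assumption supplied by the caller: look up how many
    local drives the chosen instance type actually provides.
    """
    if not 0 <= drive_count <= 24:
        raise ValueError("drive_count must be between 0 and 24")
    mappings = []
    for index in range(drive_count):
        # Assumed naming: /dev/sdb, /dev/sdc, ... (may differ per AMI).
        device_name = "/dev/sd" + chr(ord("b") + index)
        mappings.append(
            {"DeviceName": device_name, "VirtualName": f"ephemeral{index}"}
        )
    return mappings


if __name__ == "__main__":
    # Example: an instance type assumed to have two local drives.
    for entry in ephemeral_block_device_mappings(2):
        print(entry)
```

The list would be passed as the block device mappings when launching the instance; the mapped devices can then be combined into a RAID 0 array on the host, as the quoted message suggests.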