We're in the process of migrating our webapps to Rails 3 (some of them even
from PHP). I'm the one responsible for our new Amazon EC2 setup.

We've been looking at new ways of storing our data (quite a lot of files,
but also data that was previously in a MySQL DB).

So, I've been trying out MongoDB for a while and quite liked it, though
we've had some doubts about the security of our data, among other things.
The ease of replication in Riak made me look at it more closely. I really
like how replication is at the core of Riak and how data is just data,
regardless of what it consists of (files, JSON, XML, whatever). I'm also
looking at Riak because we've discussed saving things to S3, and I think
Riak could be used for that instead.
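For context on why replication feels "built in": Riak follows the Dynamo
design, hashing each bucket/key onto a ring of partitions and keeping
copies on the next N partitions, so the stored value is just opaque bytes.
The Python below is a simplified sketch of that idea, not Riak's code. It
assumes SHA-1 hashing, a 64-partition ring, N = 3 and hypothetical node
names (riak1..riak4) that claim partitions round-robin.

```python
import hashlib

RING_SIZE = 2 ** 160          # SHA-1 produces 160-bit hashes
NUM_PARTITIONS = 64           # assumption: a 64-partition ring
N_VAL = 3                     # assumption: three replicas per object
NODES = ["riak1", "riak2", "riak3", "riak4"]  # hypothetical node names

PARTITION_WIDTH = RING_SIZE // NUM_PARTITIONS

# Each partition is claimed by a node; assigned round-robin in this sketch.
OWNERS = [NODES[i % len(NODES)] for i in range(NUM_PARTITIONS)]


def partition_for(bucket: str, key: str) -> int:
    """Hash bucket/key onto the ring and return the partition index."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    position = int.from_bytes(digest, "big")
    return position // PARTITION_WIDTH


def preference_list(bucket: str, key: str, n: int = N_VAL) -> list[tuple[int, str]]:
    """Return the n consecutive partitions (and owning nodes) that hold copies."""
    first = partition_for(bucket, key)
    return [
        ((first + i) % NUM_PARTITIONS, OWNERS[(first + i) % NUM_PARTITIONS])
        for i in range(n)
    ]


if __name__ == "__main__":
    # The value is never inspected: a file, JSON or XML is placed the same way.
    for idx, node in preference_list("files", "report.pdf"):
        print(idx, node)
```

With 4 nodes claiming partitions in rotation, three consecutive partitions
always land on three different machines, which is what makes losing one
node survivable in this toy model.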

One thing that is very important to us is the security of all
communication within our cluster, so I'm wondering how the Riak nodes
exchange data with each other. Is that traffic encrypted, or could someone
potentially listen in on it?

Could we easily encrypt it with stunnel (as we've done with some other
protocols), and would that affect Riak's performance a lot?

We're not entirely sure whether we want or need replication between EC2 US
and EC2 Europe (and possibly Asia as well), but if we do, would Riak work
well over such distances?

Is Riak suited to running on Xen VMs like those on EC2? What would be a
recommended instance size (i.e., does Riak require LOTS of RAM and
processors)? Does anyone have experience running Riak on EC2?

So far, most of the files we store are between 500 KB and 50 MB, but they
might get larger as we expand our business to many more clients with
different needs. I know there's supposed to be a limit of around 50 MB
currently... Is this being actively worked on?
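Until a larger object size is supported, one client-side workaround is to
split a file into chunks stored under derived keys plus a small manifest.
A minimal Python sketch, assuming a hypothetical 1 MB chunk size, a
"<key>:<n>" key scheme of my own invention, and a plain dict standing in
for a Riak bucket (no actual Riak client calls are shown).

```python
import hashlib
import json

CHUNK_SIZE = 1024 * 1024  # assumption: 1 MB chunks, well below the size limit


def store_chunked(store: dict, key: str, data: bytes, chunk_size: int = CHUNK_SIZE) -> None:
    """Store data as chunks under '<key>:<n>' plus a JSON manifest under key."""
    chunk_keys = []
    for n, offset in enumerate(range(0, len(data), chunk_size)):
        chunk_key = f"{key}:{n}"
        store[chunk_key] = data[offset:offset + chunk_size]
        chunk_keys.append(chunk_key)
    manifest = {
        "chunks": chunk_keys,
        "size": len(data),
        "sha1": hashlib.sha1(data).hexdigest(),
    }
    store[key] = json.dumps(manifest).encode()


def load_chunked(store: dict, key: str) -> bytes:
    """Reassemble the chunks listed in the manifest and verify the checksum."""
    manifest = json.loads(store[key])
    data = b"".join(store[k] for k in manifest["chunks"])
    if len(data) != manifest["size"] or hashlib.sha1(data).hexdigest() != manifest["sha1"]:
        raise ValueError(f"corrupt or incomplete object: {key}")
    return data
```

The checksum in the manifest catches a reassembly that picked up a missing
or stale chunk, which matters once chunks are replicated independently.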

Backups... well, we've done all kinds of them. The latest, and in my
opinion the most straightforward, is to put all data on an XFS volume,
freeze it (including "FLUSH TABLES WITH READ LOCK" on MySQL) and then just
take an EBS snapshot of the volume. How would something like that apply to
Riak (even though the data might be safer in a Riak cluster)? If we run a
cluster of, say, 10 nodes, how would we get back to a point in time? And if
a DELETE was done erroneously, how could we get that data back? Is
riak-admin backup really an option (i.e., backing up the entire cluster) if
the dataset is very large? If we restore using such a backup, would the
whole cluster get "reset" to that point in time?
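The single-volume step of our current procedure depends on ordering:
freeze, snapshot, and always thaw, even if the snapshot call fails. A
Python sketch, assuming the `xfs_freeze` tool and a caller-supplied
`snapshot` function (hypothetical here, e.g. a wrapper that starts the EBS
snapshot and returns its id). The MySQL read lock is not shown, and
whether per-node snapshots add up to a consistent point in time for a Riak
cluster is exactly the open question above.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run(cmd: Sequence[str]) -> None:
    """Default runner: execute a command and raise if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def snapshot_frozen(mount_point: str, snapshot: Callable[[], str], runner: Runner = run) -> str:
    """Freeze an XFS mount, take a snapshot, and always thaw the filesystem.

    `snapshot` is a hypothetical caller-supplied function returning a snapshot id.
    """
    runner(["xfs_freeze", "-f", mount_point])  # block new writes, flush to disk
    try:
        return snapshot()
    finally:
        runner(["xfs_freeze", "-u", mount_point])  # thaw even if snapshot() raised
```

The runner is injectable so the ordering can be checked without touching a
real volume; if the freeze itself fails, no thaw is attempted because
nothing was frozen.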

One of the things I'm quite excited about is how we could get away with
nodes that are exactly alike and load-balance between them, each running
a Riak node, so we could have our app and the Riak node on the same machine
and just add more such instances when we need to. Today we've separated
clients onto different databases and machines (NOT load-balanced and NOT
replicated). This has worked OK, but I think a load-balanced cluster would
make things easier and the configuration of machines simpler (we use Chef,
so it's not THAT bad today). Right now we add clients by adding a CNAME
entry in DNS for a specific machine/IP and adding configuration and
databases using nanite, but much of that could perhaps disappear with a
single load-balanced cluster, I think...
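With identical app+Riak instances, routing can be as simple as rotating
through them and skipping unhealthy ones; in practice an external load
balancer (e.g. ELB or HAProxy) would do this. A Python sketch, assuming
hypothetical host names and a caller-supplied health check.

```python
import itertools
from typing import Callable, Iterable


class RoundRobin:
    """Cycle requests over identical app+Riak instances, skipping unhealthy ones."""

    def __init__(self, hosts: Iterable[str], is_healthy: Callable[[str], bool]):
        self.hosts = list(hosts)
        if not self.hosts:
            raise ValueError("need at least one host")
        self.is_healthy = is_healthy
        self._cycle = itertools.cycle(self.hosts)

    def add(self, host: str) -> None:
        """Scaling out is appending another identical instance (restarts the rotation)."""
        self.hosts.append(host)
        self._cycle = itertools.cycle(self.hosts)

    def next_host(self) -> str:
        """Return the next healthy host, trying each host at most once."""
        for _ in range(len(self.hosts)):
            host = next(self._cycle)
            if self.is_healthy(host):
                return host
        raise RuntimeError("no healthy hosts")
```

Because every instance is interchangeable, adding capacity is one `add`
call rather than a new per-client CNAME and database.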


Thanks for taking your time to look at my questions!

Kind regards,
John
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
