> So, I've been trying out MongoDB for a while and quite liked it, though we've
> had some doubts on the security of our data among other things. The ease of
> replication in Riak made me look at it closer. I really like how replication
> is in the core of Riak and how data is just data regardless of what it
> consists of (files, json, xml whatever). I'm also looking at Riak since we've
> discussed saving things to S3 but here Riak could be used instead I think.

A number of Basho customers and open-source users alike are using Riak to store files. You might find the webinar I gave several weeks ago about Riak and Rails helpful -- it includes an example of how to store uploaded files in Riak and do internal redirects in nginx to serve the files directly from Riak.

> One thing that is very important to us is the security of all communication
> within our cluster, so I'm wondering how the riak nodes exchange data between
> each other - is that encrypted or could someone potentially listen in on it?
>
> Could we easily encrypt it through stunnel (as we've done with some other
> protocols) and would that affect the performance of Riak a lot?

For one customer we considered stunnel for WAN communication but in the end decided to set up IPSec, which can be managed by the kernel. If you go with stunnel instead, you'll have to keep those stunnel processes running. Also, Riak uses some dynamic ports for inter-node communication, so you'd have to take all of those into account. There are also some hacks to Erlang that enable SSL for inter-node communication, but we don't build that into Riak.

> We're not entirely sure whether we want or need replication between EC2 US
> and EC2 Europe (and possibly Asia as well) but if we do - would Riak work
> well over such distances?

The EnterpriseDS multi-site replication feature is specifically for this type of setup and is designed to work over connections with greater latencies.

> Is Riak suited to run on XEN VMs like on EC2? What would be a recommended
> instance size (i.e does Riak require LOTS of ram and processors?) Anyone
> have experience with Riak running on EC2?

Riak doesn't require lots of RAM or CPU; it is mostly I/O bound (RAM usage depends on data quantity and the chosen backend). Disk I/O is where you'll feel the most pain using Riak on EC2, primarily because of unpredictable latencies.
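
As a rough illustration of the file-storage point earlier in this reply (this is not the example from the webinar), an uploaded file can be written to Riak with a plain HTTP PUT. The sketch below assumes a node listening on 127.0.0.1:8098 with the HTTP interface mounted at /riak; the bucket, key, and helper names are made up for illustration, so adjust them to your own setup.

```python
import urllib.parse
import urllib.request

# Assumption: a local Riak node with its HTTP interface on the default port.
RIAK_URL = "http://127.0.0.1:8098"


def build_put_request(bucket, key, data, content_type="application/octet-stream"):
    """Build (but do not send) a PUT request that stores `data` under bucket/key."""
    # Bucket and key are URL-escaped so names with spaces or slashes stay intact.
    url = "{}/riak/{}/{}".format(
        RIAK_URL,
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(key, safe=""),
    )
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": content_type},
    )


def store_file(bucket, key, path, content_type="application/octet-stream"):
    """Read a file from disk and store it in Riak; returns the HTTP status code."""
    with open(path, "rb") as fh:
        data = fh.read()
    request = build_put_request(bucket, key, data, content_type)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Something like `store_file("uploads", "photo.jpg", "/tmp/photo.jpg", "image/jpeg")` would then store the upload. Riak keeps the Content-Type with the object, which is what lets a front end such as nginx hand the file back to the browser unchanged.
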

> So far, most of the files we store are somewhere between 500 k and 50 megs
> but might get larger than that as we expand our business to many more
> clients with different needs. I know there's supposed to be a limit around
> 50 megs currently... Is this being actively worked on?

Last I checked, our per-object limit was 64MB. However, there is a not-quite-releasable add-on that treats Riak like a block-oriented filesystem for storing larger objects. We'll announce it when it comes closer to release.

> Backups... well, we've done all kinds of 'em. The latest, and in my opinion
> most straightforward, is to put all data on an XFS volume and freeze it
> (incl. "FLUSH TABLES WITH READ LOCK" on mysql) and then just do an EBS
> snapshot of the volume, how would something like that apply to Riak (even
> though the data might be safer in a Riak cluster). If we run a cluster of
> say 10 nodes - how would we get back to a point in time? And if a DELETE was
> done erroneously, how could we get that data back?
> Is riak-admin backup really an option (i.e backing up the entire cluster) if
> the dataset is very large? If we do restore using such data would the whole
> cluster get "reset" to that point in time?

When using the bitcask backend, your backup can be as simple as copying the data directory, since its files are immutable. An EBS snapshot would essentially do the same thing. As for restoring erroneous DELETEs: if not all nodes have reaped the value, it will still exist. Recovering it after a reap will be much more difficult.

> One of the things I'm quite excited about is how we could get away with
> nodes that are exactly alike and load-balancing between them - each having a
> riak node running so we could have our app and the riak node on the same
> machine and just add more such instances if we need to. Today we've
> separated clients onto different databases and machines (NOT load-balanced
> and NOT replicated).
> This has worked ok but I think having a load-balanced cluster would make
> things easier and configuration of machines simpler (we use chef so it's not
> THAT bad today). Now we add clients by adding a CNAME entry in DNS for a
> specific machine/ip - adding configuration and databases using nanite but
> much of that could perhaps disappear with a single load-balanced cluster I
> think...

One customer described their setup as a "Christmas tree" - that is, the application with a load-balancer in front of it, backed by a Riak cluster with a load-balancer in front of that. Of course, you can grow your Riak cluster in lock-step with your application, but you might find that your application will need more or fewer servers than your Riak cluster, so coupling them in that way might not be a good idea.

Sean Cribbs <s...@basho.com>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com