First of all, let's eliminate CouchDB. Its write-only-log (WOL) filesystem is irrelevant here, because this is generated data. If a document gets lost (for whatever reason), it can be regenerated. More importantly, CouchDB can't run dynamic queries; that's what the field indexes are for. MongoDB is happy to store & query my data. The assumption is that "Riak Search" will also be able to search it, once released. Is there a reason to think MongoDB will be better than Riak for range search (indexed floats)?

My reasons for using Riak:

* will use it for other buckets anyway (hence reuse of skills & infrastructure)
* would like to add "distributed redundancy" as soon as I can afford more servers for the #bigdata bucket
* can postpone search until the end of Q3 (as advertised) or Q4, perhaps even longer
* the dataset will eventually grow very much bigger (really big) and I'd rather not shard

I'm sure the "cannot tell riak where to place buckets" can be worked around... I have some ideas, please tell me which are possible and which are recommended:

1. I write directly to the big node (rather than through a load-balancing proxy) - would it pass this data to other nodes for a bucket of N=1?
2. I detach the big node from the cluster when writing, temporarily falling back to a 2-node cluster plus a solo node. This can happen during (the relatively safer) western nighttime (which is my daytime) and things will be monitored closely. Once the node joins back into the cluster, would it have reasons to move any of the big bucket data to the other small nodes?
3. Can I run 2 instances (nodes) of Riak on the big server? One for the cluster of 3 nodes (little data), and one (single-node cluster) for the big data bucket.
4. I'd rather save $240 / year (one less Linode 512) -- but it's no big deal to spend either. After I do the big upgrades (some day) - are there tools to migrate the big bucket from the solo-node cluster (into the other cluster)?
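Whichever option wins, I assume the big bucket would need its replication turned down to one copy, as Alexander suggested. Here's a minimal sketch of the bucket-properties document Riak's HTTP interface takes for that (the bucket name "bigdata" and the default port 8098 are my assumptions, not anything confirmed):

```python
import json

# Sketch only: the JSON props document Riak's HTTP interface expects
# for per-bucket settings. n_val=1 keeps a single copy of each object,
# avoiding N replicas landing on the same physical host.
payload = json.dumps({"props": {"n_val": 1}})
print(payload)

# Applying it would look roughly like (against a running node;
# host, port, and bucket name "bigdata" assumed):
#   curl -X PUT http://127.0.0.1:8098/riak/bigdata \
#        -H "Content-Type: application/json" \
#        -d '{"props": {"n_val": 1}}'
```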

I basically want to keep the big data isolated in any way possible. If something goes terribly wrong (e.g. corrupt bucket, or the server explodes?) - no harm done to me. I just boot a fresh instance / node from backup. Whatever state the big data was in (when last backed up) is always good enough for this kind of data.

The order I listed these in is my preferred order. Separate clusters (#3 or #4) are the last option, because (ideally) I'd like to be able to link to the big bucket (though I probably shouldn't link until there's redundancy for that data). But since I can't think of any good reason against it, maybe it's better to run separate clusters for starters and merge them in the future? Would Riak let me do #1 or #2? Or perhaps there is a better way?

Orlin


Alexander Sicular wrote:
- You cannot tell Riak where to place buckets.
- You could set the N val on a bucket to one, and you should in the case of
your 'big bucket'. Otherwise you will get N replicas on the same physical host.
- Use Linode. 512 > 256 = better.

But in reality, your use case doesn't mesh well with what Riak is all about:
distributed redundancy. I would use CouchDB for your 'big bucket' of data.
Couch uses a write-only log (WOL) filesystem with an incremental B-tree index
for map/reduce. This may work better for you.

-Alexander

On Aug 12, 2010, at 6:44 AM, Orlin Bozhinov wrote:

I can easily wait for Riak Search to do this
http://groups.google.com/group/mongodb-user/browse_thread/thread/c2563a8566591a30/b3d19f21675a899e
 - instead of MongoDB.  Does the deployment I have in mind make sense:

I'll get a medium (or large) Linode box for the big dataset bucket.  Hopefully you can 
give me an idea about how much RAM I'll need for that.  This is batch-generated data.  It 
takes time to generate, but (once added) it will not change.  Because of that I'd like to 
save some money and not replicate it.  I plan to have 2 other small Linode servers and 
run a Riak cluster of 3.  Can I tell Riak to keep the big bucket exclusively on the big 
server?  It will be used only for queries.  So if the server crashes, I can just reboot 
it, expecting the same data back up.  Because it's a single-node bucket (if that's even 
possible to have in a cluster), I probably still won't be "linking" to it from 
other buckets (so when it fails, the impact is minimal).  Or maybe I should keep it in a 
separate (single node) cluster anyway?

Cluster separation means I can run the smaller cluster elsewhere.  The Joyent + 
Riak news is very exciting!  I couldn't afford to put the big bucket dataset on it 
(another reason to have 2 clusters) and I'd have to go with the smallest 
SmartMachines for starters.  Would 256 MB RAM be good enough (just for Riak)?  
What kind of load can that handle?  I'm also tempted to just run everything on 
Linode.  It's about 3 times cheaper (as far as memory goes) and the upgrades are less 
dramatic.  Would you recommend that (for a low budget)?  I imagine there will be an 
easy (Linode -> Joyent) Riak migration path...

Orlin

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
