First of all, let's eliminate CouchDB. Its write-only-log (WOL) storage
is irrelevant here, since this is generated data. If a document gets
lost (for whatever reason) - it can be regenerated. More importantly,
CouchDB can't run dynamic queries. That's what the field indexes are
for. MongoDB is happy to store & query my data. The assumption is that
"Riak Search" will also be able to search it, once released. Is there a
reason to think MongoDB will be better than Riak for range search
(indexed floats)?
My reasons for using Riak:
* will use it for other buckets anyway (hence reuse of skills &
infrastructure)
* would like to add the "distributed redundancy" as soon as I can afford
more servers for the #bigdata bucket
* can postpone search until end of this Q3 (as advertised) or Q4,
perhaps even longer
* the dataset will eventually grow to be very much bigger (really big),
and I'd rather not shard
I'm sure the "cannot tell riak where to place buckets" can be worked
around... I have some ideas, please tell me which are possible and
which are recommended:
1. I write directly to the big node (rather than through a
load-balancing proxy) - would it pass this data on to other nodes for a
bucket with N=1?
2. I detach the big node from the cluster when writing, temporarily
falling back to a 2 node cluster and a solo node. This can happen
during (the relatively safer) western nighttime (which is my daytime)
and things will be monitored closely. Once the node joins back into the
cluster, would it have reasons to move any of the big bucket data to the
other small nodes?
3. Can I run 2 instances (nodes) of Riak on the big server? One for the
cluster of 3 nodes (little data), and one (a single-node cluster) for
the big-data bucket.
4. I'd rather save $240 / year (one less Linode 512) -- but it's no big
deal to spend it either. After I do the big upgrades (some day) - are
there tools to migrate the big bucket from the solo-node cluster (into
the other cluster)?
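For #4, even without a dedicated tool, I imagine a key-by-key copy over
the HTTP interface would do. A rough sketch of what I have in mind
(host names and the bucket name are placeholders, it assumes the /riak
interface on the default port 8098, and listing keys walks the whole
keyspace - acceptable for a one-off migration):

```python
# Sketch: copy every key in a bucket from one cluster to another via
# Riak's HTTP interface. Hosts and bucket name are placeholders.
import json
import urllib.request

def riak_url(host, bucket, key=None, query=""):
    """Build a URL for the /riak HTTP interface on the default port."""
    url = "http://%s:8098/riak/%s" % (host, bucket)
    if key is not None:
        url += "/" + key
    return url + query

def migrate_bucket(src_host, dst_host, bucket):
    # Listing keys is expensive on big buckets -- one-off use only.
    with urllib.request.urlopen(riak_url(src_host, bucket, query="?keys=true")) as resp:
        keys = json.load(resp)["keys"]
    for key in keys:
        # Read each object from the source node...
        with urllib.request.urlopen(riak_url(src_host, bucket, key)) as resp:
            body = resp.read()
            ctype = resp.headers.get("Content-Type", "application/octet-stream")
        # ...and PUT it, content type preserved, into the destination cluster.
        put = urllib.request.Request(riak_url(dst_host, bucket, key), data=body,
                                     headers={"Content-Type": ctype}, method="PUT")
        urllib.request.urlopen(put)
```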
I basically want to keep the big data isolated in any way possible. If
something goes terribly wrong (e.g. corrupt bucket or server explodes?)
- no harm done to me. I just boot a fresh instance / node from backup.
Whatever state the big data was in (when last backed up) is always good
enough for this kind of data.
The order I listed these in is my preferred order. Separate clusters
(#3 or #4) are the last option, because (ideally) I'd like to be able
to link to the big bucket (though I probably shouldn't link until
there's redundancy for that data). But since I can't think of any good
reasons against it - maybe it's better to run separate clusters for
starters, and merge them in the future? Would Riak let me do #1 or #2?
Or perhaps there is a better way?
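If separate clusters do win for starters, I'd at least want to confirm
a bucket's properties (n_val etc.) before linking to it. A rough sketch
against the HTTP interface (host name is a placeholder; the
props=true&keys=false query avoids pulling the key list):

```python
# Sketch: read a bucket's properties (e.g. n_val) over Riak's HTTP interface.
import json
import urllib.request

def props_url(host, bucket):
    """URL that returns only the bucket's props, skipping the key list."""
    return "http://%s:8098/riak/%s?props=true&keys=false" % (host, bucket)

def bucket_props(host, bucket):
    with urllib.request.urlopen(props_url(host, bucket)) as resp:
        return json.load(resp)["props"]

# e.g. bucket_props("bignode", "bigdata")["n_val"]
```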
Orlin
Alexander Sicular wrote:
- You cannot tell Riak where to place buckets.
- You could set the N val on a bucket to one, and you should in the case of
your 'big bucket'; otherwise you will get N replicas on the same physical host.
- Use Linode. 512 > 256 = better.
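Setting n_val is just a PUT of the bucket's props over the HTTP
interface, something like this (a sketch; host and bucket name are
placeholders, and n_val is best set before any data is written to the
bucket):

```python
# Sketch: set n_val=1 on the 'big bucket' by PUTting its props.
# Host and bucket name are placeholders.
import json
import urllib.request

def set_nval_request(host, bucket, n_val):
    """Build the PUT request that updates a bucket's n_val property."""
    url = "http://%s:8098/riak/%s" % (host, bucket)
    body = json.dumps({"props": {"n_val": n_val}}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="PUT",
                                  headers={"Content-Type": "application/json"})

# e.g. urllib.request.urlopen(set_nval_request("bignode", "bigdata", 1))
```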
But in reality, your use case doesn't mesh well with what Riak is all
about: distributed redundancy. I would use CouchDB for your 'big bucket'
of data. Couch uses a write-only log (WOL) file format with an
incremental B-tree index for map/reduce. This may work better for you.
-Alexander
On Aug 12, 2010, at 6:44 AM, Orlin Bozhinov wrote:
I can easily wait for Riak Search to do this
http://groups.google.com/group/mongodb-user/browse_thread/thread/c2563a8566591a30/b3d19f21675a899e
- instead of MongoDB. Does this deployment I have in mind make sense:
I'll get a medium (or large) Linode box for the big dataset bucket. Hopefully you can
give me an idea about how much RAM I'll need for that. This is batch-generated data. It
takes time to generate, but (once added) it will not change. Because of that I'd like to
save some money and not replicate it. I plan to have 2 other small Linode servers and
run a Riak cluster of 3. Can I tell Riak to keep the big bucket exclusively on the big
server? It will be used only for queries. So if the server crashes, I can just reboot
it, expecting the same data back up. Because it's a single-node bucket (if that's even
possible to have in a cluster), I probably still won't be "linking" to it from
other buckets (so when it fails, the impact is minimal). Or maybe I should keep it in a
separate (single node) cluster anyway?
Cluster separation means I can run the smaller cluster elsewhere. The Joyent +
Riak news is very exciting! I couldn't afford to put the big bucket dataset on it
(another reason to have 2 clusters) and I'd have to go with the smallest
SmartMachines for starters. Would 256 MB RAM be good enough (just for Riak)?
What kind of load can that handle? I'm also tempted to just run everything on
Linode. It's about 3 times cheaper (as far as memory goes) and the upgrades are less
dramatic. Would you recommend that (for low-budget)? I imagine there will be an
easy (Linode -> Joyent) Riak migration path...
Orlin
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com