I went through the bug database and could not find any open ticket for a configurable R-value in MapReduce. Is there one that someone knows of?
This seems like a major limitation of the system. Currently MR works in a way that essentially results in an R-value of 1. That makes MR unreliable if you lose a node or add new nodes to your cluster. This is particularly painful, as MR is often used in lieu of a bulk fetch API, or combined with Search or 2i to avoid the additional round trip that would be required without it.

We'd like to double the size of our cluster, but that does not seem feasible without dumping all of the data and reloading it after we've added the new nodes, which would take far too long even with the new nodes (bulk load API, anyone?). Doubling would result in 50% not found errors. Even adding a single node seems unacceptable.

How are people handling this? Can one use Riak EDS to mirror the data to the new nodes, set up as a mirrored cluster, and once they are up to date, add them to the production cluster? Or is there a way to add a node to a cluster in such a way that it accepts data for storage but not for querying, then have Riak EDS populate it, and then have it start accepting reads?

Elias Levy
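
As a rough illustration of the 50% estimate above, here is a back-of-the-envelope sketch in Python. It is not Riak code. It assumes partitions are spread evenly across all nodes, that newly added nodes hold no data yet, and that an MR read consults exactly one replica per key (the effective R-value of 1 described above). The function names are hypothetical, and the second function is an idealized "any of k replicas" model, not Riak's actual quorum semantics.

```python
from fractions import Fraction


def expected_not_found(old_nodes: int, new_nodes: int) -> Fraction:
    """Expected share of single-replica reads that land on an empty new node.

    Assumes even partition ownership and that new nodes hold no data yet.
    """
    if old_nodes <= 0 or new_nodes < 0:
        raise ValueError("old_nodes must be > 0 and new_nodes must be >= 0")
    return Fraction(new_nodes, old_nodes + new_nodes)


def miss_rate_any_of(p_empty: Fraction, replicas_read: int) -> Fraction:
    """Idealized miss rate if a read consulted several replicas and
    succeeded when any one of them had the data.

    Assumes each consulted replica is independently empty with
    probability p_empty, which is a simplification.
    """
    if replicas_read < 1:
        raise ValueError("replicas_read must be >= 1")
    return p_empty ** replicas_read


if __name__ == "__main__":
    doubled = expected_not_found(old_nodes=4, new_nodes=4)
    one_more = expected_not_found(old_nodes=4, new_nodes=1)
    print("doubling a 4-node cluster:", float(doubled))
    print("adding 1 node to 4 nodes:", float(one_more))
    print("doubling, reading 3 replicas:", float(miss_rate_any_of(doubled, 3)))
```

Under these assumptions the miss rate depends only on the share of partitions owned by nodes that have not been populated yet, which is why even a single added node produces a visible error rate when only one replica is read.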
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com