Hi Andrew,

Distribution in Riak is based on the concept of the vnode (virtual node). The vnode is the unit that "N" replicas are counted against. The total number of vnodes in a cluster is fixed at cluster creation time. The number of vnodes a pnode (physical node) owns depends on how many pnodes are in the cluster.

Here are some quick points on distribution and replication (which are different concepts) in Riak as it stands today:

- 64 vnodes by default. This must be decided when first creating a cluster.

- A simple rule of thumb is a minimum of 10 vnodes per pnode, with the number of vnodes being a power of 2 (64, 128, 256, ...).

- Homogeneous vnode distribution across pnodes. The original Dynamo paper calls for a heterogeneous distribution based on the relative performance of pnodes. I believe there are some open tickets on this.

- Homogeneous pnodes. Any pnode can answer any request. If the pnode does not own the data in its share of vnodes, it will fetch the data from one that does and then return it.

- "N" replicas does not guarantee N copies on N distinct pnodes; in a small cluster you may only get copies on N-1 pnodes. That is one of the reasons the minimum recommended setup is a cluster of 3 pnodes.

- There is no "eventual propagation"; there is "eventual consistency" (the "C" in the CAP theorem). Once you issue a write, it is written as fast as possible to all the vnodes it should be written to, except during network partitions (the "P" in CAP), in which case fallback vnodes take the writes.
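To make the replica-placement point above concrete, here is a toy sketch (this is not riak_core's actual claim algorithm, and the node names are made up): vnodes are assigned to pnodes round-robin, and a key's N replicas go to N consecutive partitions on the ring, so with too few pnodes those partitions can land on the same machine.

```python
# Toy sketch of vnode-to-pnode ownership and replica placement.
# Illustrates why N replicas may land on fewer than N physical nodes.
RING_SIZE = 64  # Riak's default number of vnodes (partitions)

def build_ring(pnodes, ring_size=RING_SIZE):
    """Assign vnodes to pnodes round-robin (rough stand-in for ring claim)."""
    return [pnodes[i % len(pnodes)] for i in range(ring_size)]

def preference_list(ring, partition, n=3):
    """A key's N replicas live on the N consecutive partitions
    starting at the partition the key hashes into."""
    return [ring[(partition + i) % len(ring)] for i in range(n)]

ring = build_ring(["node_a", "node_b"])            # only 2 pnodes
owners = preference_list(ring, partition=10, n=3)  # N = 3 replicas
print(owners)            # ['node_a', 'node_b', 'node_a']
print(len(set(owners)))  # 2 distinct pnodes, not 3
```

With 3 or more pnodes the same preference list covers 3 distinct machines, which is why 3 pnodes is the recommended minimum.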

In order to accomplish what I think you want, you would have to hash the key in the same manner that Riak does, translate that hash to its vnode in the keyspace, map that vnode to a pnode, then open a connection to that pnode and get your data. Having said that, I think it is quite possible by becoming quite familiar with riak_core (the module that governs hashing (I believe), vnodes, handoffs and gossip) and either wrapping or extending it to taste.
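The first step of that (key to partition) can be sketched roughly as follows. This is an approximation: riak_core hashes the Erlang term {Bucket, Key} with SHA-1, so the exact input encoding here is not what Riak uses, but the idea of mapping a 160-bit hash onto a fixed number of partitions is the same.

```python
import hashlib

RING_SIZE = 64       # number of partitions (vnodes)
RING_TOP = 2 ** 160  # the ring is the SHA-1 keyspace, 0 .. 2^160 - 1

def key_to_partition(bucket, key, ring_size=RING_SIZE):
    """Hash a (bucket, key) pair onto the ring and return the owning
    partition index. Approximation only: Riak hashes the Erlang term
    {Bucket, Key}, not this ad hoc string concatenation."""
    h = int.from_bytes(hashlib.sha1((bucket + key).encode()).digest(), "big")
    return h * ring_size // RING_TOP

p = key_to_partition("users", "andrew")
print(p)  # a partition index in 0..63
```

From there you would consult the ring state (gossiped by riak_core) to map that partition to its owning pnode.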

On the other hand, bear in mind that Riak, as configured by default, is a disk store (although you could configure memory-only backends, which I highly discourage). Ultimately your latency will be I/O bound. You may do better to shard Redis if ultra-low latency is your primary concern. A possible approach would be replicated clusters across sites. Bitcask is basically an append-only log that could theoretically be replicated in real time over a SAN.
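If you went the sharded-Redis route, client-side sharding can be as simple as the sketch below (the shard endpoints are hypothetical, and naive modulo sharding reshuffles keys when you add shards; consistent hashing would be the more robust choice):

```python
import hashlib

# Hypothetical shard endpoints; in practice these would be your Redis hosts.
SHARDS = [
    "redis://10.0.0.1:6379",
    "redis://10.0.0.2:6379",
    "redis://10.0.0.3:6379",
]

def shard_for(key):
    """Pick a shard by hashing the key. Simple modulo sharding: every key
    deterministically maps to exactly one shard."""
    h = int.from_bytes(hashlib.md5(key.encode()).digest(), "big")
    return SHARDS[h % len(SHARDS)]

print(shard_for("session:andrew"))  # always the same shard for this key
```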

Additionally, open source Riak does not support multi-site clusters. Enterprise Riak does, and may have what you need for multi-site, but I doubt it guarantees a physical copy of the data in each site. Currently the only benefit to having your data on more vnodes is the distributed map phase of a map/reduce query, which, I have no doubt, will reach a point of diminishing returns (interesting research project though).

Further, in regards to systems of Dynamo pedigree, it should be noted that Dynamo is principally concerned with maximally available systems (the "A" in CAP), particularly write availability. Conflicts are not resolved at the persistence layer but rather at the application layer (under certain configurations).

Good luck,
Alexander

@siculars on twitter
http://siculars.posterous.com

Sent from my iPhone

On Oct 4, 2010, at 22:35, Andrew Cooper <andrew.stephen.coo...@gmail.com> wrote:

Hello everyone,

The Dynamo replication scheme allows one to specify how many nodes a
change should be propagated to. But what if I want a change to be
(eventually) propagated to each and every machine in the cluster?
I can't simply specify a fixed number of nodes to replicate to,
because, in the general case, this is time-varying. My use case is a
multi-datacenter cluster with a low read latency requirement for
certain records (thus those records need to be present in each
datacenter, if not even on each machine).

Is this currently possible, or would it require a patch? Or would this
be too difficult to implement with the current architecture? This
seems to be a problem common to all databases using the Dynamo
replication scheme... or is it? Thoughts?

Thanks,
Andrew

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
