On 01/05/2012 12:12 PM, Tim Robinson wrote:
Thank you for this info. I'm still somewhat confused.
Why would anyone ever want 2 copies on one physical PC? Correct me if
I am wrong, but part of the sales pitch for Riak is that the cost of
hardware is lessened by distributing your data across a cluster of
less expensive machines as opposed to having it all reside on one
enormous server with very little redundancy.
The 2 copies of data on one physical PC provide no redundancy, but
increase hardware costs quite a bit.
Right?
Because in the case you expressed shock over, the pigeonhole
principle makes it *impossible* to store three copies of information in
two places without overlap. The alternative is lying to you about the
replica semantics. That would be bad.
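A brute-force check of the pigeonhole argument, as a minimal Python sketch: it enumerates every way to place N=3 replicas on 2 nodes and confirms that each placement puts two replicas on the same node. The function names are hypothetical and only illustrate the counting argument; they are not part of Riak.

```python
from itertools import product


def placements(replicas, nodes):
    """Every way to assign each replica to one of `nodes` nodes (hypothetical helper)."""
    return product(range(nodes), repeat=replicas)


def all_placements_overlap(replicas, nodes):
    """True if every possible placement puts at least two replicas on one node."""
    return all(len(set(p)) < replicas for p in placements(replicas, nodes))


if __name__ == "__main__":
    print(all_placements_overlap(3, 2))  # True: 3 replicas cannot fit on 2 nodes without sharing
    print(all_placements_overlap(3, 3))  # False: one replica per node is possible
```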
In the second case I described, it's an artifact of a simplistic but
correct vnode sharding algorithm that uses the partition ID modulo the
node count to assign the node for each partition. When the number of
partitions is not a multiple of the number of nodes, the last and the
first (or second, etc.; you do the math) partitions can wind up on the
same node. In that case, the proportion of data that overlaps on one node
is on the order of 1/64 to 1/1024 of the keyspace. This is not a
significant operational cost.
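A minimal Python model of the simplistic scheme described above, under stated assumptions: partition i is owned by node i mod node_count, and the replicas for a partition go to the owners of the next n_val partitions, wrapping around the ring. Riak's real claim algorithm may differ; the function names and parameters here are hypothetical and only illustrate why the wrap-around can put two replicas on one node.

```python
def owner(partition, node_count):
    """Assumed claim rule: partition i is owned by node i mod node_count."""
    return partition % node_count


def preference_list(partition, ring_size, node_count, n_val):
    """Nodes holding the n_val replicas for `partition`: owners of the next
    n_val partitions, wrapping from the last partition back to partition 0."""
    return [owner((partition + k) % ring_size, node_count) for k in range(n_val)]


def overlapping_partitions(ring_size, node_count, n_val=3):
    """Partitions whose replicas land on fewer than n_val distinct nodes."""
    return [
        p
        for p in range(ring_size)
        if len(set(preference_list(p, ring_size, node_count, n_val))) < n_val
    ]


if __name__ == "__main__":
    for ring_size, nodes in [(64, 3), (64, 4), (64, 5), (1024, 3)]:
        bad = overlapping_partitions(ring_size, nodes)
        print(ring_size, nodes, bad, len(bad) / ring_size)
```

In this model, 64 partitions on 3 nodes gives overlap only for partitions 62 and 63 (their replicas include two copies on node 0), i.e. 2/64 of the keyspace, and 1024 partitions on 3 nodes gives 2/1024. Four nodes (an even divisor of 64) gives none, and so does five: the wrap-around only collides when the remainder of ring_size / node_count is nonzero but smaller than n_val.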
This *does* reduce fault tolerance: losing those two "special" nodes
(but not two arbitrary nodes) can destroy those special keys even though
they were stored with N=3. As the probability of losing two *particular*
nodes simultaneously compares favorably with the probability of losing
*any three* nodes simultaneously, I haven't been that concerned over it.
It takes roughly six hours for me to allocate a new machine and restore
the destroyed node's backup to it. Anecdotally, I think you're more
likely to see *cluster* failure than *dual node* failure in a small
distributed system, but that's a long story.
The Riak team has been aware of this since at least June 2010
(https://issues.basho.com/show_bug.cgi?id=228), and there are
operational workarounds involving target_n_val. As I understand it,
solving the key distribution problem is... nontrivial.
--Kyle
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com