Below is the output from my Riak cluster: 3 physical nodes, ring size 128.
As far as I can tell, a freshly installed Riak always places partitions on
the ring in the same way, as long as the number of vnodes and servers is the
same.

All presentations, including "A Little Riak Book", show a pretty picture of
the ring with nodes claiming partitions in a sequential fashion. That's
clearly not the case.
The output below shows that node2 is picked as a favourite: it owns pairs of
consecutive partitions, which means both replicas of certain keys will
definitely be on the same hardware. Partitions are split 44 + 42 + 42. Why
not 43 + 43 + 42?
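
To make the consecutive-partition point concrete, here is a minimal sketch.
It assumes a key's replicas land on N consecutive partitions, and it only
uses the 19 owners visible in the output at the end of this message (the
rest of the ring is cut off, so it is not modelled). The names
VISIBLE_OWNERS and same_node_windows are made up for illustration.

```python
# Owners of the first 19 partitions, copied in order from the ring output in
# this message (1 = riak@10.173.240.1, 2 = riak@10.173.240.2,
# 3 = riak@10.173.240.3). The remaining partitions are not shown.
VISIBLE_OWNERS = [1, 2, 2, 3, 1, 2, 2, 3, 1, 2, 2, 3, 1, 2, 2, 3, 1, 2, 2]


def same_node_windows(owners, n_val):
    """Return start positions of n_val consecutive partitions that do not
    span n_val distinct nodes. No wrap-around, because the list is partial."""
    bad = []
    for start in range(len(owners) - n_val + 1):
        window = owners[start:start + n_val]
        if len(set(window)) < n_val:
            bad.append(start)
    return bad


if __name__ == "__main__":
    # Positions where an N=2 window sits entirely on one node.
    print(same_node_windows(VISIBLE_OWNERS, 2))
```

In the visible slice, every such N=2 window is a pair of partitions owned
by node2.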

Another thing: why does the algorithm select nodes in a 'random',
non-sequential fashion? When the cluster is created and nodes 2 & 3 are
joined to node 1, the situation is clear. The partitions are empty, so
vnodes could be assigned so that no consecutive partitions are on the same
hardware.
My issue is that, in my case, if node2 goes down while I'm storing some data
with N=2, I will definitely not be able to access certain keys and, more
surprisingly, all 2i queries will stop working for the buckets with N=2 due
to {error,insufficient_vnodes_available}. That is, all 2i queries for those
buckets.
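
A rough sketch of why one node going down hurts here. It is a simplified
model, not Riak's actual coverage planner: it assumes replicas sit on N
consecutive partitions, and that a 2i query needs at least one live vnode
for every such group. Only the 19 owners visible in the output below are
used, and the function names are hypothetical.

```python
# Owners of the first 19 partitions as shown in the ring output in this
# message (1, 2, 3 stand for riak@10.173.240.1, .2 and .3).
VISIBLE_OWNERS = [1, 2, 2, 3, 1, 2, 2, 3, 1, 2, 2, 3, 1, 2, 2, 3, 1, 2, 2]


def unreachable_preflists(owners, n_val, down_node):
    """Start positions whose n_val consecutive owners are all down_node,
    i.e. groups of replicas with no live copy (simplified model)."""
    return [
        start
        for start in range(len(owners) - n_val + 1)
        if all(owner == down_node for owner in owners[start:start + n_val])
    ]


def coverage_possible(owners, n_val, down_node):
    """True if every group of n_val consecutive partitions still has at
    least one owner that is up (assumed requirement for a 2i query)."""
    return not unreachable_preflists(owners, n_val, down_node)


if __name__ == "__main__":
    print(unreachable_preflists(VISIBLE_OWNERS, 2, down_node=2))
    print(coverage_possible(VISIBLE_OWNERS, 2, down_node=2))
```

Under these assumptions, losing node2 leaves several N=2 groups with no
live replica, while losing node1 or node3 leaves none in the visible slice.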

I understand that when new nodes are attached, Riak tries to avoid
reshuffling everything and just moves certain partitions, and at that point
you may end up with copies on the same physical node. But even then Riak
should make a best effort not to put consecutive partitions on the same
server. If it has to move a partition anyway, it could just as well put it
on any machine other than the ones that hold the partitions with the
preceding and following index.
I also understand Riak does not guarantee that replicas are on distinct
servers (why? It should, at least for N=2 and N=3 if possible).
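
For comparison, this is the kind of sequential assignment I had in mind.
It is a hypothetical round-robin, not Riak's claim algorithm, and the node
names and function names are made up. The sketch counts partitions per node
and checks every window of N consecutive partitions, including the
wrap-around at the end of the ring.

```python
from collections import Counter


def round_robin(ring_size, nodes):
    """Assign partition i to nodes[i % len(nodes)] (hypothetical scheme)."""
    return [nodes[i % len(nodes)] for i in range(ring_size)]


def violations(owners, n_val):
    """Start positions where n_val consecutive partitions (wrapping around
    the ring) do not land on n_val distinct nodes."""
    size = len(owners)
    bad = []
    for start in range(size):
        window = [owners[(start + k) % size] for k in range(n_val)]
        if len(set(window)) < n_val:
            bad.append(start)
    return bad


if __name__ == "__main__":
    owners = round_robin(128, ["node1", "node2", "node3"])
    print(Counter(owners))        # partitions per node
    print(violations(owners, 2))  # windows sharing a node for N=2
    print(violations(owners, 3))  # windows sharing a node for N=3
```

With 128 partitions and 3 nodes this gives a 43 + 43 + 42 split and no
adjacent partitions on the same node. For N=3 the two windows that wrap
around the end of the ring still repeat a node, because 128 is not a
multiple of 3.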

I appreciate that the minimum recommended setup is 5 nodes and that I should
be storing with N=3 at minimum.
But I just find it confusing when presentations show something that is not
even remotely close to reality.

Just to be clear, I have nothing against Riak. I think it's great, though
it's a bit disappointing that there are no stronger guarantees about replica
placement here.

I'm probably missing something and simplifying too much. Any clarification
appreciated.

Daniel 


(riak@10.173.240.1)2>
(riak@10.173.240.1)2> {ok, Ring} = riak_core_ring_manager:get_my_ring().
{ok,
 {chstate_v2,'riak@10.173.240.1',
  [{'riak@10.173.240.1',{303,63561952927}},
   {'riak@10.173.240.2',{31,63561952907}},
   {'riak@10.173.240.3',{25,63561952907}}],
  {128,
   [{0,'riak@10.173.240.1'},
    {11417981541647679048466287755595961091061972992,
     'riak@10.173.240.2'},
    {22835963083295358096932575511191922182123945984,
     'riak@10.173.240.2'},
    {34253944624943037145398863266787883273185918976,
     'riak@10.173.240.3'},
    {45671926166590716193865151022383844364247891968,
     'riak@10.173.240.1'},
    {57089907708238395242331438777979805455309864960,
     'riak@10.173.240.2'},
    {68507889249886074290797726533575766546371837952,
     'riak@10.173.240.2'},
    {79925870791533753339264014289171727637433810944,
     'riak@10.173.240.3'},
    {91343852333181432387730302044767688728495783936,
     'riak@10.173.240.1'},
    {102761833874829111436196589800363649819557756928,
     'riak@10.173.240.2'},
    {114179815416476790484662877555959610910619729920,
     'riak@10.173.240.2'},
    {125597796958124469533129165311555572001681702912,
     'riak@10.173.240.3'},
    {137015778499772148581595453067151533092743675904,
     'riak@10.173.240.1'},
    {148433760041419827630061740822747494183805648896,
     'riak@10.173.240.2'},
    {159851741583067506678528028578343455274867621888,
     'riak@10.173.240.2'},
    {171269723124715185726994316333939416365929594880,
     'riak@10.173.240.3'},
    {182687704666362864775460604089535377456991567872,
     'riak@10.173.240.1'},
    {194105686208010543823926891845131338548053540864,
     'riak@10.173.240.2'},
    {205523667749658222872393179600727299639115513856,
     'riak@10.173.240.2'},
    {216941649291305901920859467356323260730177486848,

and so on



--
View this message in context: 
http://riak-users.197444.n3.nabble.com/Partitions-placement-tp4030664.html
Sent from the Riak Users mailing list archive at Nabble.com.
