Gordon,

The reason for the nondeterministic behavior is two-fold.

1. For performance reasons Search only ever reads from 1 node (R=1)

2. As an attempt to balance load and reduce vnode contention this node is
selected randomly

This is why it works 50% of the time.  For each index entry, 2 partitions
now have the data and 1 does not, so depending on which one you hit you'll
get the data or not.  Furthermore, this behavior will continue until you
reindex, because the index in Search has no form of anti-entropy such as
read repair or Merkle trees.
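
To make the mechanism concrete, here is a toy Python sketch.  It is not
Riak Search code: the names search and hit_rate, and the use of plain dicts
as stand-ins for partition indexes, are made up for illustration.  It only
models the two points above, namely a read that consults exactly one
randomly chosen replica and never repairs a replica that is missing data.

```python
import random


def search(replicas, term, rng):
    """Read from exactly one randomly chosen replica (R=1), with no repair."""
    chosen = rng.choice(replicas)
    return chosen.get(term, [])


def hit_rate(replicas, term, trials=10_000, seed=0):
    """Fraction of repeated identical queries that return a non-empty result."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if search(replicas, term, rng))
    return hits / trials


# Hypothetical index contents: one replica has the entry, one lost it.
full = {"riak": ["doc1", "doc2"]}
empty = {}

if __name__ == "__main__":
    # Same query, same data, different answers depending on the replica hit.
    print(hit_rate([full, empty], "riak"))
    # Once every replica holds the entry (e.g. after a reindex), it always hits.
    print(hit_rate([full, full], "riak"))
```

In this model the fraction of successful queries is simply the share of
candidate replicas that hold the entry, and since nothing ever copies the
entry to the empty replica, that fraction stays the same until a reindex.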

In the future, the easiest thing is to replace that lost node as quickly as
possible.  While it's down, the other nodes will keep track of the new index
entries and will transfer them during data handoff when the node comes alive
again.  By removing the node you've changed the ring, and your only option
is to reindex, as you are already doing.  I realize that bringing that node
up or replacing it may not have been an option, but this is the only way to
avoid this problem with Search as it stands today.

I realize this sucks and isn't in line with Riak's more fault-tolerant
behavior.  It does suck.  I hate the fact that I have to write this email
basically telling you this part of Search is broken, IMO.  I want to see it
addressed and I'm sure I'm not the only one.  Right now our internal ticket
board is buzzing in anticipation for the new release.  After that there is a
lot of love I want to give Search, this particular issue included.  I'd say
it's only a matter of time.


-Ryan

On Fri, Aug 19, 2011 at 2:46 PM, Gordon Tillman <gtill...@mezeo.com> wrote:

> Greetings all,
>
> After an extended datacenter power outage, a 3-node Riak cluster shut down.
>  When the power was restored, two of the three nodes came back up. Don't
> know what is going on with the third node.  But in the mean time, have
> removed the dead node from the ring.  The two remaining nodes show a good
> ringready status.
>
> The problem is that the search indexes appear to be in an inconsistent
> state.  For example, I can issue the same solr query on one of the nodes and
> 50% of the time it returns correct results.  The other times it returns an
> empty result set.
>
> I'm in the process of re-indexing the bucket in question (a very
> time-consuming affair).  But I wonder if anyone could shed some light on
> this situation as to why it occurred in the first place and if there is
> anything that can be done to keep this from happening again in the future.
>
> Many thanks,
>
> --gordon
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
