On 4 January 2017 at 23:22, Tomi Takussaari <tomi.takussa...@gmail.com> wrote:
> Hello Riak-users
>
> We have 9 node Riak-cluster, that we use to store user accounts.
>
> Some of the crucial data fields of user account are indexed using I2, so
> that we can do secondary index queries based on them.
>
> Today, we tested how our cluster performs when few nodes go down, and
> results were not very good.
>
> If more than 2 nodes go down, all I2 queries will start failing, returning
> HTTP 500, with "insufficient vnodes available" error. After nodes are up
> again, things start working again.
> Normal object CRUD operations worked fine.
>
> Is this to be expected behaviour ?
>
> Funny thing is, that we have other cluster, with same configuration but
> with 6 nodes, for other environment, and that also experiences same
> problems when more than 2 nodes go down, so it does not seem to have
> anything to do with percentage of nodes being down..
>
> Our ring size is 256, and current Riak version is 2.2.
>
> Both clusters were first created years ago, with Riak 1.4, if memory
> serves, and I believe we tested this same thing back then, and I2 queries
> did not stop working this easily then..
>
> Any help would be appreciated!
>

Hi Tomi,

For a cluster that uses the default replication factor (`n_val`) of 3, the behaviour you observed is expected. Secondary index queries run against a covering set of vnodes that includes one replica of each KV object. With `n_val=3` the covering set can only be guaranteed if no more than 2 nodes are offline at any given time. Once 3 nodes are down, all three replicas of some part of the keyspace may be unavailable, which is why the limit is the same on your 6-node cluster: it depends on `n_val`, not on the percentage of nodes that are down. This behaviour has not changed since the 1.4 release.

As our documentation [0] states:

"Riak stores 3 replicas of all objects by default, although this can be changed using bucket types, which manage buckets’ replication properties. The system is capable of generating a full set of results from one third of the system’s partitions as long as it chooses the right set of partitions. The query is sent to each partition, the index data is read, and a list of keys is generated and then sent back to the requesting node."

Other operations keep working because Riak can spin up fallback partitions (vnodes on one of the remaining nodes) that accept and temporarily store data while the node that should own the data is down.

Kind Regards,

Magnus

[0]: http://docs.basho.com/riak/kv/2.2.0/using/reference/secondary-indexes/#how-it-works

--
Magnus Kessler
Client Services Engineer
Basho Technologies Limited

Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431
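
The coverage argument in the reply can be checked with a small sketch. This is not Riak's actual claim or coverage-plan algorithm: it assumes a simplified ring in which partition `i` is owned by node `i % node_count` and the replicas of a keyspace sit on `n_val` consecutive partitions. The function names (`owners`, `coverage_possible`, `worst_case_ok`) are made up for illustration.

```python
from itertools import combinations


def owners(ring_size, node_count):
    """Simplified claim (assumption): partition i is owned by node i % node_count."""
    return [i % node_count for i in range(ring_size)]


def coverage_possible(ring_size, node_count, n_val, down):
    """Return True if every keyspace still has at least one replica on a live node."""
    owner = owners(ring_size, node_count)
    for p in range(ring_size):
        # Assumption: the n_val replicas of a keyspace sit on consecutive partitions.
        replica_nodes = [owner[(p + k) % ring_size] for k in range(n_val)]
        if all(node in down for node in replica_nodes):
            return False  # all replicas of this keyspace are on failed nodes
    return True


def worst_case_ok(ring_size, node_count, n_val, failures):
    """Return True if coverage survives every possible set of `failures` down nodes."""
    return all(
        coverage_possible(ring_size, node_count, n_val, set(down))
        for down in combinations(range(node_count), failures)
    )


if __name__ == "__main__":
    # Ring size 256 and n_val=3, as in the thread; 9-node and 6-node clusters.
    for nodes in (9, 6):
        for failures in (2, 3):
            ok = worst_case_ok(256, nodes, 3, failures)
            print(f"{nodes} nodes, {failures} down: coverage guaranteed = {ok}")
```

Under this simplified layout, coverage survives any 2 failed nodes on both cluster sizes, while some (not all) combinations of 3 failed nodes leave a keyspace with no live replica. That matches the point that the limit follows from `n_val` and not from the fraction of nodes that are down.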
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com