Hi Douglas,

That seems to be a good candidate for an explanation. Thank you very much for the explanation and link. I'll dig into it.
As promised, I checked whether we also had "unrecognized message" entries in the logs for the second case I mentioned, and indeed we did.

On Tue, Apr 18, 2017 at 2:55 PM, Douglas Rohrer <droh...@basho.com> wrote:
> This sounds like an issue our Riak CS team ran into quite a while ago, which
> involved “slow nodes” and coordination retry. Take a look at
> https://github.com/basho/riak_kv/issues/1188 and see if it makes sense to
> you, but it certainly sounds like what’s happening.
>
> The basic flow of the issue comes when one node in the preflist is down, and
> you write to a node _not in the preflist_, at which point the following
> happens (better formatted in the issue above, btw):
>
> client       node-A                    node-R              node-S
>   ---(Put)-->
>              Compute PL
>              = P, Q and R
>              Redirect to R ----------> [frozen]
>                   |
>                   | 3 sec timeout
>                   V
>              Compute new PL excluding R
>              = P, Q and S
>              Redirect to S ------------------------------> Compute PL without
>                   |                                        any knowledge about R (at this point)
>                   |                                        = P, Q and R
>                   |                                        Redirect to R ---+
>                   |                                        |                |
>                   |                    [what happens?] <---|----------------+
>                   |                                        | 3 sec timeout
>                   |                                        V
>                   |                                        Compute new PL excluding R
>                   |                                        = P, Q and S
>                   |                                        I'm coordinator this time
>                   |                                        Execute put
>                   V 3 sec timeout
>              Compute new PL again
>              [continues]
>
> So, it’s possible for a slow/down node (node R in this case) to eventually
> cause two _other nodes_ to each write a sibling, even on a new key. In fact,
> depending on the number of nodes in the system and your luck, you could end
> up writing more than one sibling on a fresh write in this case. Given your
> comment about a network issue potentially being a factor, and the 3-second
> timing you noted (the default for the failure timeout), this increases the
> likelihood that this was, in fact, the issue.
>
> A fix for this issue has been worked on and tested, but is not yet
> incorporated into a version of Riak for distribution. You can, however,
> disable the coordinator retry logic as noted in the issue I referenced above,
> or increase the timeout if your cluster is running slowly in general by
> setting `put_coordinator_failure_timeout` in the `riak_kv` section of your
> `advanced.config` file (see
> http://docs.basho.com/riak/kv/2.2.3/configuring/reference/#advanced-configuration
> for the general format of the advanced.config if you’re not familiar).
>
> Hope this helps.
>
> Doug Rohrer
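For reference, the advanced.config change Doug describes above might look roughly like the sketch below. This is only an illustration: it assumes the timeout is given in milliseconds (consistent with the 3-second default mentioned above), and the 10000 value is an arbitrary example, not a recommendation.

    %% advanced.config -- illustrative sketch only
    [
     {riak_kv, [
       %% Raise the put coordinator failover timeout above the 3-second default
       %% so that a briefly slow node is not skipped and the put re-coordinated
       %% elsewhere. 10000 ms is an example value, not advice.
       {put_coordinator_failure_timeout, 10000}
     ]}
    ].

The file is a plain Erlang term file, so the trailing period matters, and a node restart is needed for the change to take effect.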
>
> On 4/18/17, 8:28 AM, "riak-users on behalf of Daniel Abrahamsson"
> <riak-users-boun...@lists.basho.com on behalf of hams...@gmail.com> wrote:
>
> Hi Magnus,
>
> This cluster has been running in production for a few months. Key
> generation is based on flake (https://github.com/boundary/flake); we
> have never experienced a collision in the 3+ years we have been using
> it heavily in production. However, I will look into that possibility
> as well.
>
> I just noticed that one of the Riak nodes logged this at the time:
>
> 2017-04-13 17:42:40.567 [error] <0.3624.28>@riak_api_pb_server:handle_info:331 Unrecognized message
>
> {30320806,{ok,{r_object,<<"session">>,<<".12011742tWzDvu8mk5WAdfYihfV_T3DcnJ5VDyXC0c">>,[{r_content,{dict,3,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[[<<"X-Riak-VTag">>,53,114,86,115,108,71,120,112,73,55,108,118,114,100,105,114,107,104,50,66,105,119]],[[<<"index">>]],[],[[<<"X-Riak-Last-Modified">>|{1492,105357,453143}]],[],[]}}},<<...
> (actual value removed).
>
> I also have another example (from the same cluster) where there is a
> *single* writer to a key, but after a few writes/updates, it also got
> a sibling error. Also at that time, the write+read took significantly
> longer than normal. I'll check whether we had any "unrecognized messages"
> in the Riak logs at that time as well.
>
> To answer your second question, we are talking to the Riak cluster
> over protocol buffers, using the official Erlang client.
>
> //Daniel
>
> On Tue, Apr 18, 2017 at 1:51 PM, Magnus Kessler <mkess...@basho.com>
> wrote:
> > On 18 April 2017 at 08:20, Daniel Abrahamsson <hams...@gmail.com> wrote:
> >>
> >> I've run into a case where I got a sibling error/response on the first
> >> ever write to a key. I would like to understand how this could happen.
> >> Normally when you get siblings, it is because you have written a value
> >> with an out-of-date vclock. But since this is the first write, there
> >> is no vclock. Could someone shed some light on this for me?
> >>
> >> It is worth mentioning that it took 3 seconds for Riak to deliver
> >> the response, so it is possible there was some kind of network issue
> >> at the time.
> >>
> >> Here are some details about my setup:
> >> Number of nodes: 8.
> >> n_val: 5
> >> write options: pw: 3 (quorum), return_body
> >>
> >> Regards,
> >> Daniel Abrahamsson
> >>
> >
> > Hi Daniel,
> >
> > Please let me know if all nodes in this cluster were set up completely
> > fresh, with empty backend directories, or if any of them had been used
> > before for a Riak installation. If the latter is the case, it may be that
> > the key in question had already been used once before. Cluster nodes
> > pick up data from pre-existing backends.
> >
> > How do you access the key for read and write operations?
> >
> > Kind Regards,
> >
> > Magnus
> >
> > Magnus Kessler
> > Client Services Engineer
> > Basho Technologies Limited
> >
> > Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431
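As a footnote to the setup Daniel describes (the official Erlang client over protocol buffers, pw=3 with return_body), a minimal sketch of such a write, plus a check of the returned body for siblings, might look like the following. The host, port, key and value are placeholders rather than values from this thread; the <<"session">> bucket simply echoes the log excerpt above.

    %% Illustrative sketch only: put with pw=3 and return_body via riakc,
    %% then inspect the returned object for sibling values.
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    Key = <<"some-flake-generated-id">>,   %% placeholder for a flake-style key
    Obj = riakc_obj:new(<<"session">>, Key, <<"payload">>),
    case riakc_pb_socket:put(Pid, Obj, [{pw, 3}, return_body]) of
        {ok, Returned} ->
            case riakc_obj:value_count(Returned) of
                1 -> ok;   %% single value: the expected outcome for a first write
                N -> {siblings, N, riakc_obj:get_values(Returned)}
            end;
        {error, Reason} ->
            {error, Reason}
    end.

Resolving such siblings follows the usual read-resolve-write pattern: fetch the object, merge or pick among riakc_obj:get_values/1, set the result with riakc_obj:update_value/2, and write it back so that the causal context obtained from the fetch is reused.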