Hi,

It would be good to know the Riak version, and why the dvv_enabled bucket property is set to false. Also, is there multi-datacentre replication involved? Do you re-use your keys? For example, have the keys in question been created, deleted, and then re-created?
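
On the key re-use question, one rough way to check from the CS access logs is sketched below in Python. It assumes log lines in the format quoted further down this thread, read in chronological order. The function names are made up for illustration, and it can only see what is still in the retained logs.

```python
import re
from collections import defaultdict

# Matches the request part of an access-log line in the format quoted in this
# thread: [timestamp] "METHOD /path HTTP/x.y" status ...
# Assumption: every log line contains this shape somewhere; other lines are skipped.
LOG_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\]\s+"(?P<method>[A-Z]+)\s+(?P<path>\S+)\s+HTTP/[\d.]+"\s+(?P<status>\d{3})'
)


def key_histories(lines):
    """Group (timestamp, method, status) events by request path, in input order."""
    histories = defaultdict(list)
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            histories[m.group("path")].append(
                (m.group("ts"), m.group("method"), int(m.group("status")))
            )
    return histories


def suspicious_keys(histories):
    """Return {path: [reasons]} for keys that were PUT again after a successful
    DELETE, or whose GETs went from a 2xx response to a 404."""
    flagged = {}
    for path, events in histories.items():
        seen_delete = False
        seen_get_ok = False
        reasons = []
        for _ts, method, status in events:
            ok = 200 <= status < 300
            reason = None
            if method == "DELETE" and ok:
                seen_delete = True
            elif method == "PUT" and ok and seen_delete:
                reason = "PUT after DELETE"
            elif method == "GET" and ok:
                seen_get_ok = True
            elif method == "GET" and status == 404 and seen_get_ok:
                reason = "GET 200 then 404"
            if reason and reason not in reasons:
                reasons.append(reason)
        if reasons:
            flagged[path] = reasons
    return flagged
```

To use it, feed it the access logs from all CS nodes merged into time order, for example suspicious_keys(key_histories(open("access.log"))), where access.log is a placeholder name. Any key reported as "PUT after DELETE" would be a re-used key in the sense I am asking about.
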

Cheers
Russell

On 6 Mar 2017, at 15:07, Daniel Miller <dmil...@dimagi.com> wrote:

> I recently had another case of a disappearing object. This time the object
> was successfully PUT, and (unlike the previous cases reported in this thread)
> for a period of time GETs were also successful. Then GETs started 404ing for
> no apparent reason. There are no errors in the logs to indicate that anything
> unusual happened. This is quite disconcerting. Is it normal that Riak CS just
> loses track of objects? At this point we are using CS as primary object
> storage, meaning we do not have the data stored in another database so it's
> critical that the data is not randomly lost.
>
> In the CS access logs I see
>
> # all prior GET requests for this object succeeding like this one. This is
> the last successful GET request:
> [28/Feb/2017:14:42:35 +0000] "GET /buckets/blobdb/objects/commcarehq__apps%2F3d2b... HTTP/1.0" 200 14923 "" "Boto3/1.4.0 Python/2.7.6 Linux/3.13.0-86-generic Botocore/1.4.53 Resource"
> ...
> # all GET requests for this object are now failing like this one (the first
> 404):
> [02/Mar/2017:08:36:11 +0000] "GET /buckets/blobdb/objects/commcarehq__apps%2F3d2b... HTTP/1.0" 404 240 "" "Boto3/1.4.0 Python/2.7.6 Linux/3.13.0-86-generic Botocore/1.4.53 Resource"
>
> The object name has been elided for readability. I do not know when this
> object was PUT into the cluster because I only have logs for the past month.
> Is there any way to dig further into Riak or Riak CS data to determine if the
> object content is actually completely lost or if there are any other details
> that might explain why it is now missing? Could I increase some logging
> parameters to get more information about what is going wrong when something
> like this happens?
>
> I have searched the logs for other 404 responses but found none (other than
> the two reported earlier), so this is the 3rd known missing object in the
> cluster.
> We retain logs for one month only (I'm increasing this now because
> of this issue), so it is possible that other objects have also gone missing,
> but I cannot see them since the logs have been truncated.
>
> The cluster now has 7 nodes instead of 9 (see earlier emails in this thread),
> and the riak storage backend is now leveldb instead of multi. I have attached
> config file templates for riak, riak-cs and stanchion (these are deployed
> with ansible).
>
> Bucket properties:
> {
>   "props": {
>     "notfound_ok": true,
>     "n_val": 3,
>     "last_write_wins": false,
>     "allow_mult": true,
>     "dvv_enabled": false,
>     "name": "blobdb",
>     "r": "quorum",
>     "precommit": [],
>     "old_vclock": 86400,
>     "dw": "quorum",
>     "rw": "quorum",
>     "small_vclock": 50,
>     "write_once": false,
>     "basic_quorum": false,
>     "big_vclock": 50,
>     "chash_keyfun": {
>       "fun": "chash_std_keyfun",
>       "mod": "riak_core_util"
>     },
>     "postcommit": [],
>     "pw": 0,
>     "w": "quorum",
>     "young_vclock": 20,
>     "pr": 0,
>     "linkfun": {
>       "fun": "mapreduce_linkfun",
>       "mod": "riak_kv_wm_link_walker"
>     }
>   }
> }
>
> I'll be happy to provide more context to help troubleshoot this issue.
>
> Thanks in advance for any help you can provide.
>
> Daniel
>
>
> On Tue, Feb 14, 2017 at 11:52 AM, Daniel Miller <dmil...@dimagi.com> wrote:
> Hi Luke,
>
> Sorry for the late response and thanks for following up. I haven't seen it
> happen since. At this point I'm going to wait and see if it happens again and
> hopefully get more details about what might be causing it.
>
> Daniel
>
> On Thu, Feb 9, 2017 at 1:02 PM, Luke Bakken <lbak...@basho.com> wrote:
> Hi Daniel -
>
> I don't have any ideas at this point. Has this scenario happened again?
>
> --
> Luke Bakken
> Engineer
> lbak...@basho.com
>
>
> On Wed, Jan 25, 2017 at 2:11 PM, Daniel Miller <dmil...@dimagi.com> wrote:
> > Thanks for the quick response, Luke.
> >
> > There is nothing unusual about the keys.
> > The format is a name + UUID + some
> > other random URL-encoded characters, like most other keys in our cluster.
> >
> > There are no errors near the time of the incident in any of the logs (the
> > last [error] is from over a month before). I see lots of messages like this
> > in console.log:
> >
> > /var/log/riak/console.log
> > 2017-01-20 15:38:10.184 [info] <0.22902.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 2 keys during active anti-entropy exchange of {776422744832042175295707567380525354192214163456,3} between {776422744832042175295707567380525354192214163456,'riak-fa...@fake3.fake.com'} and {822094670998632891489572718402909198556462055424,'riak-fa...@fake9.fake.com'}
> > 2017-01-20 15:40:39.640 [info] <0.21789.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 1 keys during active anti-entropy exchange of {936274486415109681974235595958868809467081785344,3} between {959110449498405040071168171470060731649205731328,'riak-fa...@fake3.fake.com'} and {981946412581700398168100746981252653831329677312,'riak-fa...@fake5.fake.com'}
> > 2017-01-20 15:46:40.918 [info] <0.13986.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 2 keys during active anti-entropy exchange of {662242929415565384811044689824565743281594433536,3} between {685078892498860742907977265335757665463718379520,'riak-fa...@fake3.fake.com'} and {707914855582156101004909840846949587645842325504,'riak-fa...@fake6.fake.com'}
> > 2017-01-20 15:48:25.597 [info] <0.29943.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 2 keys during active anti-entropy exchange of {776422744832042175295707567380525354192214163456,3} between {776422744832042175295707567380525354192214163456,'riak-fa...@fake3.fake.com'} and {799258707915337533392640142891717276374338109440,'riak-fa...@fake0.fake.com'}
> >
> > Thanks!
> > Daniel
> >
> >
> >
> > On Wed, Jan 25, 2017 at 9:45 AM, Luke Bakken <lbak...@basho.com> wrote:
> >>
> >> Hi Daniel -
> >>
> >> This is a strange scenario. I recommend looking at all of the log
> >> files for "[error]" or other entries at about the same time as these
> >> PUTs or 404 responses.
> >>
> >> Is there anything unusual about the key being used?
> >> --
> >> Luke Bakken
> >> Engineer
> >> lbak...@basho.com
> >>
> >>
> >> On Wed, Jan 25, 2017 at 6:40 AM, Daniel Miller <dmil...@dimagi.com> wrote:
> >> > I have a 9-node Riak CS cluster that has been working flawlessly for
> >> > about 3 months. The cluster configuration, including backend and bucket
> >> > parameters such as N-value, uses default settings. I'm using the S3 API
> >> > to communicate with the cluster.
> >> >
> >> > Within the past week I had an issue where two objects were PUT resulting
> >> > in a 200 (success) response, but all subsequent GET requests for those
> >> > two keys return status of 404 (not found). Other than the fact that they
> >> > are now missing, there was nothing out of the ordinary with these
> >> > particular two PUTs. Maybe I'm missing something, but this seems like a
> >> > scenario that should never happen. All information included here about
> >> > PUTs and GETs comes from reviewing the CS access logs. Both objects were
> >> > PUT on the same node, however GET requests returning 404 have been
> >> > observed on all nodes. There is plenty of other traffic on the cluster
> >> > involving GETs and PUTs that are not failing. I'm unsure of how to
> >> > troubleshoot further to find out what may have happened to those objects
> >> > and why they are now missing. What is the best approach to figure out
> >> > why an object that was successfully PUT seems to be missing?
> >> >
> >> > Thanks!
> >> > Daniel Miller
> >> >
> >> > _______________________________________________
> >> > riak-users mailing list
> >> > riak-users@lists.basho.com
> >> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
> <config-files.zip>