Hi,
Would be good to know the Riak version, and why the dvv_enabled bucket property
is set to false, please? Also, is there multi-datacentre replication involved? 
Do you re-use your keys, for example, have the keys in question been created, 
deleted, and then re-created?
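To answer the bucket-property part of these questions consistently across buckets or clusters, a minimal sketch is below. It assumes the input is a JSON document shaped like the `{"props": {...}}` block Daniel pasted. `summarize_props`, `questions_to_ask` and the `WATCHED` list are hypothetical helpers, not part of Riak. The flagged combinations are prompts for discussion, not a diagnosis of the missing objects.

```python
import json

# Properties touched on in this thread; the selection is ours, not a Riak API.
WATCHED = ("allow_mult", "dvv_enabled", "last_write_wins", "notfound_ok",
           "n_val", "r", "w", "pr", "pw")


def summarize_props(raw):
    """Return the watched properties from a bucket-props JSON document."""
    props = json.loads(raw)["props"]
    return {key: props.get(key) for key in WATCHED}


def questions_to_ask(summary):
    """List settings worth a second look; these are prompts, not diagnoses."""
    notes = []
    if summary.get("allow_mult") and not summary.get("dvv_enabled"):
        notes.append("allow_mult is true but dvv_enabled is false")
    if summary.get("notfound_ok"):
        # With notfound_ok, a replica's not-found reply counts as a
        # successful read for the purposes of r.
        notes.append("notfound_ok is true: a not-found reply counts toward r")
    return notes


if __name__ == "__main__":
    sample = ('{"props": {"allow_mult": true, "dvv_enabled": false, '
              '"notfound_ok": true, "n_val": 3, "r": "quorum"}}')
    for note in questions_to_ask(summarize_props(sample)):
        print(note)
```

Run against the blobdb properties quoted below, both notes are printed.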

Cheers

Russell

On 6 Mar 2017, at 15:07, Daniel Miller <dmil...@dimagi.com> wrote:

> I recently had another case of a disappearing object. This time the object 
> was successfully PUT, and (unlike the previous cases reported in this thread) 
> for a period of time GETs were also successful. Then GETs started 404ing for 
> no apparent reason. There are no errors in the logs to indicate that anything 
> unusual happened. This is quite disconcerting. Is it normal that Riak CS just 
> loses track of objects? At this point we are using CS as primary object 
> storage, meaning we do not have the data stored in another database so it's 
> critical that the data is not randomly lost.
> 
> In the CS access logs I see
> 
> # All prior GET requests for this object succeeded like this one. This is the last successful GET request:
> [28/Feb/2017:14:42:35 +0000] "GET /buckets/blobdb/objects/commcarehq__apps%2F3d2b... HTTP/1.0" 200 14923 "" "Boto3/1.4.0 Python/2.7.6 Linux/3.13.0-86-generic Botocore/1.4.53 Resource"
> ...
> # All GET requests for this object now fail like this one (the first 404):
> [02/Mar/2017:08:36:11 +0000] "GET /buckets/blobdb/objects/commcarehq__apps%2F3d2b... HTTP/1.0" 404 240 "" "Boto3/1.4.0 Python/2.7.6 Linux/3.13.0-86-generic Botocore/1.4.53 Resource"
> 
> The object name has been elided for readability. I do not know when this 
> object was PUT into the cluster because I only have logs for the past month. 
> Is there any way to dig further into Riak or Riak CS data to determine if the 
> object content is actually completely lost or if there are any other details 
> that might explain why it is now missing? Could I increase some logging 
> parameters to get more information about what is going wrong when something 
> like this happens?
> 
> I have searched the logs for other 404 responses but found none (other than 
> the two reported earlier), so this is the 3rd known missing object in the 
> cluster. We retain logs for one month only (I'm increasing this now because 
> of this issue), so it is possible that other objects have also gone missing, 
> but I cannot see them since the logs have been truncated.
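The access-log comparison above (last successful request versus first 404) can be automated for every object rather than searched by hand. Below is a minimal sketch, assuming each request occupies one line, lines are in chronological order, and the format matches the excerpt above. `find_vanished` is a hypothetical helper; it is written for Python 3, since Python 2's `strptime` does not accept `%z`.

```python
import re
from datetime import datetime

# Matches the timestamp, method, path and status of a CS access-log line as
# excerpted above; anything before the "[" (client address etc.) is ignored.
LINE_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def find_vanished(lines):
    """Map every path whose GET returned 404 to (last success before it, first 404).

    The "last success" is a (method, timestamp) tuple for the latest 200 GET or
    PUT seen before the first 404, or None if the retained logs contain none.
    """
    last_ok = {}
    first_404 = {}
    for line in lines:
        m = LINE_RE.search(line)
        if not m or m.group("method") not in ("GET", "PUT"):
            continue
        path, status = m.group("path"), m.group("status")
        if path in first_404:
            continue  # only the transition into 404 is of interest here
        ts = datetime.strptime(m.group("ts"), TS_FORMAT)
        if status == "200":
            last_ok[path] = (m.group("method"), ts)
        elif status == "404" and m.group("method") == "GET":
            first_404[path] = ts
    return {path: (last_ok.get(path), ts) for path, ts in first_404.items()}


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as log:
        for path, (last_ok, first) in sorted(find_vanished(log).items()):
            print(path, last_ok, first)
```

A `None` in the first slot only means no earlier success survives in the retained logs; it also covers keys that were never written, so those entries need a manual check.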
> 
> The cluster now has 7 nodes instead of 9 (see earlier emails in this thread), 
> and the riak storage backend is now leveldb instead of multi. I have attached 
> config file templates for riak, riak-cs and stanchion (these are deployed
> with ansible).
> 
> Bucket properties:
> {
>   "props": {
>     "notfound_ok": true,
>     "n_val": 3,
>     "last_write_wins": false,
>     "allow_mult": true,
>     "dvv_enabled": false,
>     "name": "blobdb",
>     "r": "quorum",
>     "precommit": [],
>     "old_vclock": 86400,
>     "dw": "quorum",
>     "rw": "quorum",
>     "small_vclock": 50,
>     "write_once": false,
>     "basic_quorum": false,
>     "big_vclock": 50,
>     "chash_keyfun": {
>       "fun": "chash_std_keyfun",
>       "mod": "riak_core_util"
>     },
>     "postcommit": [],
>     "pw": 0,
>     "w": "quorum",
>     "young_vclock": 20,
>     "pr": 0,
>     "linkfun": {
>       "fun": "mapreduce_linkfun",
>       "mod": "riak_kv_wm_link_walker"
>     }
>   }
> }
> 
> I'll be happy to provide more context to help troubleshoot this issue.
> 
> Thanks in advance for any help you can provide.
> 
> Daniel
> 
> 
> On Tue, Feb 14, 2017 at 11:52 AM, Daniel Miller <dmil...@dimagi.com> wrote:
> Hi Luke,
> 
> Sorry for the late response and thanks for following up. I haven't seen it 
> happen since. At this point I'm going to wait and see if it happens again and 
> hopefully get more details about what might be causing it.
> 
> Daniel
> 
> On Thu, Feb 9, 2017 at 1:02 PM, Luke Bakken <lbak...@basho.com> wrote:
> Hi Daniel -
> 
> I don't have any ideas at this point. Has this scenario happened again?
> 
> --
> Luke Bakken
> Engineer
> lbak...@basho.com
> 
> 
> On Wed, Jan 25, 2017 at 2:11 PM, Daniel Miller <dmil...@dimagi.com> wrote:
> > Thanks for the quick response, Luke.
> >
> > There is nothing unusual about the keys. The format is a name + UUID + some
> > other random URL-encoded characters, like most other keys in our cluster.
> >
> > There are no errors near the time of the incident in any of the logs (the
> > last [error] is from over a month before). I see lots of messages like this
> > in console.log:
> >
> > /var/log/riak/console.log
> > 2017-01-20 15:38:10.184 [info]
> > <0.22902.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 2 keys during
> > active anti-entropy exchange of
> > {776422744832042175295707567380525354192214163456,3} between
> > {776422744832042175295707567380525354192214163456,'riak-fa...@fake3.fake.com'}
> > and
> > {822094670998632891489572718402909198556462055424,'riak-fa...@fake9.fake.com'}
> > 2017-01-20 15:40:39.640 [info]
> > <0.21789.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 1 keys during
> > active anti-entropy exchange of
> > {936274486415109681974235595958868809467081785344,3} between
> > {959110449498405040071168171470060731649205731328,'riak-fa...@fake3.fake.com'}
> > and
> > {981946412581700398168100746981252653831329677312,'riak-fa...@fake5.fake.com'}
> > 2017-01-20 15:46:40.918 [info]
> > <0.13986.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 2 keys during
> > active anti-entropy exchange of
> > {662242929415565384811044689824565743281594433536,3} between
> > {685078892498860742907977265335757665463718379520,'riak-fa...@fake3.fake.com'}
> > and
> > {707914855582156101004909840846949587645842325504,'riak-fa...@fake6.fake.com'}
> > 2017-01-20 15:48:25.597 [info]
> > <0.29943.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 2 keys during
> > active anti-entropy exchange of
> > {776422744832042175295707567380525354192214163456,3} between
> > {776422744832042175295707567380525354192214163456,'riak-fa...@fake3.fake.com'}
> > and
> > {799258707915337533392640142891717276374338109440,'riak-fa...@fake0.fake.com'}
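These AAE messages are logged at [info] level, and a few of them on their own say little. Summing them per exchange over a longer stretch of console.log might show whether repairs concentrate on particular preference lists, which could then be compared with the keys that went missing. A minimal sketch, assuming the message wording shown above; `tally_aae_repairs` is a hypothetical helper, and it keys on the first element of the `{Index,N}` tuple in each message.

```python
import re
from collections import Counter

# Whitespace between words is matched with \s+ so the pattern also works on
# copies of console.log whose lines were re-wrapped (e.g. by a mail client).
AAE_RE = re.compile(
    r"Repaired\s+(?P<count>\d+)\s+keys\s+during\s+active\s+anti-entropy\s+"
    r"exchange\s+of\s+\{(?P<index>\d+),(?P<n>\d+)\}"
)


def tally_aae_repairs(text):
    """Sum repaired-key counts per index (first element of {Index,N})."""
    totals = Counter()
    for m in AAE_RE.finditer(text):
        totals[int(m.group("index"))] += int(m.group("count"))
    return totals


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as log:
        for index, repaired in tally_aae_repairs(log.read()).most_common(10):
            print(index, repaired)
```

For the four messages above, index 776422744832042175295707567380525354192214163456 accounts for 4 of the 7 repaired keys.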
> >
> > Thanks!
> > Daniel
> >
> >
> >
> > On Wed, Jan 25, 2017 at 9:45 AM, Luke Bakken <lbak...@basho.com> wrote:
> >>
> >> Hi Daniel -
> >>
> >> This is a strange scenario. I recommend looking at all of the log
> >> files for "[error]" or other entries at about the same time as these
> >> PUTs or 404 responses.
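Searching for "[error]" near a given PUT or 404 can be narrowed to a time window. A minimal sketch, assuming console.log-style lines that start with `YYYY-MM-DD HH:MM:SS.mmm [level]` as in the excerpt later in this thread, and that the reference time is expressed in the same timezone as the log. `entries_near` is a hypothetical helper; lines without a timestamp are treated as continuations of the previous entry.

```python
import re
from datetime import datetime, timedelta

ENTRY_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<level>\w+)\]"
)


def entries_near(lines, when, window=timedelta(minutes=10),
                 levels=("error", "warning")):
    """Yield log lines at the given levels within `window` of `when`.

    Lines without a leading timestamp inherit the decision made for the
    preceding timestamped line, so multi-line reports stay together.
    """
    keep = False
    for line in lines:
        m = ENTRY_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            keep = m.group("level") in levels and abs(ts - when) <= window
        if keep:
            yield line


if __name__ == "__main__":
    import sys
    # Hypothetical invocation: script.py LOGFILE "2017-03-02 08:36:11"
    when = datetime.strptime(sys.argv[2], "%Y-%m-%d %H:%M:%S")
    with open(sys.argv[1]) as log:
        for line in entries_near(log, when):
            print(line, end="")
```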
> >>
> >> Is there anything unusual about the key being used?
> >> --
> >> Luke Bakken
> >> Engineer
> >> lbak...@basho.com
> >>
> >>
> >> On Wed, Jan 25, 2017 at 6:40 AM, Daniel Miller <dmil...@dimagi.com> wrote:
> >> > I have a 9-node Riak CS cluster that has been working flawlessly for about
> >> > 3 months. The cluster configuration, including the backend and bucket
> >> > parameters such as N-value, uses default settings. I'm using the S3 API to
> >> > communicate with the cluster.
> >> >
> >> > Within the past week I had an issue where two objects were PUT with a 200
> >> > (success) response, but all subsequent GET requests for those two keys
> >> > return a status of 404 (not found). Other than the fact that they are now
> >> > missing, there was nothing out of the ordinary about these particular two
> >> > PUTs. Maybe I'm missing something, but this seems like a scenario that
> >> > should never happen. All information included here about PUTs and GETs
> >> > comes from reviewing the CS access logs. Both objects were PUT on the same
> >> > node, but GET requests returning 404 have been observed on all nodes. There
> >> > is plenty of other traffic on the cluster involving GETs and PUTs that are
> >> > not failing. I'm unsure how to troubleshoot further to find out what may
> >> > have happened to those objects and why they are now missing. What is the
> >> > best approach to figure out why an object that was successfully PUT seems
> >> > to be missing?
> >> >
> >> > Thanks!
> >> > Daniel Miller
> >> >
> >> > _______________________________________________
> >> > riak-users mailing list
> >> > riak-users@lists.basho.com
> >> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >> >
> >
> >
> 
> 
> <config-files.zip>

