Also, this issue https://github.com/basho/riak_kv/issues/1188 suggests that adding the property `riak_kv.retry_put_coordinator_failure=false` may help in the future. But it won’t help with your keys that already have too many siblings.
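A minimal sketch of how that property could be set, assuming it is read from the `riak_kv` application environment as its name suggests, and that you set it through `advanced.config` (please check the config schema for your Riak version before relying on this):

```erlang
%% advanced.config (sketch only; assumes riak_kv reads this flag from its app env)
[
 {riak_kv, [
   %% Value taken from the property quoted above; see the linked issue for details.
   {retry_put_coordinator_failure, false}
 ]}
].
```

If your `advanced.config` already has a `riak_kv` section, the tuple belongs inside that existing section rather than in a second one. Changes to `advanced.config` only take effect after the node is restarted.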
On 24 May 2017, at 09:22, Russell Brown <russell.br...@icloud.com> wrote:

>
> On 24 May 2017, at 09:11, Vladyslav Zakhozhai <v.zakhoz...@smartweb.com.ua>
> wrote:
>
>> Hello,
>>
>> My riak cluster still experiences "too many siblings". And hinted handoffs
>> are not able to finish completely. So "siblings will be resolved after
>> hinted handoffs are finished" is not my case, unfortunately.
>>
>> According to basho's docs
>> (http://docs.basho.com/riak/kv/2.2.3/learn/concepts/causal-context/#sibling-explosion)
>> I need to enable the dvv conflict resolution mechanism. So here is a question:
>>
>> Is it safe to enable dvv on the default bucket type, and how does it affect
>> existing data?
>
> It might not affect existing data enough. All the existing siblings are
> “undotted” and would need a read-put cycle to resolve.
>
>> It may be a solution, is it not?
>
> You may require further action. I remember basho support helping someone with
> a similar issue, and there was some manual intervention/scripted solution,
> but I can’t remember what it was right now. I think those objects (as logged)
> with the sibling issues need to be read and resolved. Maybe one of the
> ex-basho support people remembers? I’ll prod one in a back channel and see if
> they can help.
>
>>
>> Why do I talk about the default bucket type? Because there is only one riak
>> client, Riak CS, and it does not manage bucket types of PUT'ed objects (so the
>> default bucket type is always used during PUTs). Is that correct?
>
> Yes.
>
>>
>> Thank you in advance.
>>
>> On Fri, Jun 17, 2016 at 11:45 AM Vladyslav Zakhozhai
>> <v.zakhoz...@smartweb.com.ua> wrote:
>> Hi Russell,
>>
>> thank you for your answer. I really appreciate your help.
>>
>> 2.1.3 is not actually the riak_kv version. It is the version of basho's riak
>> package. The versions of the riak subsystems are listed below.
>>
>> Bucket properties:
>> # riak-admin bucket-type list
>> default (active)
>>
>> # riak-admin bucket-type status default
>> default is active
>>
>> allow_mult: true
>> basic_quorum: false
>> big_vclock: 50
>> chash_keyfun: {riak_core_util,chash_std_keyfun}
>> dvv_enabled: false
>> dw: quorum
>> last_write_wins: false
>> linkfun: {modfun,riak_kv_wm_link_walker,mapreduce_linkfun}
>> n_val: 3
>> notfound_ok: true
>> old_vclock: 86400
>> postcommit: []
>> pr: 0
>> precommit: []
>> pw: 0
>> r: quorum
>> rw: quorum
>> small_vclock: 50
>> w: quorum
>> write_once: false
>> young_vclock: 20
>>
>> I did not mention that an upgrade from riak 1.5.4 took place a couple of
>> months ago (about 6 months). As I understand it, DVV is disabled. Is it safe
>> to migrate from vector clocks to DVV?
>>
>> Package versions:
>> # dpkg -l | grep riak
>> ii riak 2.1.3-1 amd64 Riak is a distributed data store
>> ii riak-cs 2.1.0-1 amd64 Riak CS
>>
>> Subsystems versions:
>> "clique_version" : "0.3.2-0-ge332c8f",
>> "bitcask_version" : "1.7.2",
>> "sys_driver_version" : "2.2",
>> "riak_core_version" : "2.1.5-0-gb02ab53",
>> "riak_kv_version" : "2.1.2-0-gf969bba",
>> "riak_pipe_version" : "2.1.1-0-gb1ac2cf",
>> "cluster_info_version" : "2.0.3-0-g76c73fc",
>> "riak_auth_mods_version" : "2.1.0-0-g31b8b30",
>> "erlydtl_version" : "0.7.0",
>> "os_mon_version" : "2.2.13",
>> "inets_version" : "5.9.6",
>> "erlang_js_version" : "1.3.0-0-g07467d8",
>> "riak_control_version" : "2.1.2-0-gab3f924",
>> "xmerl_version" : "1.3.4",
>> "protobuffs_version" : "0.8.1p5-0-gf88fc3c",
>> "riak_sysmon_version" : "2.0.0",
>> "compiler_version" : "4.9.3",
>> "eleveldb_version" : "2.1.10-0-g0537ca9",
>> "lager_version" : "2.1.1",
>> "sasl_version" : "2.3.3",
>> "riak_dt_version" : "2.1.1-0-ga2986bc",
>> "runtime_tools_version" : "1.8.12",
>> "yokozuna_version" : "2.1.2-0-g3520d11",
>> "riak_search_version" : "2.1.1-0-gffe2113",
>> "sys_system_version" : "Erlang R16B02_basho8 (erts-5.10.3) [source] [64-bit] [smp:4:4] [async-threads:64] [kernel-poll:true] [frame-pointer]",
>> "basho_stats_version" : "1.0.3",
>> "crypto_version" : "3.1",
>> "merge_index_version" : "2.0.1-0-g0c8f77c",
>> "kernel_version" : "2.16.3",
>> "stdlib_version" : "1.19.3",
>> "riak_pb_version" : "2.1.0.2-0-g620bc70",
>> "syntax_tools_version" : "1.6.11",
>> "goldrush_version" : "0.1.7",
>> "ibrowse_version" : "4.0.2",
>> "mochiweb_version" : "2.9.0",
>> "exometer_core_version" : "1.0.0-basho2-0-gb47a5d6",
>> "ssl_version" : "5.3.1",
>> "public_key_version" : "0.20",
>> "pbkdf2_version" : "2.0.0-0-g7076584",
>> "sidejob_version" : "2.0.0-0-gc5aabba",
>> "webmachine_version" : "1.10.8-0-g7677c24",
>> "poolboy_version" : "0.8.1p3-0-g8bb45fb",
>> "riak_api_version" : "2.1.2-0-gd8d510f",
>> "asn1_version" : "2.0.3",
>>
>>
>> On Fri, Jun 17, 2016 at 10:45 AM Russell Brown <russell.br...@me.com> wrote:
>> What version of riak_kv is behind this riak_cs install, please? Is it really
>> 2.1.3 as stated below? This looks and sounds like sibling explosion, which
>> is fixed in riak 2.0 and above. Are you sure that you are using the DVV
>> enabled setting for riak_cs bucket properties? Can you post your bucket
>> properties please?
>>
>> On 16 Jun 2016, at 23:54, Vladyslav Zakhozhai <v.zakhoz...@smartweb.com.ua>
>> wrote:
>>
>>> Hello.
>>>
>>> I see a very interesting and confusing thing.
>>>
>>> From my previous letter you can see that the sibling count on manifest objects
>>> is about 100 (actually it is in the range 100-300). Unfortunately, my problem
>>> is that almost all PUT requests are failing with 500 Internal Server Error.
>>>
>>> Today I tried setting the max_siblings riak option to 500. There were
>>> successful PUT requests, but not for long. Now I see errors in the riak logs
>>> with "max siblings", but the actual count is 500+ (earlier it was
>>> 100-300, as I mentioned).
>>>
>>> The period of time between setting max_siblings=500 and the errors in the log
>>> is about 30 minutes.
>>> And I want to point out that I have forbidden the PUT method on
>>> haproxy, the frontend for riak cs.
>>>
>>>
>>>
>>> On Mon, Jun 6, 2016 at 1:17 AM Vladyslav Zakhozhai
>>> <v.zakhoz...@smartweb.com.ua> wrote:
>>> Hi, Luke.
>>>
>>> Thank you for your answer. I did not completely understand you about
>>> transfer-limit. How does it relate to my problem? Transfer limit is a
>>> limit on concurrent data transfers from different nodes. Am I right? Do you
>>> mean that riak can hand off one partition from several nodes concurrently?
>>>
>>> Now I have transfer-limit 1 on all riak nodes.
>>>
>>> But I am not sure that my cluster will ever converge. All nodes
>>> are low on memory and are periodically killed by the OOM killer. I am trying
>>> to add new nodes to the cluster, but due to the problem with the OOM killer
>>> this process is very, very slow.
>>>
>>> In the official docs I've read:
>>>
>>> "Sibling explosion occurs when an object rapidly collects siblings that are
>>> not reconciled. This can lead to a variety of problems, including degraded
>>> performance, especially if many objects in a cluster suffer from siblings
>>> explosion. At the extreme, having an enormous object in a node can cause
>>> reads of that object to crash the entire node. Other issues include undue
>>> latency and out-of-memory errors."
>>>
>>> I noticed that the new nodes in the cluster do not experience such problems
>>> (I mean running out of RAM).
>>>
>>> Regarding siblings, maybe you are right, this is a manifest object. I can
>>> recognize the key name but not the bucket name. But more than 100 siblings on
>>> many keys really confuses me. Each time I try to PUT some object to Riak via
>>> the Riak CS S3 interface I get errors about siblings.
>>>
>>> On Fri, Jun 3, 2016 at 6:43 PM Luke Bakken <lbak...@basho.com> wrote:
>>> Hi Vladyslav,
>>>
>>> If you recognize the full name of the object raising the sibling
>>> warning, it is most likely a manifest object.
>>> Sometimes, during hinted
>>> handoff, you can see these messages. They should resolve after handoff
>>> completes.
>>>
>>> Please see the documentation for the transfer-limit command as well:
>>>
>>> http://docs.basho.com/riak/kv/2.1.4/using/admin/riak-admin/#transfer-limit
>>>
>>> --
>>> Luke Bakken
>>> Engineer
>>> lbak...@basho.com
>>>
>>>
>>> On Fri, Jun 3, 2016 at 2:55 AM, Vladyslav Zakhozhai
>>> <v.zakhoz...@smartweb.com.ua> wrote:
>>>> Hi.
>>>>
>>>> I have trouble with PUTs to a Riak CS cluster. During this process I
>>>> periodically see the following message in the Riak error.log:
>>>>
>>>> 2016-06-03 11:15:55.201 [error] <0.15536.142>@riak_kv_vnode:encode_and_put:2253 Put failure: too many siblings for object OBJECT_NAME (101)
>>>>
>>>> and also
>>>>
>>>> 2016-06-03 12:41:50.678 [error] <0.20448.515>@riak_api_pb_server:handle_info:331 Unrecognized message {7345880,{error,{too_many_siblings,101}}}
>>>>
>>>> Here OBJECT_NAME is the name of the object in Riak which has too many
>>>> siblings.
>>>>
>>>> I am definitely sure that these objects are static. Nobody deletes them,
>>>> nobody rewrites them. I have no idea why more than 100 siblings of this
>>>> object occur.
>>>>
>>>> This issue has the following effects:
>>>>
>>>> A great number of keys are loaded into RAM. I am almost out of RAM (does each
>>>> sibling have its own key or a key duplicate?).
>>>> Nodes are slow, and adding new nodes is too slow.
>>>> The presence of "too many siblings" affects ownership handoffs.
>>>>
>>>> So I have several questions:
>>>>
>>>> Can hinted or ownership handoffs affect the sibling count (I mean, can
>>>> siblings be created during ownership or hinted handoffs)?
>>>> Is there any workaround for this issue?
>>>> Do I need to remove siblings manually, or
>>>> are they removed during merges, read repairs and so on?
>>>>
>>>>
>>>> My configuration:
>>>>
>>>> riak from basho's packages - 2.1.3-1
>>>> riak cs from basho's packages - 2.1.0-1
>>>> 24 riak/riak-cs nodes
>>>> 32 GB RAM per node
>>>> AAE is disabled
>>>>
>>>>
>>>> I appreciate your help.
>>>>
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> riak-users@lists.basho.com
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>