Re: Put failure: too many siblings

2017-05-24 Thread Vladyslav Zakhozhai
Hello,

My Riak cluster still experiences "too many siblings", and hinted handoffs
are unable to finish completely. So "siblings will be resolved after
hinted handoffs are finished" is unfortunately not my case.

According to Basho's docs (
http://docs.basho.com/riak/kv/2.2.3/learn/concepts/causal-context/#sibling-explosion)
I need to enable the DVV conflict resolution mechanism. So here is a question:

Is it safe to enable DVV on the default bucket type, and how does it affect
existing data? It may be a solution, isn't it?

Why do I talk about the default bucket type? Because there is only one Riak
client, Riak CS, and it does not manage bucket types of PUT'ed objects (so the
default bucket type is always used during PUTs). Is that correct?
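If I understand the docs correctly, enabling dvv_enabled for typeless (default)
buckets cluster-wide would mean something like the following default_bucket_props
entry in advanced.config on every node. This is just my sketch, not tested here;
the allow_mult line only restates what we already have:

{riak_core, [
    {default_bucket_props, [
        {allow_mult, true},   %% already true in our cluster
        {dvv_enabled, true}   %% the setting in question
    ]}
]}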

Thank you in advance.

On Fri, Jun 17, 2016 at 11:45 AM Vladyslav Zakhozhai <
v.zakhoz...@smartweb.com.ua> wrote:

> Hi Russell,
>
> thank you for your answer. I really appreciate your help.
>
> 2.1.3 is not actually the riak_kv version; it is the version of Basho's riak
> package. You can see the versions of the Riak subsystems below.
>
> Bucket properties:
> # riak-admin bucket-type list
> default (active)
>
> # riak-admin bucket-type status default
> default is active
>
> allow_mult: true
> basic_quorum: false
> big_vclock: 50
> chash_keyfun: {riak_core_util,chash_std_keyfun}
> dvv_enabled: false
> dw: quorum
> last_write_wins: false
> linkfun: {modfun,riak_kv_wm_link_walker,mapreduce_linkfun}
> n_val: 3
> notfound_ok: true
> old_vclock: 86400
> postcommit: []
> pr: 0
> precommit: []
> pw: 0
> r: quorum
> rw: quorum
> small_vclock: 50
> w: quorum
> write_once: false
> young_vclock: 20
>
> I did not mention that an upgrade from Riak 1.5.4 took place a couple of
> months ago (about 6 months). As I understand it, DVV is disabled. Is it
> safe to migrate from vector clocks to DVV?
>
> Package versions:
> # dpkg -l | grep riak
> ii  riak        2.1.3-1    amd64    Riak is a distributed data store
> ii  riak-cs     2.1.0-1    amd64    Riak CS
>
> Subsystems versions:
> "clique_version" : "0.3.2-0-ge332c8f",
> "bitcask_version" : "1.7.2",
> "sys_driver_version" : "2.2",
> "riak_core_version" : "2.1.5-0-gb02ab53",
> "riak_kv_version" : "2.1.2-0-gf969bba",
> "riak_pipe_version" : "2.1.1-0-gb1ac2cf",
> "cluster_info_version" : "2.0.3-0-g76c73fc",
> "riak_auth_mods_version" : "2.1.0-0-g31b8b30",
> "erlydtl_version" : "0.7.0",
> "os_mon_version" : "2.2.13",
> "inets_version" : "5.9.6",
> "erlang_js_version" : "1.3.0-0-g07467d8",
> "riak_control_version" : "2.1.2-0-gab3f924",
> "xmerl_version" : "1.3.4",
> "protobuffs_version" : "0.8.1p5-0-gf88fc3c",
> "riak_sysmon_version" : "2.0.0",
> "compiler_version" : "4.9.3",
> "eleveldb_version" : "2.1.10-0-g0537ca9",
> "lager_version" : "2.1.1",
> "sasl_version" : "2.3.3",
> "riak_dt_version" : "2.1.1-0-ga2986bc",
> "runtime_tools_version" : "1.8.12",
> "yokozuna_version" : "2.1.2-0-g3520d11",
> "riak_search_version" : "2.1.1-0-gffe2113",
> "sys_system_version" : "Erlang R16B02_basho8 (erts-5.10.3) [source]
> [64-bit] [smp:4:4] [async-threads:64] [kernel-poll:true] [frame-pointer]",
> "basho_stats_version" : "1.0.3",
> "crypto_version" : "3.1",
> "merge_index_version" : "2.0.1-0-g0c8f77c",
> "kernel_version" : "2.16.3",
> "stdlib_version" : "1.19.3",
> "riak_pb_version" : "2.1.0.2-0-g620bc70",
> "syntax_tools_version" : "1.6.11",
> "goldrush_version" : "0.1.7",
> "ibrowse_version" : "4.0.2",
> "mochiweb_version" : "2.9.0",
> "exometer_core_version" : "1.0.0-basho2-0-gb47a5d6",
> "ssl_version" : "5.3.1",
> "public_key_version" : "0.20",
> "pbkdf2_version" : "2.0.0-0-g7076584",
> "sidejob_version" : "2.0.0-0-gc5aabba",
> "webmachine_version" : "1.10.8-0-g7677c24",
> "poolboy_version" : "0.8.1p3-0-g8bb45fb",
> "riak_api_version" : "2.1.2-0-gd8d510f",
> "asn1_version" : "2.0.3",
>
>
> On Fri, Jun 17, 2016 at 10:45 AM Russell Brown 
> wrote:
>
>> What version of riak_kv is behind this riak_cs install, please? Is it
>> really 2.1.3 as stated below? This looks and sounds like sibling explosion,
>> which is fixed in riak 2.0 and above. Are you sure that you are using the
>> DVV enabled setting for riak_cs bucket properties? Can you post your bucket
>> properties please?
>>
>> On 16 Jun 2016, at 23:54, Vladyslav Zakhozhai <
>> v.zakhoz...@smartweb.com.ua> wrote:
>>
>> > Hello.
>> >
>> > I see a very interesting and confusing thing.
>> >
>> > From my previous letter you can see that the sibling count on manifest
>> objects is about 100 (actually it is in the range 100-300). Unfortunately my
>> problem is that almost all PUT requests are failing with a 500 Internal
>> Server error.
>> >
>> > Today I tried setting the max_siblings riak option to 500, and there were
>> successful PUT requests, but not for long. Now I see the "max siblings" error
>> in the riak logs, but the actual count of them is 500+ (earlier it was
>> 100-300, as I mentioned).
>> >
>> > The period of time between setting max_siblings=500 and the errors in the
>> log is about 30 minutes. And 

Re: Put failure: too many siblings

2017-05-24 Thread Russell Brown

On 24 May 2017, at 09:11, Vladyslav Zakhozhai  
wrote:

> Hello,
> 
> My Riak cluster still experiences "too many siblings", and hinted handoffs 
> are unable to finish completely. So "siblings will be resolved after 
> hinted handoffs are finished" is unfortunately not my case.
> 
> According to Basho's docs 
> (http://docs.basho.com/riak/kv/2.2.3/learn/concepts/causal-context/#sibling-explosion)
>  I need to enable the DVV conflict resolution mechanism. So here is a question:
> 
> Is it safe to enable DVV on the default bucket type, and how does it affect 
> existing data?

It might not affect existing data enough. All the existing siblings are 
“undotted” and would need a read-put cycle to resolve.

> It may be a solution, isn't it?

You may require further action. I remember basho support helping someone with a 
similar issue, and there was some manual intervention/scripted solution, but I 
can’t remember what it was right now. I think those objects (as logged) with 
the sibling issues need to be read and resolved. Maybe one of the ex-basho 
support people remembers? I’ll prod one in a back channel and see if they can 
help.
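As a very rough illustration of that read-put cycle over the HTTP API (host,
BUCKET, KEY and the choice of "winning" value are placeholders, and for CS
manifest objects the actual merge belongs in the CS resolver code rather than
a blind pick of one sibling):

# fetch the key with all of its siblings in one response, and note the
# X-Riak-Vclock header it comes back with
curl -i -H "Accept: multipart/mixed" \
  http://127.0.0.1:8098/buckets/BUCKET/keys/KEY

# write back a single resolved value, supplying that vclock, so the object
# collapses back to one value
curl -X PUT -H "X-Riak-Vclock: $VCLOCK" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @resolved_value \
  http://127.0.0.1:8098/buckets/BUCKET/keys/KEY

where $VCLOCK is copied from the GET response and resolved_value holds the
merged content.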

> 
> Why do I talk about the default bucket type? Because there is only one Riak 
> client, Riak CS, and it does not manage bucket types of PUT'ed objects (so the 
> default bucket type is always used during PUTs). Is that correct?

Yes.


Re: Put failure: too many siblings

2017-05-24 Thread Russell Brown
Also, this issue https://github.com/basho/riak_kv/issues/1188 suggests that 
adding the property `riak_kv.retry_put_coordinator_failure=false` may help in 
future, but it won't help with the keys that already have too many siblings.
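If that line isn't exposed in your riak.conf schema, the same riak_kv
application setting can, I believe, be set per node with an advanced.config
entry along these lines (a sketch, not something I've verified on your version):

{riak_kv, [
    {retry_put_coordinator_failure, false}
]}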


Re: Put failure: too many siblings

2017-05-24 Thread Vladyslav Zakhozhai
Russell, thank you for the answer.

> Maybe one of the ex-basho support people remembers? I’ll prod one in a
back channel and see if they can help.

It would be great.

Thank you once more.


Issues with partition distribution across nodes

2017-05-24 Thread Denis Gudtsov
Hello

We have a 6-node cluster with ring size 128 configured. The problem is that
two partitions have replicas on only two nodes rather than the three required
(n_val=3). We have tried several times to clean the leveldb and ring directories
and then rebuild the cluster, but the issue is still present.
How can we diagnose where the issue is and fix it? Is there any way we
can assign a partition to a node manually?

Please find the output of member-status below and a screenshot of the Riak
Control ring status:
[root@riak01 ~]# riak-admin  member-status
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid  17.2%  --  'riak@riak01.
valid  17.2%  --  'riak@riak02.
valid  16.4%  --  'riak@riak03.
valid  16.4%  --  'riak@riak04.
valid  16.4%  --  'riak@riak05.
valid  16.4%  --  'riak@riak06.
-------------------------------------------------------------------------------
Valid:6 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

 

Thank you.



--
View this message in context: 
http://riak-users.197444.n3.nabble.com/Issues-with-partition-distribution-across-nodes-tp4035179.html
Sent from the Riak Users mailing list archive at Nabble.com.

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Issues with partition distribution across nodes

2017-05-24 Thread Russell Brown
Hi,

This is just a quick reply, since this is somewhat of a current topic on the ML.

On 24 May 2017, at 12:57, Denis Gudtsov  wrote:

> Hello
> 
> We have a 6-node cluster with ring size 128 configured. The problem is that
> two partitions have replicas on only two nodes rather than the three required
> (n_val=3). We have tried several times to clean the leveldb and ring directories
> and then rebuild the cluster, but the issue is still present. 

There was a fairly long discussion about this very issue recently (see 
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2017-May/019281.html)

I ran a little code and the following {RingSize, NodeCount, IsViolated} tuples 
were the result. If you built any of these clusters from scratch (i.e. you 
started NodeCount nodes, and used riak-admin cluster join, riak-admin cluster 
plan, riak-admin cluster commit to create a cluster of NodeCount from scratch) 
then you have tail violations in your ring.

[{16,3,true},
 {16,5,true},
 {16,7,true},
 {16,13,true},
 {16,14,true},
 {32,3,true},
 {32,5,true},
 {32,6,true},
 {32,10,true},
 {64,3,true},
 {64,7,true},
 {64,9,true},
 {128,3,true},
 {128,5,true},
 {128,6,true},
 {128,7,true},
 {128,9,true},
 {128,14,true},
 {256,3,true},
 {256,5,true},
 {256,11,true},
 {512,3,true},
 {512,5,true},
 {512,6,true},
 {512,7,true},
 {512,10,true}]
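
To be clear about what I mean by a tail violation, here is a naive sketch you
can paste into an Erlang shell. It is NOT riak_core's claim code; it just lays
partitions out round-robin and checks whether any window of NVal consecutive
partitions, wrapping around the end of the ring, lands on fewer than NVal
distinct nodes:

Violated = fun(RingSize, NodeCount, NVal) ->
    %% naive round-robin ownership: partition I belongs to node I rem NodeCount
    Owners = [I rem NodeCount || I <- lists:seq(0, RingSize - 1)],
    Window = fun(I) -> [lists:nth(((I + J) rem RingSize) + 1, Owners)
                        || J <- lists:seq(0, NVal - 1)] end,
    %% a "tail violation": some wrapping window has fewer distinct owners than NVal
    lists:any(fun(I) -> length(lists:usort(Window(I))) < NVal end,
              lists:seq(0, RingSize - 1))
end.

For this naive layout Violated(128, 6, 3) comes out true (the wrap-around
preflist only touches two nodes), while Violated(126, 6, 3) comes out false.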


> How can we diagnose where the issue is and fix it?

WRT your problem, a quick experiment suggests that adding 2 new nodes will solve 
it; just adding one doesn't look like it does. I tried adding only one new node 
and still had a single violated preflist, but I have just thrown a little 
experiment together, so I could well be wrong. It doesn't actually build any 
clusters, and it uses the claim code out of context, so YMMV.

> Is there any way we
> can assign a partition to a node manually? 

I don’t know of a way, but that would be very useful.

Do you remember if this cluster was built all at once as a 6-node cluster, or 
has it grown over time? Have you run the command riak-admin diag ring_preflists 
as documented here 
http://docs.basho.com/riak/kv/2.2.3/setup/upgrading/checklist/#confirming-configuration-with-riaknostic?

Sorry I can’t be more help

Cheers

Russell





Re: Issues with partition distribution across nodes

2017-05-24 Thread Denis
Hi Russell

Thank you for your suggestions. I found diag is saying: "The following
preflists do not satisfy the n_val. Please add more nodes". It seems to be
because a ring size of 128 divided across 6 nodes is hard to arrange.
The history of our cluster is a long story, because we are testing it in our lab.
Initially it was deployed with 5 nodes without issues. Then it was expanded
to 6 nodes, again without issues. After some time all the storage space on the
whole cluster was fully utilized and we had to remove all data from the leveldb
dir, flush the ring dir and rebuild the cluster. This was done by adding all 6
nodes at one time, so that may be the cause. We can try to flush the cluster
data once again and then add the nodes one by one (committing the cluster change
each time), waiting for the partition transfers to finish each time.
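The per-node cycle would then be roughly the following (names are placeholders,
not our real hostnames):

# on the node being (re)joined, pointing it at any node already in the cluster
riak-admin cluster join riak@EXISTING_MEMBER

# then, on any member
riak-admin cluster plan
riak-admin cluster commit

# wait for handoff to finish before joining the next node
riak-admin transfers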



Re: Issues with partition distribution across nodes

2017-05-24 Thread Russell Brown

On 24 May 2017, at 15:44, Denis  wrote:

> Hi Russell
> 
> Thank you for your suggestions. I found diag is saying: "The following 
> preflists do not satisfy the n_val. Please add more nodes". It seems to be 
> because a ring size of 128 divided across 6 nodes is hard to arrange. 
> The history of our cluster is a long story, because we are testing it in our lab. 
> Initially it was deployed with 5 nodes without issues. Then it was expanded 
> to 6 nodes, again without issues. After some time all the storage space on the whole 
> cluster was fully utilized and we had to remove all data from the leveldb dir, 
> flush the ring dir and rebuild the cluster. This was done by adding all 6 nodes 
> at one time, so that may be the cause. We can try to flush the cluster data once 
> again and then add the nodes one by one (committing the cluster change each time), 
> waiting for the partition transfers to finish each time.

Or add two more nodes in one go; it might be quicker and, if time is money, 
cheaper.
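That is, stage both joins and then run a single plan/commit (names are
placeholders again):

# on each of the two new nodes, pointing at an existing member
riak-admin cluster join riak@EXISTING_MEMBER

# then once, on any member
riak-admin cluster plan
riak-admin cluster commit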


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com