On Sun, Aug 1, 2010 at 11:40 AM, Alex Wolfe <a...@activeprospect.com> wrote:

> IIRC, that was a full paste of all the bitcask.write.locks.  Riak fails
> pretty much immediately while running my test suite, maybe before a lock is
> opened for each partition?
>

If that was a full paste, yes, you weren't even getting the whole system
spun up. You should have one .write.lock open per partition.


> My ulimit was set to 256, which is obviously no good.  After boosting it to
> 9000 and running my test suite, I have the locks shown below.  Riak is still
> running.  I guess that makes it an issue with max open files rather than a
> write lock issue?
>

Yup. But, leave it running long enough with enough traffic and the write
lock bug will eventually show up. :)

D.


>
> $ lsof -p 53113 | awk '{print $9}'| uniq -c  | grep lock
>    1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/1438665674247607560106752257205091097473808596992/bitcask.write.lock
>    1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/22835963083295358096932575511191922182123945984/bitcask.write.lock
>    1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/0/bitcask.write.lock
>    1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/776422744832042175295707567380525354192214163456/bitcask.write.lock
>    1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/730750818665451459101842416358141509827966271488/bitcask.write.lock
>    1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/753586781748746817198774991869333432010090217472/bitcask.write.lock
>    1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/1164634117248063262943561351070788031288321245184/bitcask.write.lock
>    1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/1187470080331358621040493926581979953470445191168/bitcask.write.lock
>    1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/1210306043414653979137426502093171875652569137152/bitcask.write.lock
>    1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/45671926166590716193865151022383844364247891968/bitcask.write.lock
>    1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/91343852333181432387730302044767688728495783936/bitcask.write.lock
>    1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/137015778499772148581595453067151533092743675904/bitcask.write.lock
>    1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/114179815416476790484662877555959610910619729920/bitcask.write.lock
>    1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/182687704666362864775460604089535377456991567872/bitcask.write.lock
>    1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/228359630832953580969325755111919221821239459840/bitcask.write.lock
>    1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/205523667749658222872393179600727299639115513856/bitcask.write.lock
>    1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/593735040165679310520246963290989976735222595584/bitcask.write.lock
>    1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/639406966332270026714112114313373821099470487552/bitcask.write.lock
>    1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/616571003248974668617179538802181898917346541568/bitcask.write.lock
>
>
> On Jul 30, 2010, at 8:34 PM, David Smith wrote:
>
> That's only a partial paste, correct? How many partitions
> ({ring_creation_size, 64} in your etc/app.config) do you have defined? There
> should be a write lock file open for each partition. Also, what is your
> ulimit -n set to?
>
> Thanks,
>
> D.
>
> On Fri, Jul 30, 2010 at 5:09 PM, Alex Wolfe <a...@activeprospect.com>wrote:
>
>> $ lsof -p 16129 | awk '{print $9}'| uniq -c | grep lock
>>   1
>> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/913438523331814323877303020447676887284957839360/bitcask.write.lock
>>   1
>> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/959110449498405040071168171470060731649205731328/bitcask.write.lock
>>   1
>> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/936274486415109681974235595958868809467081785344/bitcask.write.lock
>>   1
>> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/411047335499316445744786359201454599278231027712/bitcask.write.lock
>>   1
>> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/456719261665907161938651510223838443642478919680/bitcask.write.lock
>>   1
>> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/433883298582611803841718934712646521460354973696/bitcask.write.lock
>>   1
>> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/388211372416021087647853783690262677096107081728/bitcask.write.lock
>>
>>
>> On Jul 30, 2010, at 6:03 PM, David Smith wrote:
>>
>> > Yup, that looks like the file handle leak. You can verify by using
>> > lsof on the server and looking for multiple handles to
>> > bitcask.write.lock. Something like:
>> >
>> > lsof -p pid | awk '{print $9}'| uniq -c
>> >
>> > D.
>> >
>> > On Friday, July 30, 2010, Alex Wolfe <a...@activeprospect.com> wrote:
>> >> Hey David.
>> >> Does the below log output look like it could be caused by the issue you
>> fixed?
>> >> Alex
>> >>
>> >> ==== Fri Jul 30 14:22:34 CDT 2010
>> >> =ERROR REPORT==== 30-Jul-2010::14:22:34 ===** State machine <0.176.0>
>> terminating ** Last event in was {riak_vnode_req_v1,
>> 593735040165679310520246963290989976735222595584,
>> {fsm,undefined,<0.12466.0>},                     {riak_kv_put_req_v1,
>>                {<<"test.groups">>,<<"EghzXywWrGGtp2fCcSLoatIdjML">>},
>>                {r_object,<<"test.groups">>,
>> <<"EghzXywWrGGtp2fCcSLoatIdjML">>,                       [{r_content,
>>                   {dict,5,16,16,8,80,48,
>>  {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>  {{[],[],                            [[<<"Links">>]],
>>      [],[],[],[],[],[],[],
>>  [[<<"content-type">>,97,112,112,108,105,99,97,
>>  116,105,111,110,47,106,115,111,110],
>> [<<"X-Riak-VTag">>,89,69,78,55,55,111,66,121,73,
>>  69,78,53,122,101,85,105,117,68,89,80,52]],
>>  [],[],                            [[<<"X-Riak-Last-Modified">>|
>>                  {1280,517754,951062}]],                            [],
>>                        [[<<"X-Riak-Meta">>]]}}},
>> <<"{\"name\":\"foo\",\"created_at\":\"2010-07-30T19:22:34.947Z\",\"type\":\"group\",\"version\":1}">>}],
>>                       [{<<0,55,119,231>>,{1,63447736954}}],
>>       {dict,1,16,16,8,80,48,
>>  {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>  {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>>  [[clean|true]],                          []}}},
>> undefined},                      33218311,63447736954,
>>  [{returnbody,true}]}}** When State == active**      Data  ==
>> {state,593735040165679310520246963290989976735222595584,
>>   riak_kv_vnode,
>> {state,593735040165679310520246963290989976735222595584,
>>           riak_kv_bitcask_backend,
>>  {#Ref<0.0.0.611>,
>> "data/bitcask/593735040165679310520246963290989976735222595584"},
>>                    [],false},                       undefined,none}** Reason
>> for termination = ** {{badmatch,{error,emfile}},
>> [{bitcask_fileops,create_file_loop,3},    {bitcask,put,3},
>>  {riak_kv_bitcask_backend,put,3},    {riak_kv_vnode,perform_put,3},
>>  {riak_kv_vnode,do_put,7},    {riak_kv_vnode,handle_command,3},
>>  {riak_core_vnode,vnode_command,3},    {gen_fsm,handle_msg,7}]}
>> >> =ERROR REPORT==== 30-Jul-2010::14:22:35 ===Failed to open lock file
>> data/bitcask/593735040165679310520246963290989976735222595584/bitcask.write.lock:
>> emfile
>> >> =ERROR REPORT==== 30-Jul-2010::14:22:35 ===** State machine <0.12471.0>
>> terminating ** Last event in was {riak_vnode_req_v1,
>> 593735040165679310520246963290989976735222595584,
>> {fsm,undefined,<0.12470.0>},                     {riak_kv_put_req_v1,
>>                {<<"test.users">>,<<"ZrAxzFghd51VG902GuCcJ2gYOMJ">>},
>>              {r_object,<<"test.users">>,
>> <<"ZrAxzFghd51VG902GuCcJ2gYOMJ">>,                       [{r_content,
>>                   {dict,5,16,16,8,80,48,
>>  {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>  {{[],[],                            [[<<"Links">>,
>>      {{<<"test.groups">>,
>>  <<"EghzXywWrGGtp2fCcSLoatIdjML">>},
>> <<"groups">>}]],                            [],[],[],[],[],[],[],
>>                  [[<<"content-type">>,97,112,112,108,105,99,97,
>>                  116,105,111,110,47,106,115,111,110],
>>       [<<"X-Riak-VTag">>,50,88,97,105,85,49,70,49,55,
>>        54,48,71,113,115,75,103,54,102,84,56,118,84]],
>>      [],[],
>>  [[<<"X-Riak-Last-Modified">>|{1280,517755,638}]],
>>  [],                            [[<<"X-Riak-Meta">>]]}}},
>>
>> <<"{\"name\":\"bar\",\"created_at\":\"2010-07-30T19:22:34.998Z\",\"type\":\"user\",\"version\":1}">>}],
>>                       [{<<3,30,180,15>>,{1,63447736955}}],
>>     {dict,1,16,16,8,80,48,
>>  {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>  {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>>  [[clean|true]],                          []}}},
>> undefined},                      122945524,63447736955,
>>  [{returnbody,true}]}}** When State == active**      Data  ==
>> {state,593735040165679310520246963290989976735222595584,
>>   riak_kv_vnode,
>> {state,593735040165679310520246963290989976735222595584,
>>           riak_kv_bitcask_backend,
>>  {#Ref<0.0.0.16357>,
>> "data/bitcask/593735040165679310520246963290989976735222595584"},
>>                    [],false},                       undefined,none}** Reason
>> for termination = ** {bad_return_value,{error,{write_locked,emfile}}}
>> >> ==> sasl-error.log <==
>> >> =CRASH REPORT==== 30-Jul-2010::14:22:35 === crasher:   initial call:
>> riak_core_vnode:init/1   pid: <0.176.0>   registered_name: []   exception
>> exit: {{badmatch,{error,emfile}},
>>  [{bitcask_fileops,create_file_loop,3},                     {bitcask,put,3},
>>                     {riak_kv_bitcask_backend,put,3},
>> {riak_kv_vnode,perform_put,3},                     {riak_kv_vnode,do_put,7},
>>                     {riak_kv_vnode,handle_command,3},
>> {riak_core_vnode,vnode_command,3},
>> {gen_fsm,handle_msg,7}]}     in function  gen_fsm:terminate/7   ancestors:
>> [riak_core_vnode_sup,riak_core_sup,<0.98.0>]   messages: []   links:
>> [#Port<0.3507>,<0.100.0>]   dictionary: []   trap_exit: true   status:
>> running   heap_size: 1597   stack_size: 24   reductions: 11185 neighbours:
>> >> =SUPERVISOR REPORT==== 30-Jul-2010::14:22:35 ===    Supervisor:
>> {local,riak_core_vnode_sup}    Context:    child_terminated    Reason:
>> {{badmatch,{error,emfile}},
>> [{bitcask_fileops,create_file_loop,3},                  {bitcask,put,3},
>>              {riak_kv_bitcask_backend,put,3},
>>  {riak_kv_vnode,perform_put,3},                  {riak_kv_vnode,do_put,7},
>>                {riak_kv_vnode,handle_command,3},
>>  {riak_core_vnode,vnode_command,3},
>>  {gen_fsm,handle_msg,7}]}    Offender:   [{pid,<0.176.0>},
>> {name,undefined},                 {mfa,
>> {riak_core_vnode,start_link,                         [riak_kv_vnode,
>>                  593735040165679310520246963290989976735222595584]}},
>>           {restart_type,temporary},                 {shutdown,brutal_kill},
>>                 {child_type,worker}]
>> >>
>> >> =CRASH REPORT==== 30-Jul-2010::14:22:35 === crasher:   initial call:
>> riak_core_vnode:init/1   pid: <0.12471.0>   registered_name: []   exception
>> exit: {bad_return_value,{error,{write_locked,emfile}}}     in function
>>  gen_fsm:terminate/7   ancestors:
>> [riak_core_vnode_sup,riak_core_sup,<0.98.0>]   messages: []   links:
>> [<0.100.0>]   dictionary: []   trap_exit: true   status: running
>> heap_size: 2584   stack_size: 24   reductions: 1955 neighbours:
>> >> =SUPERVISOR REPORT==== 30-Jul-2010::14:22:35 ===    Supervisor:
>> {local,riak_core_vnode_sup}    Context:    child_terminated    Reason:
>> {bad_return_value,{error,{write_locked,emfile}}}    Offender:
>> [{pid,<0.12471.0>},                 {name,undefined},                 {mfa,
>>                     {riak_core_vnode,start_link,
>> [riak_kv_vnode,
>>  593735040165679310520246963290989976735222595584]}},
>> {restart_type,temporary},                 {shutdown,brutal_kill},
>>       {child_type,worker}]
>> >>
>> >> ==> erlang.log.4 <==
>> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** Generic server memsup
>> terminating ** Last message in was {'EXIT',<0.21020.78>,
>>      {emfile,                              [{erlang,open_port,
>>                     [{spawn,"/bin/sh -s unix:cmd 2>&1"},
>>                [stream]]},
>> {os,start_port_srv_loop,2}]}}** When Server state == {state,{unix,darwin},
>>                            false,
>>  {1897500000,7776508000},                              {<0.81.0>,972008},
>>                            false,60000,30000,0.8,0.05,<0.21020.78>,
>>                      #Ref<0.0.2.58535>,undefined,
>>    [reg],                              []}** Reason for termination == **
>> {emfile,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]},
>>         {os,start_port_srv_loop,2}]}
>> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** Generic server memsup
>> terminating ** Last message in was {'EXIT',<0.21186.78>,
>>      {emfile,                              [{erlang,open_port,
>>                     [{spawn,"/bin/sh -s unix:cmd 2>&1"},
>>                [stream]]},
>> {os,start_port_srv_loop,2}]}}** When Server state == {state,{unix,darwin},
>>                            false,undefined,undefined,false,60000,30000,
>>                          0.8,0.05,<0.21186.78>,#Ref<0.0.2.58551>,
>>                    undefined,                              [reg],
>>                    []}** Reason for termination == **
>> {emfile,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]},
>>         {os,start_port_srv_loop,2}]}
>> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** Generic server memsup
>> terminating ** Last message in was {'EXIT',<0.21252.78>,
>>      {emfile,                              [{erlang,open_port,
>>                     [{spawn,"/bin/sh -s unix:cmd 2>&1"},
>>                [stream]]},
>> {os,start_port_srv_loop,2}]}}** When Server state == {state,{unix,darwin},
>>                            false,undefined,undefined,false,60000,30000,
>>                          0.8,0.05,<0.21252.78>,#Ref<0.0.2.58559>,
>>                    undefined,                              [reg],
>>                    []}** Reason for termination == **
>> {emfile,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]},
>>         {os,start_port_srv_loop,2}]}
>> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** Generic server memsup
>> terminating ** Last message in was {'EXIT',<0.21374.78>,
>>      {emfile,                              [{erlang,open_port,
>>                     [{spawn,"/bin/sh -s unix:cmd 2>&1"},
>>                [stream]]},
>> {os,start_port_srv_loop,2}]}}** When Server state == {state,{unix,darwin},
>>                            false,undefined,undefined,false,60000,30000,
>>                          0.8,0.05,<0.21374.78>,#Ref<0.0.2.58569>,
>>                    undefined,                              [reg],
>>                    []}** Reason for termination == **
>> {emfile,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]},
>>         {os,start_port_srv_loop,2}]}
>> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** Generic server memsup
>> terminating ** Last message in was {'EXIT',<0.21396.78>,
>>      {emfile,                              [{erlang,open_port,
>>                     [{spawn,"/bin/sh -s unix:cmd 2>&1"},
>>                [stream]]},
>> {os,start_port_srv_loop,2}]}}** When Server state == {state,{unix,darwin},
>>                            false,undefined,undefined,false,60000,30000,
>>                          0.8,0.05,<0.21396.78>,#Ref<0.0.2.58575>,
>>                    undefined,                              [reg],
>>                    []}** Reason for termination == **
>> {emfile,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]},
>>         {os,start_port_srv_loop,2}]}
>> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** State machine <0.175.0>
>> terminating ** Last event in was {riak_vnode_req_v1,
>> 570899077082383952423314387779798054553098649600,
>> {fsm,undefined,<0.12470.0>},                     {riak_kv_put_req_v1,
>>                {<<"test.users">>,<<"ZrAxzFghd51VG902GuCcJ2gYOMJ">>},
>>              {r_object,<<"test.users">>,
>> <<"ZrAxzFghd51VG902GuCcJ2gYOMJ">>,                       [{r_content,
>>                   {dict,5,16,16,8,80,48,
>>  {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>  {{[],[],                            [[<<"Links">>,
>>      {{<<"test.groups">>,
>>  <<"EghzXywWrGGtp2fCcSLoatIdjML">>},
>> <<"groups">>}]],                            [],[],[],[],[],[],[],
>>                  [[<<"content-type">>,97,112,112,108,105,99,97,
>>                  116,105,111,110,47,106,115,111,110],
>>       [<<"X-Riak-VTag">>,50,88,97,105,85,49,70,49,55,
>>        54,48,71,113,115,75,103,54,102,84,56,118,84]],
>>      [],[],
>>  [[<<"X-Riak-Last-Modified">>|{1280,517755,638}]],
>>  [],                            [[<<"X-Riak-Meta">>]]}}},
>>
>> <<"{\"name\":\"bar\",\"created_at\":\"2010-07-30T19:22:34.998Z\",\"type\":\"user\",\"version\":1}">>}],
>>                       [{<<3,30,180,15>>,{1,63447736955}}],
>>     {dict,1,16,16,8,80,48,
>>  {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>  {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>>  [[clean|true]],                          []}}},
>> undefined},                      122945524,63447736955,
>>  [{returnbody,true}]}}** When State == active**      Data  ==
>> {state,570899077082383952423314387779798054553098649600,
>>   riak_kv_vnode,
>> {state,570899077082383952423314387779798054553098649600,
>>
>>
>
>
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Reply via email to