On Sun, Aug 1, 2010 at 11:40 AM, Alex Wolfe <a...@activeprospect.com> wrote:
> IIRC, that was a full paste of all the bitcask.write.locks. Riak fails
> pretty much immediately while running my test suite, maybe before a lock is
> opened for each partition?
>

If that was a full paste, yes, you weren't even getting the whole system spun up. You should have one .write.lock open per partition.

> My ulimit was set to 256, which is obviously no good. After boosting it to
> 9000 and running my test suite, I have the locks shown below. Riak is still
> running. I guess that makes it an issue with max open files rather than a
> write lock issue?
>

Yup. But, leave it running long enough with enough traffic and the write lock bug will eventually show up. :)

D.

>
> $ lsof -p 53113 | awk '{print $9}'| uniq -c | grep lock
>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/1438665674247607560106752257205091097473808596992/bitcask.write.lock
>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/22835963083295358096932575511191922182123945984/bitcask.write.lock
>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/0/bitcask.write.lock
>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/776422744832042175295707567380525354192214163456/bitcask.write.lock
>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/730750818665451459101842416358141509827966271488/bitcask.write.lock
>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/753586781748746817198774991869333432010090217472/bitcask.write.lock
>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/1164634117248063262943561351070788031288321245184/bitcask.write.lock
>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/1187470080331358621040493926581979953470445191168/bitcask.write.lock
>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/1210306043414653979137426502093171875652569137152/bitcask.write.lock
>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/45671926166590716193865151022383844364247891968/bitcask.write.lock
>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/91343852333181432387730302044767688728495783936/bitcask.write.lock
>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/137015778499772148581595453067151533092743675904/bitcask.write.lock
>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/114179815416476790484662877555959610910619729920/bitcask.write.lock
>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/182687704666362864775460604089535377456991567872/bitcask.write.lock
>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/228359630832953580969325755111919221821239459840/bitcask.write.lock
>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/205523667749658222872393179600727299639115513856/bitcask.write.lock
>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/593735040165679310520246963290989976735222595584/bitcask.write.lock
>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/639406966332270026714112114313373821099470487552/bitcask.write.lock
>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/616571003248974668617179538802181898917346541568/bitcask.write.lock
>
>
> On Jul 30, 2010, at 8:34 PM, David Smith wrote:
>
> That's only a partial paste, correct? How many partitions
> ({ring_creation_size, 64} in your etc/app.config) do you have defined? There
> should be a write lock file open for each partition. Also, what is your
> ulimit -n set to?
>
> Thanks,
>
> D.
>
> On Fri, Jul 30, 2010 at 5:09 PM, Alex Wolfe <a...@activeprospect.com> wrote:
>
>> $ lsof -p 16129 | awk '{print $9}'| uniq -c | grep lock
>>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/913438523331814323877303020447676887284957839360/bitcask.write.lock
>>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/959110449498405040071168171470060731649205731328/bitcask.write.lock
>>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/936274486415109681974235595958868809467081785344/bitcask.write.lock
>>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/411047335499316445744786359201454599278231027712/bitcask.write.lock
>>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/456719261665907161938651510223838443642478919680/bitcask.write.lock
>>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/433883298582611803841718934712646521460354973696/bitcask.write.lock
>>    1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/388211372416021087647853783690262677096107081728/bitcask.write.lock
>>
>>
>> On Jul 30, 2010, at 6:03 PM, David Smith wrote:
>>
>> > Yup, that looks like the file handle leak. You can verify by using
>> > lsof on the server and looking for multiple handles to
>> > bitcask.write.lock. Something like:
>> >
>> > lsof -p pid | awk '{print $9}'| uniq -c
>> >
>> > D.
>> >
>> > On Friday, July 30, 2010, Alex Wolfe <a...@activeprospect.com> wrote:
>> >> Hey David.
>> >> Does the below log output look like it could be caused by the issue you fixed?
>> >> Alex >> >> >> >> ==== Fri Jul 30 14:22:34 CDT 2010 >> >> =ERROR REPORT==== 30-Jul-2010::14:22:34 ===** State machine <0.176.0> >> terminating ** Last event in was {riak_vnode_req_v1, >> 593735040165679310520246963290989976735222595584, >> {fsm,undefined,<0.12466.0>}, {riak_kv_put_req_v1, >> {<<"test.groups">>,<<"EghzXywWrGGtp2fCcSLoatIdjML">>}, >> {r_object,<<"test.groups">>, >> <<"EghzXywWrGGtp2fCcSLoatIdjML">>, [{r_content, >> {dict,5,16,16,8,80,48, >> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, >> {{[],[], [[<<"Links">>]], >> [],[],[],[],[],[],[], >> [[<<"content-type">>,97,112,112,108,105,99,97, >> 116,105,111,110,47,106,115,111,110], >> [<<"X-Riak-VTag">>,89,69,78,55,55,111,66,121,73, >> 69,78,53,122,101,85,105,117,68,89,80,52]], >> [],[], [[<<"X-Riak-Last-Modified">>| >> {1280,517754,951062}]], [], >> [[<<"X-Riak-Meta">>]]}}}, >> <<"{\"name\":\"foo\",\"created_at\":\"2010-07-30T19:22:34.947Z\",\"type\":\"group\",\"version\":1}">>}], >> [{<<0,55,119,231>>,{1,63447736954}}], >> {dict,1,16,16,8,80,48, >> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, >> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[], >> [[clean|true]], []}}}, >> undefined}, 33218311,63447736954, >> [{returnbody,true}]}}** When State == active** Data == >> {state,593735040165679310520246963290989976735222595584, >> riak_kv_vnode, >> {state,593735040165679310520246963290989976735222595584, >> riak_kv_bitcask_backend, >> {#Ref<0.0.0.611>, >> "data/bitcask/593735040165679310520246963290989976735222595584"}, >> [],false}, undefined,none}** Reason >> for termination = ** {{badmatch,{error,emfile}}, >> [{bitcask_fileops,create_file_loop,3}, {bitcask,put,3}, >> {riak_kv_bitcask_backend,put,3}, {riak_kv_vnode,perform_put,3}, >> {riak_kv_vnode,do_put,7}, {riak_kv_vnode,handle_command,3}, >> {riak_core_vnode,vnode_command,3}, {gen_fsm,handle_msg,7}]} >> >> =ERROR REPORT==== 30-Jul-2010::14:22:35 ===Failed to open lock file >> 
data/bitcask/593735040165679310520246963290989976735222595584/bitcask.write.lock: >> emfile >> >> =ERROR REPORT==== 30-Jul-2010::14:22:35 ===** State machine <0.12471.0> >> terminating ** Last event in was {riak_vnode_req_v1, >> 593735040165679310520246963290989976735222595584, >> {fsm,undefined,<0.12470.0>}, {riak_kv_put_req_v1, >> {<<"test.users">>,<<"ZrAxzFghd51VG902GuCcJ2gYOMJ">>}, >> {r_object,<<"test.users">>, >> <<"ZrAxzFghd51VG902GuCcJ2gYOMJ">>, [{r_content, >> {dict,5,16,16,8,80,48, >> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, >> {{[],[], [[<<"Links">>, >> {{<<"test.groups">>, >> <<"EghzXywWrGGtp2fCcSLoatIdjML">>}, >> <<"groups">>}]], [],[],[],[],[],[],[], >> [[<<"content-type">>,97,112,112,108,105,99,97, >> 116,105,111,110,47,106,115,111,110], >> [<<"X-Riak-VTag">>,50,88,97,105,85,49,70,49,55, >> 54,48,71,113,115,75,103,54,102,84,56,118,84]], >> [],[], >> [[<<"X-Riak-Last-Modified">>|{1280,517755,638}]], >> [], [[<<"X-Riak-Meta">>]]}}}, >> >> <<"{\"name\":\"bar\",\"created_at\":\"2010-07-30T19:22:34.998Z\",\"type\":\"user\",\"version\":1}">>}], >> [{<<3,30,180,15>>,{1,63447736955}}], >> {dict,1,16,16,8,80,48, >> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, >> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[], >> [[clean|true]], []}}}, >> undefined}, 122945524,63447736955, >> [{returnbody,true}]}}** When State == active** Data == >> {state,593735040165679310520246963290989976735222595584, >> riak_kv_vnode, >> {state,593735040165679310520246963290989976735222595584, >> riak_kv_bitcask_backend, >> {#Ref<0.0.0.16357>, >> "data/bitcask/593735040165679310520246963290989976735222595584"}, >> [],false}, undefined,none}** Reason >> for termination = ** {bad_return_value,{error,{write_locked,emfile}}} >> >> ==> sasl-error.log <== >> >> =CRASH REPORT==== 30-Jul-2010::14:22:35 === crasher: initial call: >> riak_core_vnode:init/1 pid: <0.176.0> registered_name: [] exception >> exit: {{badmatch,{error,emfile}}, >> [{bitcask_fileops,create_file_loop,3}, 
{bitcask,put,3}, >> {riak_kv_bitcask_backend,put,3}, >> {riak_kv_vnode,perform_put,3}, {riak_kv_vnode,do_put,7}, >> {riak_kv_vnode,handle_command,3}, >> {riak_core_vnode,vnode_command,3}, >> {gen_fsm,handle_msg,7}]} in function gen_fsm:terminate/7 ancestors: >> [riak_core_vnode_sup,riak_core_sup,<0.98.0>] messages: [] links: >> [#Port<0.3507>,<0.100.0>] dictionary: [] trap_exit: true status: >> running heap_size: 1597 stack_size: 24 reductions: 11185 neighbours: >> >> =SUPERVISOR REPORT==== 30-Jul-2010::14:22:35 === Supervisor: >> {local,riak_core_vnode_sup} Context: child_terminated Reason: >> {{badmatch,{error,emfile}}, >> [{bitcask_fileops,create_file_loop,3}, {bitcask,put,3}, >> {riak_kv_bitcask_backend,put,3}, >> {riak_kv_vnode,perform_put,3}, {riak_kv_vnode,do_put,7}, >> {riak_kv_vnode,handle_command,3}, >> {riak_core_vnode,vnode_command,3}, >> {gen_fsm,handle_msg,7}]} Offender: [{pid,<0.176.0>}, >> {name,undefined}, {mfa, >> {riak_core_vnode,start_link, [riak_kv_vnode, >> 593735040165679310520246963290989976735222595584]}}, >> {restart_type,temporary}, {shutdown,brutal_kill}, >> {child_type,worker}] >> >> >> >> =CRASH REPORT==== 30-Jul-2010::14:22:35 === crasher: initial call: >> riak_core_vnode:init/1 pid: <0.12471.0> registered_name: [] exception >> exit: {bad_return_value,{error,{write_locked,emfile}}} in function >> gen_fsm:terminate/7 ancestors: >> [riak_core_vnode_sup,riak_core_sup,<0.98.0>] messages: [] links: >> [<0.100.0>] dictionary: [] trap_exit: true status: running >> heap_size: 2584 stack_size: 24 reductions: 1955 neighbours: >> >> =SUPERVISOR REPORT==== 30-Jul-2010::14:22:35 === Supervisor: >> {local,riak_core_vnode_sup} Context: child_terminated Reason: >> {bad_return_value,{error,{write_locked,emfile}}} Offender: >> [{pid,<0.12471.0>}, {name,undefined}, {mfa, >> {riak_core_vnode,start_link, >> [riak_kv_vnode, >> 593735040165679310520246963290989976735222595584]}}, >> {restart_type,temporary}, {shutdown,brutal_kill}, >> {child_type,worker}] 
>> >> >> >> ==> erlang.log.4 <== >> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** Generic server memsup >> terminating ** Last message in was {'EXIT',<0.21020.78>, >> {emfile, [{erlang,open_port, >> [{spawn,"/bin/sh -s unix:cmd 2>&1"}, >> [stream]]}, >> {os,start_port_srv_loop,2}]}}** When Server state == {state,{unix,darwin}, >> false, >> {1897500000,7776508000}, {<0.81.0>,972008}, >> false,60000,30000,0.8,0.05,<0.21020.78>, >> #Ref<0.0.2.58535>,undefined, >> [reg], []}** Reason for termination == ** >> {emfile,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]}, >> {os,start_port_srv_loop,2}]} >> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** Generic server memsup >> terminating ** Last message in was {'EXIT',<0.21186.78>, >> {emfile, [{erlang,open_port, >> [{spawn,"/bin/sh -s unix:cmd 2>&1"}, >> [stream]]}, >> {os,start_port_srv_loop,2}]}}** When Server state == {state,{unix,darwin}, >> false,undefined,undefined,false,60000,30000, >> 0.8,0.05,<0.21186.78>,#Ref<0.0.2.58551>, >> undefined, [reg], >> []}** Reason for termination == ** >> {emfile,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]}, >> {os,start_port_srv_loop,2}]} >> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** Generic server memsup >> terminating ** Last message in was {'EXIT',<0.21252.78>, >> {emfile, [{erlang,open_port, >> [{spawn,"/bin/sh -s unix:cmd 2>&1"}, >> [stream]]}, >> {os,start_port_srv_loop,2}]}}** When Server state == {state,{unix,darwin}, >> false,undefined,undefined,false,60000,30000, >> 0.8,0.05,<0.21252.78>,#Ref<0.0.2.58559>, >> undefined, [reg], >> []}** Reason for termination == ** >> {emfile,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]}, >> {os,start_port_srv_loop,2}]} >> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** Generic server memsup >> terminating ** Last message in was {'EXIT',<0.21374.78>, >> {emfile, [{erlang,open_port, >> [{spawn,"/bin/sh -s unix:cmd 2>&1"}, >> [stream]]}, >> {os,start_port_srv_loop,2}]}}** 
When Server state == {state,{unix,darwin}, >> false,undefined,undefined,false,60000,30000, >> 0.8,0.05,<0.21374.78>,#Ref<0.0.2.58569>, >> undefined, [reg], >> []}** Reason for termination == ** >> {emfile,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]}, >> {os,start_port_srv_loop,2}]} >> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** Generic server memsup >> terminating ** Last message in was {'EXIT',<0.21396.78>, >> {emfile, [{erlang,open_port, >> [{spawn,"/bin/sh -s unix:cmd 2>&1"}, >> [stream]]}, >> {os,start_port_srv_loop,2}]}}** When Server state == {state,{unix,darwin}, >> false,undefined,undefined,false,60000,30000, >> 0.8,0.05,<0.21396.78>,#Ref<0.0.2.58575>, >> undefined, [reg], >> []}** Reason for termination == ** >> {emfile,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]}, >> {os,start_port_srv_loop,2}]} >> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** State machine <0.175.0> >> terminating ** Last event in was {riak_vnode_req_v1, >> 570899077082383952423314387779798054553098649600, >> {fsm,undefined,<0.12470.0>}, {riak_kv_put_req_v1, >> {<<"test.users">>,<<"ZrAxzFghd51VG902GuCcJ2gYOMJ">>}, >> {r_object,<<"test.users">>, >> <<"ZrAxzFghd51VG902GuCcJ2gYOMJ">>, [{r_content, >> {dict,5,16,16,8,80,48, >> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, >> {{[],[], [[<<"Links">>, >> {{<<"test.groups">>, >> <<"EghzXywWrGGtp2fCcSLoatIdjML">>}, >> <<"groups">>}]], [],[],[],[],[],[],[], >> [[<<"content-type">>,97,112,112,108,105,99,97, >> 116,105,111,110,47,106,115,111,110], >> [<<"X-Riak-VTag">>,50,88,97,105,85,49,70,49,55, >> 54,48,71,113,115,75,103,54,102,84,56,118,84]], >> [],[], >> [[<<"X-Riak-Last-Modified">>|{1280,517755,638}]], >> [], [[<<"X-Riak-Meta">>]]}}}, >> >> <<"{\"name\":\"bar\",\"created_at\":\"2010-07-30T19:22:34.998Z\",\"type\":\"user\",\"version\":1}">>}], >> [{<<3,30,180,15>>,{1,63447736955}}], >> {dict,1,16,16,8,80,48, >> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, >> 
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[], >> [[clean|true]], []}}}, >> undefined}, 122945524,63447736955, >> [{returnbody,true}]}}** When State == active** Data == >> {state,570899077082383952423314387779798054553098649600, >> riak_kv_vnode, >> {state,570899077082383952423314387779798054553098649600, >> >> > >
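
The check discussed in this thread (looking for more than one open handle to the same bitcask.write.lock) can be sketched in Python as below. This is only an illustration under assumptions: the function names and the sample lsof text are hypothetical, not taken from Riak or from the thread, and the parser assumes the file path is the last whitespace-separated column of default `lsof -p <pid>` output with no spaces in the path. One point in its favor over the one-liner: `uniq -c` only merges adjacent duplicate lines, so unsorted lsof output can hide a repeated path, whereas a counter tallies duplicates regardless of order.

```python
from collections import Counter

# Hypothetical lsof output, made up for illustration. Note that the two
# handles on partition 0's lock file are NOT on adjacent lines.
SAMPLE_LSOF = """\
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
beam.smp 4242 riak 20w REG 14,2 0 1001 /tmp/bitcask/0/bitcask.write.lock
beam.smp 4242 riak 21r REG 14,2 512 1002 /tmp/bitcask/0/1.bitcask.data
beam.smp 4242 riak 22w REG 14,2 0 1003 /tmp/bitcask/1/bitcask.write.lock
beam.smp 4242 riak 23w REG 14,2 0 1001 /tmp/bitcask/0/bitcask.write.lock
"""


def count_write_locks(lsof_output: str) -> Counter:
    """Count open handles per bitcask.write.lock path in lsof text output."""
    counts = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Assumption: the NAME column (the path) is the last field.
        if fields and fields[-1].endswith("bitcask.write.lock"):
            counts[fields[-1]] += 1
    return counts


def leaked_locks(counts: Counter) -> dict:
    """Return only the lock files held open by more than one handle."""
    return {path: n for path, n in counts.items() if n > 1}


if __name__ == "__main__":
    counts = count_write_locks(SAMPLE_LSOF)
    print("distinct lock files open:", len(counts))
    for path, n in leaked_locks(counts).items():
        print("possible handle leak:", n, path)
```

To use it against a live node, the text passed to `count_write_locks` would be the captured output of `lsof -p <pid>`. The number of distinct lock files can then be compared with the ring_creation_size in etc/app.config, since one lock file per partition is expected.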
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com