Undiagnosed High FSM Time

2016-01-26 Thread Alex Wolfe
We have a 5 node Riak cluster running 2.1.1. This morning FSM Time (99th 
percentile) went way up. We couldn't find any clear signs of trouble with the 
cluster and ultimately chose to move the data files and restart the nodes. Once 
we started with an empty DB, the FSM Time normalized. But now it's headed back 
up again. We're stumped on how to troubleshoot this issue. Any suggestions?
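
For reference, the stats in question can be pulled straight from a node's
/stats endpoint; a rough sketch, assuming the default HTTP listener on
localhost:8098:

$ curl -s http://localhost:8098/stats | tr ',' '\n' \
    | grep -E 'fsm_(time|objsize|siblings)_9[59]'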


Re: Undiagnosed High FSM Time

2016-01-26 Thread Alex Wolfe
Thanks for your reply.

We are. We expected to see an anomaly in object size, but there was none. We 
did find the root cause: a large number of additions to a single set. It’s not 
clear to me which metric reveals that problem, but it appears object size 
doesn’t.
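
In case it helps anyone searching later: one way to see that kind of growth 
directly is to fetch the data type over HTTP and look at the size of its 
"value" array. A rough sketch, with made-up bucket type, bucket, and key names:

$ curl -s http://localhost:8098/types/sets/buckets/my_bucket/datatypes/my_key \
    | tr ',' '\n' | wc -l    # crude member count; "value" holds the set's elements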

Alex


> On Jan 26, 2016, at 3:40 PM, Luke Bakken  wrote:
> 
> Hi Alex -
> 
> Are you monitoring any of Riak's statistics? Specifically object size
> and sibling count, though all of the stats are useful.
> 
> --
> Luke Bakken
> Engineer
> lbak...@basho.com
> 
> On Tue, Jan 26, 2016 at 11:40 AM, Alex Wolfe  wrote:
>> We have a 5 node Riak cluster running 2.1.1. This morning FSM Time (99th 
>> percentile) went way up. We couldn't find any clear signs of trouble with 
>> the cluster and ultimately chose to move the data files and restart the 
>> nodes. Once we started with an empty DB, the FSM Time normalized. But now 
>> it's headed back up again. We're stumped on how to troubleshoot this issue. 
>> Any suggestions?




Re: Write_lock error has occurred after inserting 12M data

2010-07-30 Thread Alex Wolfe
$ lsof -p 16129 | awk '{print $9}'| uniq -c | grep lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/913438523331814323877303020447676887284957839360/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/959110449498405040071168171470060731649205731328/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/936274486415109681974235595958868809467081785344/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/411047335499316445744786359201454599278231027712/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/456719261665907161938651510223838443642478919680/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/433883298582611803841718934712646521460354973696/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/388211372416021087647853783690262677096107081728/bitcask.write.lock
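
(One note on the one-liner: uniq -c only collapses adjacent duplicate lines, so 
sorting first makes any repeated handles to the same lock file stand out 
immediately:)

$ lsof -p 16129 | awk '{print $9}' | sort | uniq -c | sort -rn | grep lock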


On Jul 30, 2010, at 6:03 PM, David Smith wrote:

> Yup, that looks like the file handle leak. You can verify by using
> lsof on the server and looking for multiple handles to
> bitcask.write.lock. Something like:
> 
> lsof -p pid | awk '{print $9}'| uniq -c
> 
> D.
> 
> On Friday, July 30, 2010, Alex Wolfe  wrote:
>> Hey David.
>> Does the below log output look like it could be caused by the issue you 
>> fixed?
>> Alex
>> 
>>  Fri Jul 30 14:22:34 CDT 2010
>> =ERROR REPORT 30-Jul-2010::14:22:34 ===
>> ** State machine <0.176.0> terminating
>> ** Last event in was {riak_vnode_req_v1,
>>        593735040165679310520246963290989976735222595584,
>>        {fsm,undefined,<0.12466.0>},
>>        {riak_kv_put_req_v1,
>>            {<<"test.groups">>,<<"EghzXywWrGGtp2fCcSLoatIdjML">>},
>>            {r_object,<<"test.groups">>,
>>                <<"EghzXywWrGGtp2fCcSLoatIdjML">>,
>>                [{r_content,
>>                     {dict,5,16,16,8,80,48,
>>                         {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>                         {{[],[],[[<<"Links">>]],[],[],[],[],[],[],[],
>>                           [[<<"content-type">>,97,112,112,108,105,99,97,
>>                             116,105,111,110,47,106,115,111,110],
>>                            [<<"X-Riak-VTag">>,89,69,78,55,55,111,66,121,73,
>>                             69,78,53,122,101,85,105,117,68,89,80,52]],
>>                           [],[],
>>                           [[<<"X-Riak-Last-Modified">>|{1280,517754,951062}]],
>>                           [],
>>                           [[<<"X-Riak-Meta">>]]}}},
>>                     <<"{\"name\":\"foo\",\"created_at\":\"2010-07-30T19:22:34.947Z\",\"type\":\"group\",\"version\":1}">>}],
>>                [{<<0,55,119,231>>,{1,63447736954}}],
>>                {dict,1,16,16,8,80,48,
>>                    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>                    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>>                      [[clean|true]],[]}}},
>>                undefined},
>>            33218311,63447736954,
>>            [{returnbody,true}]}}
>> ** When State == active
>> ** Data == {state,593735040165679310520246963290989976735222595584,
>>                riak_kv_vnode,
>>                {state,593735040165679310520246963290989976735222595584,
>>                    riak_kv_bitcask_backend,
>>                    {#Ref<0.0.0.611>,
>>                     "data/bitcask/593735040165679310520246963290989976735222595584"},
>>                    [],false},
>>                undefined,none}
>> ** Reason for termination =
>> ** {{badmatch,{error,emfile}},
>>     [{bitcask_fileops,create_file_loop,3},{bitcask,put,3},
>>      {riak_kv_bitcask_backend,put,3},{riak_kv_vnode,perform_put,3},
>>      {riak_kv_vnode,do_put,7},{riak_kv_vnode,handle_command,3},
>>      {riak_core_vnode,vnode_command,3},{gen_fsm,handle_msg,7}]}
>> =ERROR REPORT 30-Jul-2010::14:22:35 ===
>> Failed to open lock file data/bitcask/5937
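
(The {badmatch,{error,emfile}} above looks like the key detail: emfile is the 
POSIX error for too many open file descriptors. A quick way to compare the 
node's open handles against the limit, using the same pid as the lsof call 
above; ulimit -n has to be read from the shell that actually launched Riak:)

$ lsof -p 16129 | wc -l    # descriptors currently open by the node
$ ulimit -n                # per-process limit in the launching shell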

Re: Write_lock error has occurred after inserting 12M data

2010-08-01 Thread Alex Wolfe
IIRC, that was a full paste of all the bitcask.write.locks.  Riak fails pretty 
much immediately while running my test suite, maybe before a lock is opened for 
each partition?  

My ulimit was set to 256, which is obviously no good.  After boosting it to 
9000 and running my test suite, I have the locks shown below.  Riak is still 
running.  I guess that makes it an issue with max open files rather than a 
write lock issue?
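
For completeness, the fix boils down to something like this (a sketch; the 
app.config path is a guess based on the Homebrew layout in the lsof output 
below):

$ ulimit -n        # was 256 here
$ ulimit -n 9000   # raise it in the shell that starts Riak
$ grep ring_creation_size /usr/local/Cellar/riak/0.12.0/libexec/etc/app.config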

$ lsof -p 53113 | awk '{print $9}'| uniq -c  | grep lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/1438665674247607560106752257205091097473808596992/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/22835963083295358096932575511191922182123945984/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/0/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/776422744832042175295707567380525354192214163456/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/730750818665451459101842416358141509827966271488/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/753586781748746817198774991869333432010090217472/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/1164634117248063262943561351070788031288321245184/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/1187470080331358621040493926581979953470445191168/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/1210306043414653979137426502093171875652569137152/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/45671926166590716193865151022383844364247891968/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/91343852333181432387730302044767688728495783936/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/137015778499772148581595453067151533092743675904/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/114179815416476790484662877555959610910619729920/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/182687704666362864775460604089535377456991567872/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/228359630832953580969325755111919221821239459840/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/205523667749658222872393179600727299639115513856/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/593735040165679310520246963290989976735222595584/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/639406966332270026714112114313373821099470487552/bitcask.write.lock
   1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/616571003248974668617179538802181898917346541568/bitcask.write.lock


On Jul 30, 2010, at 8:34 PM, David Smith wrote:

> That's only a partial paste, correct? How many partitions 
> ({ring_creation_size, 64} in your etc/app.config) do you have defined? There 
> should be a write lock file open for each partition. Also, what is your 
> ulimit -n set to?
> 
> Thanks,
> 
> D.
> 
> On Fri, Jul 30, 2010 at 5:09 PM, Alex Wolfe  wrote:
> $ lsof -p 16129 | awk '{print $9}'| uniq -c | grep lock
>   1 
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/913438523331814323877303020447676887284957839360/bitcask.write.lock
>   1 
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/959110449498405040071168171470060731649205731328/bitcask.write.lock
>   1 
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/936274486415109681974235595958868809467081785344/bitcask.write.lock
>   1 
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/411047335499316445744786359201454599278231027712/bitcask.write.lock
>   1 
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/456719261665907161938651510223838443642478919680/bitcask.write.lock
>   1 
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/433883298582611803841718934712646521460354973696/bitcask.write.lock
>   1 
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/388211372416021087647853783690262677096107081728/bitcask.write.lock
> 
> 
> On Jul 30, 2010, at 6:03 PM, David Smith wrote:
> 
> > Yup, that looks like the file handle leak. You can verify by using
> > lsof on the server and looking for multiple handles to
> > bitcask.write.lock. Something like:
> >
> > lsof -p pid | awk '{print $9}'| uniq -c
> >
> > D.
> >
> > On Friday, July 30, 2010, Alex Wolfe  wrote:
> >> Hey David.
> >> Does the below log output look like it could be caused by the issue you 
> >> fixed?
> >> Alex
> >>
> >>  Fri Jul 30 14:22:34 CDT 2010
> >> =ERROR REPORT 30-Jul-2010::14:22:34 ===** State machine <0.176.0> 
> >> terminating *

Re: Pervasive replication

2010-10-05 Thread Alex Wolfe
I've run into a problem with Riak on my development machine, and I can't quite 
sort out what's happening.  I've tried stopping the Riak processes and starting 
them back up again, but the node will not service any requests.

Has anyone seen this before?


$ curl -v -X POST http://riak:8098/riak/test -d'{"foo":"bar"}' -H 
'Content-Type:application/json'
* About to connect() to riak port 8098 (#0)
*   Trying ::1... Connection refused
*   Trying fe80::1... Connection refused
*   Trying 127.0.0.1... connected
* Connected to riak (127.0.0.1) port 8098 (#0)
> POST /riak/test HTTP/1.1
> User-Agent: curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 
> OpenSSL/0.9.8l zlib/1.2.3
> Host: riak:8098
> Accept: */*
> Content-Type:application/json
> Content-Length: 13
> 
< HTTP/1.1 500 Internal Server Error
< Vary: Accept-Encoding
< Server: MochiWeb/1.1 WebMachine/1.7.1 (participate in the frantic)
< Location: /riak/test/RmfYZCM8LtBPRu4gqZivu8pfVoh
< Date: Tue, 05 Oct 2010 16:36:57 GMT
< Content-Type: text/html
< Content-Length: 713
< 
500 Internal Server Error

Internal Server Error

The server encountered an error while processing this request:

{error,{error,{case_clause,{error,timeout}},
              [{riak_kv_wm_raw,accept_doc_body,2},
               {webmachine_resource,resource_call,3},
               {webmachine_resource,do,3},
               {webmachine_decision_core,resource_call,1},
               {webmachine_decision_core,accept_helper,0},
               {webmachine_decision_core,decision,1},
               {webmachine_decision_core,handle_request,2},
               {webmachine_mochiweb,loop,1}]}}

mochiweb+webmachine web server
* Connection #0 to host riak left intact
* Closing connection #0
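
For anyone who lands on this later, a few quick first checks when a node stops 
answering requests (a sketch; the log path assumes a default pre-1.0 layout and 
may sit under libexec in a Homebrew install):

$ riak ping                        # is the Erlang node up and reachable at all?
$ tail -n 50 log/sasl-error.log    # backend errors and timeouts usually show up here
$ ulimit -n                        # a low open-file limit has caused similar failures before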


