Hi Shaun,

I'm having this issue again. This time I have captured the system limits while Riak is still crashing.
Please note the lsof and prlimit outputs at the bottom.

steven@hawk5:log/riak:» tail error.log                                  [0] 07:17:05
2017-01-31 19:21:37.391 [error] emulator Error in process <0.7964.15> on node 'r...@hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
2017-01-31 19:21:40.868 [error] <0.25635.14> gen_server yz_cover terminated with reason: no match of right hand value error in mochiglobal:compile/2 line 51
2017-01-31 19:21:40.868 [error] <0.25635.14> CRASH REPORT Process yz_cover with 0 neighbours exited with reason: no match of right hand value error in mochiglobal:compile/2 line 51 in gen_server:terminate/6 line 744
2017-01-31 19:21:40.868 [error] <0.1215.0> Supervisor yz_general_sup had child yz_cover started with yz_cover:start_link() at <0.25635.14> exit with reason no match of right hand value error in mochiglobal:compile/2 line 51 in context child_terminated
2017-01-31 19:21:41.811 [error] emulator Error in process <0.18111.15> on node 'r...@hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
2017-01-31 19:21:47.363 [error] emulator Error in process <0.2866.15> on node 'r...@hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}

steven@hawk5:log/riak:» sudo lsof -a -p `riak getpid` |wc -l            [0] 07:17:10
48446
steven@hawk5:log/riak:» sudo prlimit -n --noheadings -o soft -p `riak getpid`   [0] 07:17:27
20000500
steven@hawk5:log/riak:» sudo prlimit -n --noheadings -o hard -p `riak getpid`   [0] 07:17:32
20000500
steven@hawk5:log/riak:»

Python trace:

2017-01-31T20:20:52.004Z hawk4| return self._client.fulltext_search(search_index, query, **params)
2017-01-31T20:20:52.004Z hawk4| **skwargs
2017-01-31T20:20:52.004Z hawk4| return self._with_retries(pool, thunk)
2017-01-31T20:20:52.004Z hawk4| **kwargs
2017-01-31T20:20:52.004Z hawk4| File "/usr/local/lib/python2.7/dist-packages/riak/client/transport.py", line 179, in wrapper
2017-01-31T20:20:52.004Z hawk4| File "/usr/local/lib/python2.7/dist-packages/riak/bucket.py", line 476, in search
2017-01-31T20:20:52.004Z hawk4| File "/usr/local/lib/python2.7/dist-packages/riak/client/transport.py", line 134, in _with_retries
2017-01-31T20:20:52.004Z hawk4| File "/opt/streethawk/cloud/core/riakdb/models.py", line 528, in search
2017-01-31T20:20:52.005Z hawk4| RiakError: 'recv_into returned zero bytes unexpectedly'
2017-01-31T20:20:52.005Z hawk4| raise e.args[0]

Regards
Steven

Shaun McVey <smc...@basho.com> writes:

> Hi Steven,
>
> Based on that log output, it looks like you're running into issues with
> system limits, probably open file limits.  You can check the value that
> Riak has available by connecting to one of the nodes with riak attach, then
> executing:
>
> ```
> os:cmd("ulimit -n").
> ```
>
> (Afterwards, disconnect with ctrl+g, then q, then Enter.)
>
> It should be at least 65,536 ideally, although the bigger the better.
>
> If you find it's lower, then follow this doc to increase it.
>
> http://docs.basho.com/riak/kv/2.0.2/using/performance/open-files-limit/
>
> Have a check and let us know what the output was.
>
> Kind Regards,
> Shaun
>
> On Thu, Jan 26, 2017 at 10:34 AM, Steven Joseph <ste...@streethawk.com>
> wrote:
>
>> Hi,
>>
>> We have a cluster of 5 nodes, which are continuously being queried for
>> new data through solr. We have been having some issues with riak/solr
>> which seem to be happening after longer periods of operation. It starts
>> off with one node and it seems to be happening on all nodes after a
>> while.
>>
>> We tried upgrading to the latest version of riak hoping that it would
>> solve the issue, but no luck.
>>
>> Only thing that stops the crashes is a full cluster staggered restart.
>>
>> Please find the logs below. Any help would be much appreciated.
>>
>> Riak Logs:
>>
>> 2017-01-26T07:53:03.262Z hawk5| ** Last message in was tick
>> 2017-01-26T07:53:10.197Z hawk5|
>> 2017-01-26T07:53:10.197Z hawk5| 2017-01-26 07:53:08.183 [error] emulator Error in process <0.22701.73> on node 'r...@hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
>> 2017-01-26T07:53:10.263Z hawk5| Error in process <0.22701.73> on node 'r...@hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
>> 2017-01-26T07:53:10.263Z hawk5| 2017-01-26 07:53:08 =ERROR REPORT====
>> 2017-01-26T07:53:17.198Z hawk5|
>> 2017-01-26T07:53:17.208Z hawk5| 2017-01-26 07:53:13.472 [error] emulator Error in process <0.12549.73> on node 'r...@hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
>> 2017-01-26T07:53:17.263Z hawk5| Error in process <0.12549.73> on node 'r...@hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
>> 2017-01-26T07:53:17.263Z hawk5| 2017-01-26 07:53:13 =ERROR REPORT====
>> 2017-01-26T07:53:18.198Z hawk5| 2017-01-26 07:53:17.861 [error] emulator Error in process <0.2254.73> on node 'r...@hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
>> 2017-01-26T07:53:18.208Z hawk5|
>> 2017-01-26T07:53:18.208Z hawk5| 2017-01-26 07:53:17.861 [error] emulator Error in process <0.2254.73> on node 'r...@hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
>> 2017-01-26T07:53:18.264Z hawk5|
>>
>>
>> Python client traces:
>>
>> 2017-01-26T10:20:44.517Z hawk5| File "/usr/local/lib/python2.7/dist-packages/riak/client/transport.py", line 179, in wrapper
>> 2017-01-26T10:20:44.517Z hawk5| return self._client.fulltext_search(search_index, query, **params)
>> 2017-01-26T10:20:44.517Z hawk5| File "/usr/local/lib/python2.7/dist-packages/riak/bucket.py", line 476, in search
>> 2017-01-26T10:20:44.517Z hawk5| raise e.args[0]
>> 2017-01-26T10:20:44.517Z hawk5| File "/usr/local/lib/python2.7/dist-packages/riak/client/transport.py", line 134, in _with_retries
>> 2017-01-26T10:20:44.517Z hawk5| return self._with_retries(pool, thunk)
>> 2017-01-26T10:20:44.543Z hawk5| RiakError: 'recv_into returned zero bytes unexpectedly'
>>
>>
>> Regards
>>
>> Steven Joseph
>>
>> CTO, StreetHawk Pty Ltd
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
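
For reference, the lsof/prlimit check near the top of this message could be scripted roughly as follows. This is only a sketch and not part of Riak or its tooling: the function names parse_nofile_limits and fd_usage are made up for illustration, and it assumes a Linux host where /proc/<pid>/limits and /proc/<pid>/fd are readable (usually that means running as root or as the user that owns the Riak process). It counts entries in /proc/<pid>/fd, so the number will generally be lower than the lsof line count, which also lists things such as memory-mapped files.

```python
import os


def _to_int(value):
    # The kernel prints "unlimited" when no limit applies.
    return None if value == "unlimited" else int(value)


def parse_nofile_limits(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns are: soft limit, hard limit, units.
            fields = line[len(prefix):].split()
            return _to_int(fields[0]), _to_int(fields[1])
    raise ValueError("no 'Max open files' row found")


def fd_usage(pid):
    """Return (open_fds, soft_limit, hard_limit) for one process (Linux only)."""
    with open("/proc/%d/limits" % pid) as f:
        soft, hard = parse_nofile_limits(f.read())
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir("/proc/%d/fd" % pid))
    return open_fds, soft, hard
```

Calling fd_usage(pid) with the pid printed by `riak getpid` returns the open descriptor count together with the soft and hard limits, so the three numbers can be logged side by side while the node is misbehaving.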