Hi Steven,

Based on that log output, it looks like you're running into a system limit, probably the open file limit. You can check the value available to Riak by connecting to one of the nodes with riak attach, then executing:
```
os:cmd("ulimit -n").
```

(Afterwards, disconnect with ctrl+g, then q, then Enter.)

Ideally it should be at least 65,536, and the bigger the better. If you find it's lower, follow this doc to increase it:

http://docs.basho.com/riak/kv/2.0.2/using/performance/open-files-limit/

Please check and let us know what the output was.

Kind Regards,
Shaun

On Thu, Jan 26, 2017 at 10:34 AM, Steven Joseph <ste...@streethawk.com> wrote:

> Hi,
>
> We have a cluster of 5 nodes, which are continuously being queried for
> new data through solr. We have been having some issues with riak/solr
> which seem to be happening after longer periods of operation. It starts
> off with one node and it seems to be happening on all nodes after a
> while.
>
> We tried upgrading to the latest version of riak hoping that it would
> solve the issue, but no luck.
>
> Only thing that stops the crashes is a full cluster staggered restart.
>
> Please find the logs below. Any help would be much appreciated.
>
> Riak Logs:
>
> 2017-01-26T07:53:03.262Z hawk5| ** Last message in was tick
> 2017-01-26T07:53:10.197Z hawk5|
> 2017-01-26T07:53:10.197Z hawk5| 2017-01-26 07:53:08.183 [error] emulator Error in process <0.22701.73> on node 'r...@hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
> 2017-01-26T07:53:10.263Z hawk5| Error in process <0.22701.73> on node 'r...@hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
> 2017-01-26T07:53:10.263Z hawk5| 2017-01-26 07:53:08 =ERROR REPORT====
> 2017-01-26T07:53:17.198Z hawk5|
> 2017-01-26T07:53:17.208Z hawk5| 2017-01-26 07:53:13.472 [error] emulator Error in process <0.12549.73> on node 'r...@hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
> 2017-01-26T07:53:17.263Z hawk5| Error in process <0.12549.73> on node 'r...@hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
> 2017-01-26T07:53:17.263Z hawk5| 2017-01-26 07:53:13 =ERROR REPORT====
> 2017-01-26T07:53:18.198Z hawk5| 2017-01-26 07:53:17.861 [error] emulator Error in process <0.2254.73> on node 'r...@hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,g$t_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
> 2017-01-26T07:53:18.208Z hawk5|
> 2017-01-26T07:53:18.208Z hawk5| 2017-01-26 07:53:17.861 [error] emulator Error in process <0.2254.73> on node 'r...@hawk5.streethawk.com' with exit value: {{badmatch,{error,system_limit}},[{cpu_sup,g$t_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,223}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
> 2017-01-26T07:53:18.264Z hawk5|
>
> Python client traces:
>
> 2017-01-26T10:20:44.517Z hawk5| File "/usr/local/lib/python2.7/dist-packages/riak/client/transport.py", line 179, in wrapper
> 2017-01-26T10:20:44.517Z hawk5| return self._client.fulltext_search(search_index, query, **params)
> 2017-01-26T10:20:44.517Z hawk5| File "/usr/local/lib/python2.7/dist-packages/riak/bucket.py", line 476, in search
> 2017-01-26T10:20:44.517Z hawk5| raise e.args[0]
> 2017-01-26T10:20:44.517Z hawk5| File "/usr/local/lib/python2.7/dist-packages/riak/client/transport.py", line 134, in _with_retries
> 2017-01-26T10:20:44.517Z hawk5| return self._with_retries(pool, thunk)
> 2017-01-26T10:20:44.543Z hawk5| RiakError: 'recv_into returned zero bytes unexpectedly'
>
> Regards
>
> Steven Joseph
>
> CTO, StreetHawk Pty Ltd
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
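
P.S. If you'd rather check the open file limit without attaching to the Erlang console, here is a minimal sketch, assuming the nodes run Linux so that /proc/<pid>/limits is available. The function names are hypothetical and not part of Riak or its Python client. You would need to look up the PID of the Riak Erlang VM process yourself (for example from ps). The 65,536 threshold is the figure mentioned in the reply above.

```python
RECOMMENDED_MIN = 65536  # minimum suggested in this thread


def max_open_files(limits_text):
    """Return (soft, hard) "Max open files" values parsed from the text
    of /proc/<pid>/limits. A value of None means "unlimited"."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: Max open files <soft> <hard> <units>
            fields = line.split()
            soft, hard = fields[3], fields[4]
            return tuple(None if v == "unlimited" else int(v)
                         for v in (soft, hard))
    raise ValueError("no 'Max open files' line found")


def limit_is_sufficient(limits_text, minimum=RECOMMENDED_MIN):
    """True if the soft open-file limit is at least `minimum`."""
    soft, _hard = max_open_files(limits_text)
    return soft is None or soft >= minimum


def read_limits(pid):
    """Read the limits file for a running process (Linux only)."""
    with open("/proc/{}/limits".format(pid)) as fh:
        return fh.read()
```

For example, `limit_is_sufficient(read_limits(riak_pid))`, where `riak_pid` is a placeholder for the PID you found, tells you whether that node's soft limit meets the suggested minimum. Reading the running process's limits reflects what the VM actually has, which can differ from what `ulimit -n` reports in your own login shell.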