Hi Dave,

Are you sure that "no queries are running"?

The log you posted shows the index coverage FSM running, as well as the streaming merge sort buffer (the sms module in your stack trace). My guess would be that you have some (many?) 2i queries with a large page_size set on the results, and a slow vnode causing all of the results to be buffered in memory. Can you check again that your regular workload doesn't include a bunch of 2i queries with large result sets?
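For scale: the sizes in those sysmon reports are in words, so on a 64-bit VM the heap_block_size of 870001580 in your large_heap report works out to roughly 870 million words x 8 bytes, about 6.5 GB, for that single coverage FSM. That alone is in the range of the process memory growth you're describing.

If those 2i queries do turn out to be part of your workload, paginating them with max_results keeps any one coverage FSM from buffering the whole result set at once. Roughly like the sketch below, using the Erlang client (pagination appeared around Riak 1.4). This is untested, and the bucket, index, and range are placeholders rather than anything from your cluster:

    -module(paged_2i).
    -include_lib("riakc/include/riakc.hrl").
    -export([fetch_all/1]).

    %% Walk a 2i range page by page, so each round trip returns at
    %% most 1000 keys instead of buffering the whole result set.
    fetch_all(Pid) ->
        fetch_pages(Pid, undefined, []).

    fetch_pages(Pid, Cont, Acc) ->
        Opts = [{max_results, 1000} |
                case Cont of
                    undefined -> [];
                    _         -> [{continuation, Cont}]
                end],
        {ok, R} = riakc_pb_socket:get_index_range(
                    Pid,
                    <<"mybucket">>,            %% placeholder bucket
                    {binary_index, "field"},   %% placeholder index
                    <<"a">>, <<"z">>,          %% placeholder range
                    Opts),
        Keys = R#index_results_v1.keys,
        case R#index_results_v1.continuation of
            undefined -> lists:append(lists:reverse([Keys | Acc]));
            Next      -> fetch_pages(Pid, Next, [Keys | Acc])
        end.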
Cheers

Russell

On 14 Nov 2013, at 21:32, Dave Brady <dbr...@weborama.com> wrote:

> Hi Luke,
>
> Thanks for responding! I've been unavailable most of the day, hence my late reply.
>
> I'll gather up those logs (tomorrow).
>
> No queries are running, and no one has tried to get a key list. We restarted our programs to clear any connections they had to the slow nodes after disabling those nodes in haproxy.
>
> One node has slowly, over the last five hours, started to head back to normal. Peak usage is down to 5.2 GB.
>
> The other node has gotten even worse. It's now ranging from 6.5 GB to 23 GB.
>
> --
> Dave Brady
>
> From: "Luke Bakken" <lbak...@basho.com>
> To: "Dave Brady" <dbr...@weborama.com>
> Cc: "riak-users" <riak-users@lists.basho.com>
> Sent: Thursday, 14 November 2013 18:14:43
> Subject: Re: Degraded response times with massive increase in Erlang VM process memory use
>
> Hi Dave,
>
> A few people have chimed in to ask what kinds of queries are running / have been run recently against this cluster - Map/Reduce, list keys, 2i?
>
> --
> Luke Bakken
> CSE
> lbak...@basho.com
>
> On Thu, Nov 14, 2013 at 1:56 AM, Dave Brady <dbr...@weborama.com> wrote:
> Hello Everyone,
>
> Two of our five nodes are seeing the 100th-percentile GET/PUT times (node_[get|put]_fsm_time_100) increase to as high as 8 seconds, and looking at our available metrics we see huge amounts of memory being used by Erlang processes (memory_processes_used).
>
> We normally see Erlang processes use tens of MBs, and occasionally a few hundred MBs for short periods. One node is now using between 5.2 GB and 18.5 GB. The other one is just a little lower: 4 GB to 14 GB.
>
> Our average object size is roughly 25 KB.
>
> The logs on these two nodes have lots of:
>
> 2013-11-14 09:26:03.961 [info] <0.83.0>@riak_core_sysmon_handler:handle_event:92 monitor long_gc <0.19362.2932> [{initial_call,{riak_core_coverage_fsm,init,1}},{almost_current_function,{sms,sms,1}},{message_queue_len,41}] [{timeout,5356},{old_heap_block_size,0},{heap_block_size,870001580},{mbuf_size,0},{stack_size,54},{old_heap_size,0},{heap_size,336260414}]
> 2013-11-14 09:26:03.961 [info] <0.83.0>@riak_core_sysmon_handler:handle_event:92 monitor large_heap <0.19362.2932> [{initial_call,{riak_core_coverage_fsm,init,1}},{almost_current_function,{sms,sms,1}},{message_queue_len,41}] [{old_heap_block_size,0},{heap_block_size,870001580},{mbuf_size,0},{stack_size,54},{old_heap_size,0},{heap_size,336260414}]
> 2013-11-14 09:26:03.968 [error] <0.3205.3273> CRASH REPORT Process <0.3205.3273> with 0 neighbours crashed with reason: no function clause matching webmachine_request:peer_from_peername({error,enotconn}, {webmachine_request,{wm_reqstate,#Port<0.38822194>,[],undefined,undefined,undefined,{wm_reqdata,...},...}}) line 150
>
> Anyone seen this before?
> --
> Dave Brady
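P.S. A quick, untested way to confirm which processes own the big heaps, from riak attach on an affected node (heap sizes here are again in words):

    %% Top five processes by total heap size, with the '$initial_call'
    %% each was spawned through -- riak_core_coverage_fsm showing up
    %% here would back up the 2i theory.
    F = fun(P) ->
            case erlang:process_info(P, [total_heap_size, dictionary]) of
                undefined ->
                    {P, 0, undefined};
                [{total_heap_size, H}, {dictionary, D}] ->
                    {P, H, proplists:get_value('$initial_call', D)}
            end
        end,
    lists:sublist(
      lists:reverse(lists:keysort(2, [F(P) || P <- erlang:processes()])),
      5).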
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com