Re: riaksearch memory growth issues

Gordon Tillman Tue, 07 Jun 2011 13:53:08 -0700

Thanks David,

If there is anything I can do from this end to help, please don't hesitate
to ask.


--gordon

On Jun 7, 2011, at 15:34 , David Smith wrote:

> Gordon,
> 
> Thanks for the test case. I've queued it up for review by a dev, as
> time permits.
> 
> D.
> 
> On Tue, Jun 7, 2011 at 1:33 PM, Gordon Tillman <gtill...@mezeo.com> wrote:
>> Guys, I have put together a simple test to reproduce the error that we are
>> seeing.
>> It is on GitHub here:
>> https://github.com/gordyt/riaksearch-test
>> This is a multi-threaded test that connects to Riak using the protocol
>> buffers interface.  Each iteration of the run loop issues one simple search
>> and uploads one small JSON object.
>> Thanks very much for any input you might have.
>> Regards,
>> --gordon
>> On Jun 6, 2011, at 10:01 , Gordon Tillman wrote:
>> 
>> Good Morning Gilbert,
>> I have posted this gist:
>> https://gist.github.com/1010384
>> It is a minor update we made to it_op_collector_loop/3 in
>> riak_search_op_utils.  This update was done to alleviate the situation that
>> we observe here:
>> https://gist.github.com/1000735
>> But it was made with the understanding that this is treating a symptom and
>> not fixing the cause of the problem.
>> A little bit of follow-up information: The problem seems to be exacerbated
>> when Riak is hit with a series of operations that are all generating the
>> same search/map/reduce operation (albeit with differing search input
>> parameters).
>> We installed 0.14.2 and tested this weekend (without our update applied) and
>> observed the same issues.
>> If I find out anything else I will let you know.
>> --gordon
>> 
>> 
>> 
>> On May 31, 2011, at 18:09 , Gilbert Glåns wrote:
>> 
>> Gordon,
>> 
>> Great news!  Much appreciated.
>> 
>> Gilbert
>> 
>> On Tue, May 31, 2011 at 2:25 PM, Gordon Tillman <gtill...@mezeo.com> wrote:
>> 
>> Howdy Gilbert,
>> 
>> Hey, we are testing a fix now.  If this works, I will send you a copy of the
>> updated file.
>> 
>> --gordon
>> 
>> 
>> On May 31, 2011, at 12:55 , Gilbert Glåns wrote:
>> 
>> Hi Gordon,
>> 
>> Thank you for sharing the information.  We are seeing the same exact
>> type of behavior from our search cluster.  I have tracked the
>> problem(s) through the query system.  It looks like the mailboxes we
>> are both seeing are "abandoned" and / or the messages are never
>> matched within the Erlang code (it_op_collector_loop,
>> riak_search_op_utils.erl); the messages are then never processed,
>> therefore the resources they utilize are never released.  This is a major
>> problem.
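
The mechanism described here can be shown in miniature. What follows is only a sketch of how a selective receive can strand messages, not the riak_search code itself: the module name mailbox_demo, the message shapes {result, Ref, Value} and {done, Ref, From}, and both loop functions are invented for illustration, and the real clauses of it_op_collector_loop are not reproduced in this thread.

```erlang
-module(mailbox_demo).
-export([run/0]).

%% Hypothetical collector that only matches messages tagged with its own Ref.
%% Messages carrying any other reference never match a clause, so they are
%% skipped on every receive and stay in the mailbox.
selective_loop(Ref, Acc) ->
    receive
        {result, Ref, Value} -> selective_loop(Ref, [Value | Acc]);
        {done, Ref, From}    -> From ! {collected, lists:reverse(Acc), queue_len()}
    end.

%% Same collector with a catch-all clause that discards anything unexpected.
draining_loop(Ref, Acc) ->
    receive
        {result, Ref, Value} -> draining_loop(Ref, [Value | Acc]);
        {done, Ref, From}    -> From ! {collected, lists:reverse(Acc), queue_len()};
        _Stale               -> draining_loop(Ref, Acc)
    end.

%% Number of messages still waiting in the calling process's mailbox.
queue_len() ->
    {message_queue_len, Len} = erlang:process_info(self(), message_queue_len),
    Len.

%% Send the same mix of wanted and stale messages to a fresh collector and
%% return {CollectedValues, MessagesLeftInMailbox}.
feed(Loop) ->
    Ref = make_ref(),
    Stale = make_ref(),
    Pid = spawn(fun() -> Loop(Ref, []) end),
    Pid ! {result, Stale, old},
    Pid ! {result, Ref, a},
    Pid ! {result, Stale, old},
    Pid ! {result, Ref, b},
    Pid ! {done, Ref, self()},
    receive
        {collected, Values, Left} -> {Values, Left}
    end.

run() ->
    {feed(fun selective_loop/2), feed(fun draining_loop/2)}.
```

Calling mailbox_demo:run() returns {{[a,b],2},{[a,b],0}}: the first loop finishes with two stale messages still queued, the second with none. A catch-all clause like the one in draining_loop/2 only hides whatever is sending the stale messages, and it would also throw away messages that a later receive in the same process might have wanted.
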
>> 
>> I have been debugging this for some time and I wish I could say it was
>> going well.  The implementation is convoluted -- have you gotten
>> through it?  Can you verify the same cause?
>> 
>> We have been internally discussing the possibility of removing this
>> query processing implementation completely and replacing it with
>> something built in-house because the problems we have uncovered trying
>> to debug the "abandoned mailbox" problem are related and systemic:  1)
>> indeterminate and possibly very large data structures created and
>> manipulated for intermediate and final sets of results, 2) very poor
>> or non-existent ability to gain any insight into what is executing
>> within the "plumbing" of the current query execution system without
>> "herculean" effort (in my opinion), and 3) unacceptable performance
>> (predictably or subjectively) from the merge_index riak_search
>> backend.
>> 
>> Are there any other backends available for riak_search with the
>> Enterprise Riak offering?  I really like the design of riak_search but
>> the performance seems to be only a very small fraction of our
>> equivalent SOLR installation, even with several times the amount of
>> resources "thrown at it" -- it does not seem to use resources we
>> "throw at it" well, either, or in the mailboxes case, responsibly.
>> 
>> I will quickly admit I may be doing something wrong.  Is there a
>> user-error situation in which mailboxes should be abandoned taking up
>> memory?
>> 
>> Does anyone else have experiences with equivalent riak_search vs. SOLR
>> installations?
>> 
>> Thanks again for sharing Gordon.  Your results make me feel like this
>> may not be entirely stupidity on my part.
>> 
>> Gilbert
>> 
>> 
>> On Tue, May 31, 2011 at 8:51 AM, Gordon Tillman <gtill...@mezeo.com> wrote:
>> 
>> Howdy Gilbert,
>> 
>> I reproduced the issue this morning and then ran the command that you
>> specified on two of the non-empty mailboxes.
>> 
>> The output from that is posted here:
>> 
>> https://gist.github.com/1000735
>> 
>> Please let me know if this corresponds to the issue that you are seeing.
>> 
>> Thank you,
>> 
>> --gordon
>> 
>> On May 27, 2011, at 20:10 , Gilbert Glåns wrote:
>> 
>> Gordon,
>> 
>> Could you try:
>> 
>> erlang:process_info(list_to_pid("<0.16614.32>"), [messages,
>> current_function, initial_call, links, memory, status]).
>> 
>> in a riak search console for one/some of those mailboxes and share the
>> results? I am curious to see if you are having the same systemic
>> memory consumption I am experiencing.
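
If it helps to find candidate pids in the first place, the same process_info call can be run over every process on the node. A minimal sketch, assuming it is compiled and loaded on the node being inspected; the module name mailbox_report and the function top/1 are made up here and are not part of Riak:

```erlang
-module(mailbox_report).
-export([top/1]).

%% Return up to N live processes with the longest message queues, as
%% {QueueLen, MemoryBytes, Pid, CurrentFunction} tuples, longest queue first.
%% process_info/2 returns 'undefined' for a process that has already exited;
%% the pattern in the second generator silently skips those.
top(N) when is_integer(N), N >= 0 ->
    Items = [message_queue_len, memory, current_function],
    Rows = [{Len, Mem, Pid, Fun}
            || Pid <- erlang:processes(),
               [{message_queue_len, Len}, {memory, Mem}, {current_function, Fun}]
                   <- [erlang:process_info(Pid, Items)]],
    lists:sublist(lists:reverse(lists:sort(Rows)), N).
```

For example, mailbox_report:top(10). in the console lists the ten longest queues. It asks only for message_queue_len rather than messages, so it does not copy the queued messages themselves; the full process_info call above can then be pointed at whichever pids stand out.
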
>> 
>> Gilbert
>> 
>> On Fri, May 27, 2011 at 5:15 PM, Gordon Tillman <gtill...@mezeo.com> wrote:
>> 
>> Howdy Gang,
>> 
>> We are having a bit of an issue with our 3-node riaksearch cluster.  What is
>> happening is this:
>> 
>> Cluster is up and running.  We start testing our application against it.  As
>> the application runs, the Erlang process consumes more and more memory
>> without ever releasing it.
>> 
>> In trying to investigate the issue we ran the riaksearch-admin cluster_info
>> command.  It appears that the bulk of this memory is being consumed by a
>> bunch of mailboxes.
>> 
>> I have posted both the output of the cluster_info command and the app.config
>> from one of the nodes here:
>> 
>> https://gist.github.com/996419
>> 
>> I would be very grateful if someone from Basho would take a look at the
>> cluster_info and see if they can spot anything obvious.
>> 
>> Each machine in the cluster has an 8-core Xeon and 16GB RAM.  I believe all
>> of the platform details, etc., are in the cluster_info dump.
>> 
>> Many thanks,
>> 
>> --gordon
>> 
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> 
> 
> 
> 
> -- 
> Dave Smith
> Director, Engineering
> Basho Technologies, Inc.
> diz...@basho.com


