Guys, I have put together a simple test to reproduce the error that we are 
seeing.

It is on github here:

https://github.com/gordyt/riaksearch-test

This is a multi-threaded test that connects to Riak using the protocol buffers 
interface.  Each iteration of the run loop issues one simple search and uploads 
one small json object.
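
A rough sketch of the shape of such a run loop, written in Erlang for illustration. This is not the code in the repository: the module name load_sketch, the function run/4, and the SearchFun and StoreFun callbacks are all placeholders that stand in for the real protocol buffers calls.

```erlang
-module(load_sketch).
-export([run/4]).

%% Spawn Workers processes. Each one runs Iterations passes of the loop,
%% and every pass issues one search and one store.
%% SearchFun and StoreFun are hypothetical callbacks supplied by the caller.
run(Workers, Iterations, SearchFun, StoreFun) ->
    Parent = self(),
    Pids = [spawn_link(fun() ->
                           loop(Id, Iterations, SearchFun, StoreFun),
                           Parent ! {finished, self()}
                       end)
            || Id <- lists:seq(1, Workers)],
    %% Wait for every worker to report back before returning.
    [receive {finished, Pid} -> ok end || Pid <- Pids],
    ok.

loop(_Id, 0, _SearchFun, _StoreFun) ->
    ok;
loop(Id, N, SearchFun, StoreFun) ->
    _ = SearchFun(Id, N),   %% one simple search
    _ = StoreFun(Id, N),    %% one small JSON upload
    loop(Id, N - 1, SearchFun, StoreFun).
```

For example, load_sketch:run(8, 1000, SearchFun, StoreFun) would drive eight concurrent workers, each passing its worker id and remaining iteration count to the two callbacks.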

Thanks very much for any input you might have.

Regards,

--gordon

On Jun 6, 2011, at 10:01 , Gordon Tillman wrote:

Good Morning Gilbert,

I have posted this gist:

https://gist.github.com/1010384

It is a minor update we made to 
it_op_collector_loop/3 in riak_search_op_utils.  This update was done to 
alleviate the situation that we observe here:

https://gist.github.com/1000735

But it was made with the understanding that 
this is treating a symptom and not fixing the cause of the problem.

A little bit of follow-up information: The problem seems to be exacerbated when 
Riak is hit with a series of operations that are all generating the same 
search/map/reduce operation (albeit with differing search input parameters).

We installed 0.14.2 and tested this weekend (without our update applied) and 
observed the same issues.

If I find out anything else I will let you know.

--gordon



On May 31, 2011, at 18:09 , Gilbert Glåns wrote:

Gordon,

Great news!  Much appreciated.

Gilbert

On Tue, May 31, 2011 at 2:25 PM, Gordon Tillman 
<gtill...@mezeo.com> wrote:
Howdy Gilbert,

Hey we are testing a fix now.  If this works I will send you a copy of the 
update file.

--gordon


On May 31, 2011, at 12:55 , Gilbert Glåns wrote:

Hi Gordon,
Thank you for sharing the information.  We are seeing the same exact
type of behavior from our search cluster.  I have tracked the
problem(s) through the query system.  It looks like the mailboxes we
are both seeing are "abandoned" and/or the messages are never
matched within the Erlang code (it_op_collector_loop,
riak_search_op_utils.erl); the messages are then never processed,
so the resources they use are never released.  This is a major
problem.
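
To illustrate the mechanism in general terms, here is a minimal sketch of how a selective receive can leave messages sitting in a mailbox. It is not the riak_search code: the module mailbox_demo, the collector/2 loop, and the message shapes {result, Ref, Value} and {done, Ref} are made up for the example.

```erlang
-module(mailbox_demo).
-export([run/0]).

%% Hypothetical collector loop. It only matches messages tagged with the
%% Ref it was started with; anything else is left in the mailbox.
collector(Ref, Acc) ->
    receive
        {result, Ref, Value} ->
            collector(Ref, [Value | Acc]);
        {done, Ref} ->
            lists:reverse(Acc)
    end.

run() ->
    Parent = self(),
    Ref = make_ref(),
    StaleRef = make_ref(),
    Pid = spawn(fun() ->
                    Parent ! {collected, collector(Ref, [])},
                    %% Stay alive so the mailbox can be inspected.
                    receive stop -> ok end
                end),
    %% Two messages carry a reference the loop never matches.
    Pid ! {result, StaleRef, a},
    Pid ! {result, StaleRef, b},
    Pid ! {result, Ref, c},
    Pid ! {done, Ref},
    receive {collected, Collected} -> ok end,
    {message_queue_len, Len} = erlang:process_info(Pid, message_queue_len),
    Pid ! stop,
    {Collected, Len}.
```

Calling mailbox_demo:run() returns {[c], 2}: the loop finished normally, yet the two unmatched messages are still queued and keep their memory for as long as the process lives.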

I have been debugging this for some time and I wish I could say it was
going well.  The implementation is convoluted -- have you gotten
through it?  Can you verify the same cause?

We have been internally discussing the possibility of removing this
query processing implementation completely and replacing it with
something built in-house because the problems we have uncovered trying
to debug the "abandoned mailbox" problem are related and systemic:  1)
indeterminate and possibly very large data structures created and
manipulated for intermediate and final sets of results, 2) very poor
or non-existent ability to gain any insight into what is executing
within the "plumbing" of the current query execution system without
"herculean" effort (in my opinion), and 3) unacceptable performance
(predictably or subjectively) from the merge_index riak_search
backend.

Are there any other backends available for riak_search with the
Enterprise Riak offering?  I really like the design of riak_search but
the performance seems to be only a very small fraction of our
equivalent SOLR installation, even with several times the amount of
resources "thrown at it" -- it does not seem to use resources we
"throw at it" well, either, or in the mailboxes case, responsibly.

I will quickly admit I may be doing something wrong.  Is there a
user-error situation in which mailboxes should be abandoned taking up
memory?

Does anyone else have experiences with equivalent riak_search vs. SOLR
installations?

Thanks again for sharing Gordon.  Your results make me feel like this
may not be entirely stupidity on my part.

Gilbert


On Tue, May 31, 2011 at 8:51 AM, Gordon Tillman 
<gtill...@mezeo.com> wrote:
Howdy Gilbert,
I reproduced the issue this morning and then ran the command that you
specified on two of the non-empty mailboxes.
The output from that is posted here:
https://gist.github.com/1000735
Please let me know if this corresponds to the issue that you are seeing.
Thank you,
--gordon

On May 27, 2011, at 20:10 , Gilbert Glåns wrote:

Gordon,
Could you try:

erlang:process_info(list_to_pid("<0.16614.32>"), [messages,
current_function, initial_call, links, memory, status]).

in a riak search console for one/some of those mailboxes and share the
results? I am curious to see if you are having the same systemic
memory consumption I am experiencing.
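
If it is not obvious which pids to look at, something along these lines could rank processes by mailbox size first. This is only a sketch: the module name mailbox_top and the function top/1 are invented for the example, and it relies only on erlang:processes/0 and erlang:process_info/2.

```erlang
-module(mailbox_top).
-export([top/1]).

%% Return up to N {QueueLen, Pid, CurrentFunction} tuples,
%% largest mailbox first. Processes that exit while we are scanning
%% make process_info/2 return undefined and are skipped by the pattern.
top(N) ->
    Infos = [{Len, Pid, Fun}
             || Pid <- erlang:processes(),
                [{message_queue_len, Len}, {current_function, Fun}] <-
                    [erlang:process_info(Pid, [message_queue_len,
                                               current_function])]],
    lists:sublist(lists:reverse(lists:sort(Infos)), N).
```

mailbox_top:top(10) would then give candidate pids for the process_info call above. Asking only for message_queue_len avoids copying the queued messages themselves, which the messages item does.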

Gilbert

On Fri, May 27, 2011 at 5:15 PM, Gordon Tillman 
<gtill...@mezeo.com> wrote:

Howdy Gang,

We are having a bit of an issue with our 3-node riaksearch cluster.  What is
happening is this:

Cluster is up and running.  We start testing our application against it.  As
the application runs, the Erlang process consumes more and more memory
without ever releasing it.

In trying to investigate the issue we ran the riaksearch-admin cluster_info
command.  It appears that the bulk of this memory is being consumed by a
bunch of mailboxes.

I have posted both the output of the cluster_info command and the app.config
from one of the nodes here:

https://gist.github.com/996419

I would be very grateful if someone from Basho would take a look at the
cluster_info and see if they can spot anything obvious.

Each machine in the cluster has an 8-core Xeon and 16GB RAM.  I believe all
of the platform details, etc., are in the cluster_info dump.

Many thanks,

--gordon

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




