Hi Mark,
I resized the cluster from 4x1GB RAM to 4x4GB RAM. I also increased
{map_js_vm_count, 8} to {map_js_vm_count, 48} and {reduce_js_vm_count, 6}
to {reduce_js_vm_count, 36} in app.config, but I still hit the same
problem from time to time...
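For reference, the relevant part of my app.config now looks roughly like
this (only the two JS VM settings are shown; the rest of the riak_kv
section is omitted):

    %% riak_kv section of app.config (other settings omitted for brevity)
    {riak_kv, [
        %% number of JavaScript VMs available to map phases
        {map_js_vm_count, 48},
        %% number of JavaScript VMs available to reduce phases
        {reduce_js_vm_count, 36}
    ]},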
The function I use to do the link-walking call is below:
/**
 * Fetches all user artifacts using link walking and returns them in the callback.
 *
 * @param id - unique user id
 * @param cb - callback invoked with an array of user artifacts
 */
fetchUserArtifacts = function(id, cb) {
    riakwalk(config.CONCRETESBUCKET, id, [
        [config.ARTIFACTSBUCKET, 'artifacts', '_']
    ], function(user_artifacts) {
        if (user_artifacts.length) {
            // the first element in the result holds all links, so we skip it
            cb(user_artifacts[1]);
        } else {
            cb([]);
        }
    });
};
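For context, a typical call looks something like this (the id value is
just a placeholder):

    // example usage -- 'user-123' is a placeholder id
    fetchUserArtifacts('user-123', function(artifacts) {
        console.log('fetched ' + artifacts.length + ' artifacts');
    });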
As you can see, it's a simple query fetching 50-150 small objects. The
cluster is almost idle, so it should be able to serve the request. I had
a similar problem a while ago and decided to fetch objects one by one
instead of using link-walking, and that patch did the trick. Performance
degraded slightly, but at least it worked all the time. For the case
mentioned here I just created a non-link-walking version that fetches
the 150 objects in about a second, which is acceptable. Will investigate
further when I have time :-)
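For anyone curious, the one-by-one fallback is roughly shaped like the
sketch below. fetchArtifactKeys and riakget are simplified placeholders
standing in for my actual client wrappers, so treat this as an outline
rather than the real code:

    // Rough outline of the non-link-walking version.
    // fetchArtifactKeys and riakget are hypothetical wrappers, not real calls.
    fetchUserArtifactsNoWalk = function(id, cb) {
        // step 1: read the artifact keys off the user object's links
        fetchArtifactKeys(config.CONCRETESBUCKET, id, function(keys) {
            if (!keys.length) { return cb([]); }
            var artifacts = [];
            var pending = keys.length;
            keys.forEach(function(key) {
                // step 2: plain GET for each artifact instead of link-walking
                riakget(config.ARTIFACTSBUCKET, key, function(obj) {
                    artifacts.push(obj);
                    if (--pending === 0) {
                        cb(artifacts);
                    }
                });
            });
        });
    };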
Ivaylo
On 12-06-10 05:20 PM, Mark Phillips wrote:
Hi Ivaylo,
Take a look at this thread:
http://riak.markmail.org/search/?q=exit%20with%20reason%20fitting_died%20in%20context%20child_terminated#query:exit%20with%20reason%20fitting_died%20in%20context%20child_terminated+page:1+mid:n4gfl43hcvzthjl7+state:results
I think this is what you're seeing. You should read the entire message
I linked to, but the important thing is that the
"fitting_died in context child_terminated" logs are caused by a
timeout in a Riak Pipe-based MapReduce process. To paraphrase Bryan Fink,
those messages are normal and intended to help debug issues. Are you
still seeing them?
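If it is that timeout, one thing that might help while you investigate
is giving the job a longer deadline: the MapReduce request body can carry
an explicit timeout in milliseconds. The sketch below is only an
illustration; the bucket names, tag, and the 120000 value are
placeholders, not your actual setup:

    // Illustrative MapReduce job spec with an explicit timeout.
    // Bucket names, tag, and the 120000 ms value are placeholders.
    var job = {
        inputs: [['concretes_bucket', 'some-user-id']],
        query: [
            // follow the 'artifacts' links from the input object
            { link: { bucket: 'artifacts_bucket', tag: 'artifacts', keep: false } },
            // return the linked objects' JSON values
            { map: { language: 'javascript', name: 'Riak.mapValuesJson', keep: true } }
        ],
        timeout: 120000  // allow 120s before the job times out
    };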
I would be interested to know what type of MapReduce load you're
putting on your cluster. "4 machines x 1GB RAM" isn't a very powerful
cluster, and MapReduce jobs (especially those written in JavaScript)
can tax Riak nodes significantly. Any details you can share?
Mark
On Wed, Jun 6, 2012 at 4:38 PM, Ivaylo Panitchkov
<ipanitch...@hibernum.com> wrote:
Hello everyone,
We started getting the following errors on all servers in the
cluster (4 machines x 1GB RAM, riak_1.0.2-1_amd64.deb):
20:12:36.753 [error] Supervisor riak_pipe_vnode_worker_sup had child undefined started with {riak_pipe_vnode_worker,start_link,undefined} at <0.8855.0> exit with reason fitting_died in context child_terminated
20:12:36.754 [error] Supervisor riak_pipe_vnode_worker_sup had child undefined started with {riak_pipe_vnode_worker,start_link,undefined} at <0.8856.0> exit with reason fitting_died in context child_terminated
20:12:36.965 [error] Supervisor riak_pipe_vnode_worker_sup had child undefined started with {riak_pipe_vnode_worker,start_link,undefined} at <0.8860.0> exit with reason fitting_died in context child_terminated
20:12:36.967 [error] Supervisor riak_pipe_vnode_worker_sup had child undefined started with {riak_pipe_vnode_worker,start_link,undefined} at <0.8861.0> exit with reason fitting_died in context child_terminated
If we restart the Riak service on all machines one by one, the error
messages disappear for a while.
Any ideas on how to solve the issue would be much appreciated.
Thanks in advance,
Ivaylo
REMARK: I replaced the IP addresses for security's sake.
*root@riak01:~# riak-admin member_status*
Attempting to restart script through sudo -u riak
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid      25.0%      --      'riak@IP1'
valid      25.0%      --      'riak@IP2'
valid      25.0%      --      'riak@IP3'
valid      25.0%      --      'riak@IP4'
-------------------------------------------------------------------------------
Valid:4 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

*root@riak01:~# riak-admin ring_status*
Attempting to restart script through sudo -u riak
================================== Claimant ===================================
Claimant: 'riak@IP1'
Status: up
Ring Ready: true
============================== Ownership Handoff ==============================
No pending changes.
============================== Unreachable Nodes ==============================
All nodes are up and reachable

*root@riak01:~# riak-admin ringready*
Attempting to restart script through sudo -u riak
TRUE All nodes agree on the ring ['riak@IP1','riak@IP2','riak@IP3','riak@IP4']

*root@riak01:~# riak-admin transfers*
Attempting to restart script through sudo -u riak
No transfers active
--
Ivaylo Panitchkov
Software developer
Hibernum Creations Inc.