Hi Mark,

I resized the cluster from 4x1GB RAM to 4x4GB RAM. I also increased {map_js_vm_count, 8} to {map_js_vm_count, 48} and {reduce_js_vm_count, 6} to {reduce_js_vm_count, 36} in app.config, but I still hit the same problem from time to time...
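
For reference, those two settings live in the riak_kv section of app.config; the relevant part now looks roughly like this (trimmed to just the two settings I changed):

{riak_kv, [
    %% ... other riak_kv settings unchanged ...
    {map_js_vm_count, 48},
    {reduce_js_vm_count, 36}
]},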

The function I use to do the link-walking call is below:

/**
 * Fetches all user artifacts using link walking and returns them in the callback
 *
 * @param id - unique user id
 * @param cb - callback invoked with an array of user artifacts (empty if none)
 */
fetchUserArtifacts = function(id, cb) {
    riakwalk(config.CONCRETESBUCKET, id, [
        [config.ARTIFACTSBUCKET, 'artifacts', '_']
    ], function(user_artifacts) {
        if (user_artifacts.length) {
            // the first element in the result holds all the links, so skip it
            cb(user_artifacts[1]);
        }
        else {
            cb([]);
        }
    });
};

As you can see, it's a simple query fetching 50-150 small objects. The cluster is almost idle, so it should be able to serve the request. I had a similar problem a while ago and decided to fetch the objects one by one instead of using link walking, and that patch did the trick; the performance degraded slightly, but it worked all the time. For the case mentioned here I again created a non-link-walking version that fetches the 150 objects in about a second, which is acceptable. I'll investigate further when I have time :-)
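
In case it's useful, the non-link-walking version looks roughly like the sketch below. It's a sketch rather than the exact code: it assumes a riakget(bucket, key, cb) helper analogous to riakwalk above, and that the user object keeps its artifact keys in an artifact_keys array.

fetchUserArtifactsDirect = function(id, cb) {
    // riakget(bucket, key, cb) is assumed here, analogous to riakwalk above
    riakget(config.CONCRETESBUCKET, id, function(user) {
        var keys = (user && user.artifact_keys) || [];
        if (!keys.length) {
            return cb([]);
        }
        var artifacts = [];
        var remaining = keys.length;
        keys.forEach(function(key) {
            // issue the GETs in parallel and collect the results as they come back
            riakget(config.ARTIFACTSBUCKET, key, function(artifact) {
                artifacts.push(artifact);
                if (--remaining === 0) {
                    cb(artifacts);
                }
            });
        });
    });
};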

Ivaylo



On 12-06-10 05:20 PM, Mark Phillips wrote:
Hi Ivaylo,

Take a look at this thread:

http://riak.markmail.org/search/?q=exit%20with%20reason%20fitting_died%20in%20context%20child_terminated#query:exit%20with%20reason%20fitting_died%20in%20context%20child_terminated+page:1+mid:n4gfl43hcvzthjl7+state:results

I think this is what you're seeing. You should read the entire message I linked to, but the important point is that you're seeing the "fitting_died in context child_terminated" logs because of a timeout in a Riak Pipe-based M/R process. To paraphrase Bryan Fink, those messages are normal and are intended to help debug issues. Are you still seeing them?
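
If the timeout is what you're hitting, note that a MapReduce request can carry its own "timeout" (in milliseconds), so heavier jobs can be given more room. Over HTTP it's just an extra field in the query body, roughly like this (the bucket and map phase below are made up; only the timeout field matters here):

POST /mapred
{
  "inputs": "artifacts",
  "query": [{"map": {"language": "javascript", "name": "Riak.mapValuesJson"}}],
  "timeout": 60000
}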

I would be interested to know what type of MapReduce load you're putting on your cluster. "4 machines x 1GB RAM" isn't a very powerful cluster, and MapReduce jobs (especially those written in JavaScript) can tax Riak nodes significantly. Any details you can share?

Mark



On Wed, Jun 6, 2012 at 4:38 PM, Ivaylo Panitchkov <ipanitch...@hibernum.com> wrote:


    Hello everyone,

    We started getting the following errors on all servers in the
    cluster (4 machines x 1GB RAM, riak_1.0.2-1_amd64.deb):

    20:12:36.753 [error] Supervisor riak_pipe_vnode_worker_sup had
    child undefined started with
    {riak_pipe_vnode_worker,start_link,undefined} at <0.8855.0> exit
    with reason fitting_died in context child_terminated
    20:12:36.754 [error] Supervisor riak_pipe_vnode_worker_sup had
    child undefined started with
    {riak_pipe_vnode_worker,start_link,undefined} at <0.8856.0> exit
    with reason fitting_died in context child_terminated
    20:12:36.965 [error] Supervisor riak_pipe_vnode_worker_sup had
    child undefined started with
    {riak_pipe_vnode_worker,start_link,undefined} at <0.8860.0> exit
    with reason fitting_died in context child_terminated
    20:12:36.967 [error] Supervisor riak_pipe_vnode_worker_sup had
    child undefined started with
    {riak_pipe_vnode_worker,start_link,undefined} at <0.8861.0> exit
    with reason fitting_died in context child_terminated


    If we restart the riak service on all machines one by one, the
    error messages disappear for a while.
    Any ideas on how to solve the issue would be much appreciated.

    Thanks in advance,
    Ivaylo

    REMARK: Replaced the IP addresses for security's sake

    *root@riak01:~# riak-admin member_status*
    Attempting to restart script through sudo -u riak
    ================================= Membership ==================================
    Status     Ring    Pending    Node
    -------------------------------------------------------------------------------
    valid      25.0%      --      'riak@IP1'
    valid      25.0%      --      'riak@IP2'
    valid      25.0%      --      'riak@IP3'
    valid      25.0%      --      'riak@IP4'
    -------------------------------------------------------------------------------
    Valid:4 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

    *root@riak01:~# riak-admin ring_status*
    Attempting to restart script through sudo -u riak
    ================================== Claimant ===================================
    Claimant:  'riak@IP1'
    Status:     up
    Ring Ready: true

    ============================== Ownership Handoff ==============================
    No pending changes.

    ============================== Unreachable Nodes ==============================
    All nodes are up and reachable

    *root@riak01:~# riak-admin ringready*
    Attempting to restart script through sudo -u riak
    TRUE All nodes agree on the ring
    ['riak@IP1','riak@IP2','riak@IP3','riak@IP4']

    *root@riak01:~# riak-admin transfers*
    Attempting to restart script through sudo -u riak
    No transfers active


--
Ivaylo Panitchkov
Software developer
Hibernum Creations Inc.

