Hi Mark,

I resized the cluster from 4x1GB RAM to 4x4GB RAM. I also increased {map_js_vm_count, 8} to {map_js_vm_count, 48} and {reduce_js_vm_count, 6} to {reduce_js_vm_count, 36} in app.config, but I still hit the same problem from time to time...
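
For reference, those two settings live in the riak_kv section of app.config; the relevant part now looks roughly like this (trimmed to just the two settings I changed):

{riak_kv, [
    %% ... other riak_kv settings unchanged ...
    {map_js_vm_count, 48},
    {reduce_js_vm_count, 36}
]},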

The function I use to do the link-walking call is below:

/**
 * Fetches all user artifacts using link walking and returns them in the callback
 *
 * @param id - unique user id
 * @param cb - callback invoked with an array of user artifacts (empty if none)
 */
fetchUserArtifacts = function(id, cb) {
    riakwalk(config.CONCRETESBUCKET, id, [
        [config.ARTIFACTSBUCKET, 'artifacts', '_']
    ], function(user_artifacts) {
        if (user_artifacts.length) {
            // the first element in the result holds all the links, so skip it
            cb(user_artifacts[1]);
        }
        else {
            cb([]);
        }
    });
};

As you can see, it's a simple query fetching 50-150 small objects. The cluster is almost idle, so it should be able to serve the request. I had a similar problem a while ago and decided to fetch the objects one by one instead of using link walking, and that patch did the trick; the performance degraded slightly, but it worked all the time. For the case mentioned here I again created a non-link-walking version that fetches the 150 objects in about a second, which is acceptable. I'll investigate further when I have time :-)
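
In case it's useful, the non-link-walking version looks roughly like the sketch below. It's a sketch rather than the exact code: it assumes a riakget(bucket, key, cb) helper analogous to riakwalk above, and that the user object keeps its artifact keys in an artifact_keys array.

fetchUserArtifactsDirect = function(id, cb) {
    // riakget(bucket, key, cb) is assumed here, analogous to riakwalk above
    riakget(config.CONCRETESBUCKET, id, function(user) {
        var keys = (user && user.artifact_keys) || [];
        if (!keys.length) {
            return cb([]);
        }
        var artifacts = [];
        var remaining = keys.length;
        keys.forEach(function(key) {
            // issue the GETs in parallel and collect the results as they come back
            riakget(config.ARTIFACTSBUCKET, key, function(artifact) {
                artifacts.push(artifact);
                if (--remaining === 0) {
                    cb(artifacts);
                }
            });
        });
    });
};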

Ivaylo



On 12-06-10 05:20 PM, Mark Phillips wrote:
Hi Ivaylo,

Take a look at this thread:

http://riak.markmail.org/search/?q=exit%20with%20reason%20fitting_died%20in%20context%20child_terminated#query:exit%20with%20reason%20fitting_died%20in%20context%20child_terminated+page:1+mid:n4gfl43hcvzthjl7+state:results

I think this is what you're seeing. You should read the entire message I linked to, but the important point is that you're seeing the "fitting_died in context child_terminated" logs because of a timeout in a Riak Pipe-based M/R process. To paraphrase Bryan Fink, those messages are normal and are intended to help debug issues. Are you still seeing them?
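
If the timeout is what you're hitting, note that a MapReduce request can carry its own "timeout" (in milliseconds), so heavier jobs can be given more room. Over HTTP it's just an extra field in the query body, roughly like this (the bucket and map phase below are made up; only the timeout field matters here):

POST /mapred
{
  "inputs": "artifacts",
  "query": [{"map": {"language": "javascript", "name": "Riak.mapValuesJson"}}],
  "timeout": 60000
}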

I would be interested to know what type of MapReduce load you're putting on your cluster. "4 machines x 1GB RAM" isn't a very powerful cluster, and MapReduce jobs (especially those written in JavaScript) can tax Riak nodes significantly. Any details you can share?

Mark



On Wed, Jun 6, 2012 at 4:38 PM, Ivaylo Panitchkov <ipanitch...@hibernum.com> wrote:


    Hello everyone,

    We started getting the following errors on all servers in the
    cluster (4 machines x 1GB RAM, riak_1.0.2-1_amd64.deb):

    20:12:36.753 [error] Supervisor riak_pipe_vnode_worker_sup had
    child undefined started with
    {riak_pipe_vnode_worker,start_link,undefined} at <0.8855.0> exit
    with reason fitting_died in context child_terminated
    20:12:36.754 [error] Supervisor riak_pipe_vnode_worker_sup had
    child undefined started with
    {riak_pipe_vnode_worker,start_link,undefined} at <0.8856.0> exit
    with reason fitting_died in context child_terminated
    20:12:36.965 [error] Supervisor riak_pipe_vnode_worker_sup had
    child undefined started with
    {riak_pipe_vnode_worker,start_link,undefined} at <0.8860.0> exit
    with reason fitting_died in context child_terminated
    20:12:36.967 [error] Supervisor riak_pipe_vnode_worker_sup had
    child undefined started with
    {riak_pipe_vnode_worker,start_link,undefined} at <0.8861.0> exit
    with reason fitting_died in context child_terminated


    If we restart the riak service on all machines one by one, the
    error messages disappear for a while.
    Any ideas on how to solve the issue would be much appreciated.

    Thanks in advance,
    Ivaylo

    REMARK: Replaced the IP addresses for security's sake

    *root@riak01:~# riak-admin member_status*
    Attempting to restart script through sudo -u riak
    ================================= Membership ==================================
    Status     Ring    Pending    Node
    -------------------------------------------------------------------------------
    valid      25.0%      --      'riak@IP1'
    valid      25.0%      --      'riak@IP2'
    valid      25.0%      --      'riak@IP3'
    valid      25.0%      --      'riak@IP4'
    -------------------------------------------------------------------------------
    Valid:4 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

    *root@riak01:~# riak-admin ring_status*
    Attempting to restart script through sudo -u riak
    ================================== Claimant ===================================
    Claimant:  'riak@IP1'
    Status:     up
    Ring Ready: true

    ============================== Ownership Handoff ==============================
    No pending changes.

    ============================== Unreachable Nodes ==============================
    All nodes are up and reachable

    *root@riak01:~# riak-admin ringready*
    Attempting to restart script through sudo -u riak
    TRUE All nodes agree on the ring
    ['riak@IP1','riak@IP2','riak@IP3','riak@IP4']

    *root@riak01:~# riak-admin transfers*
    Attempting to restart script through sudo -u riak
    No transfers active


--
Ivaylo Panitchkov
Software developer
Hibernum Creations Inc.

