@Terry

I hope this is of some help (debugged with TotalView):

Enclosed you will find a graph from TotalView, as well as this output:
/Created process 2 (7633), named "mpirun"
Thread 2.1 has appeared
Thread 2.2 has appeared
Thread 2.1 received a signal (Segmentation Violation)/

and the stack trace:
/   _mca_pls_xgrid_set_node_name,              FP=bffff090
    -[PlsXGridClient launchJob:],              FP=bffff100
    _orte_pls_xgrid_launch,                    FP=bffff240
    _orte_rmgr_urm_spawn,                      FP=bffff290
    orterun,                                   FP=bffff310
    main,                                      FP=bffff3b0
    _start,                                    FP=bffff400/

and this disassembly (the instruction in bold is where it crashed):
/     0x00257680:         0x805e0044  lwz       rtoc,68(r30)
    0x00257684:         0x38000001  li        r0,1
    *0x00257688:         0x90020010  stw       r0,16(rtoc)*
    0x0025768c:         0x805e0044  lwz       rtoc,68(r30)
    0x00257690:         0x38008000  li        r0,-32768/

from function /_mca_pls_xgrid_set_node_name/ in /mca_pls_xgrid.so/
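
For reference, the instruction words in the listing above can be checked by hand. The following is only a minimal Python sketch, assuming the standard PowerPC D-form layout (6-bit primary opcode, two 5-bit register fields, 16-bit signed immediate). The helper name decode_dform is made up for illustration, and the opcode table covers only the three opcodes that occur in the listing.

```python
# Primary opcodes (top 6 bits) for the instructions seen in the listing.
# "li rX,imm" is the simplified mnemonic for "addi rX,0,imm".
OPCODES = {14: "addi", 32: "lwz", 36: "stw"}


def decode_dform(word):
    """Split a 32-bit PowerPC D-form instruction word into its fields.

    Returns (mnemonic, rt_or_rs, ra, signed_immediate).
    """
    opcode = (word >> 26) & 0x3F   # bits 0-5: primary opcode
    rt = (word >> 21) & 0x1F       # bits 6-10: target/source register
    ra = (word >> 16) & 0x1F       # bits 11-15: base register
    imm = word & 0xFFFF            # bits 16-31: displacement/immediate
    if imm & 0x8000:               # sign-extend the 16-bit field
        imm -= 0x10000
    return OPCODES.get(opcode, "op%d" % opcode), rt, ra, imm


if __name__ == "__main__":
    # Instruction words copied from the TotalView disassembly.
    for word in (0x805E0044, 0x38000001, 0x90020010, 0x38008000):
        print("0x%08x -> %s r%d, r%d, %d" % ((word,) + decode_dform(word)))
```

Decoded this way, the crashing word 0x90020010 is a stw with source register r0, base register r2 (shown by TotalView as rtoc) and displacement 16: it stores the value 1, loaded into r0 just before, at the address r2+16, where r2 was itself loaded from 68(r30). The listing alone does not show what that loaded pointer contained.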

Unfortunately I'm not yet familiar with TotalView, so let me know if you would like more output. (Sorry, I haven't found dbx for Mac OS X, which is why I used TotalView.)

Yours,
Frank

users-requ...@open-mpi.org wrote:

------------------------------

Message: 2
List-Post: users@lists.open-mpi.org
Date: Wed, 28 Jun 2006 10:35:03 -0400
From: "Terry D. Dontje" <terry.don...@sun.com>
Subject: [OMPI users] Re : OpenMPI 1.1: Signal:10,
        info.si_errno:0(Unknown, error: 0), si_code:1(BUS_ADRALN)
To: us...@open-mpi.org
Message-ID: <44a29397.2000...@sun.com>
Content-Type: text/plain; format=flowed; charset=ISO-8859-1

Frank,

Can you set your limit coredumpsize to non-zero, rerun the program,
and then get the stack via dbx?

So, I have a similar case of BUS_ADRALN on SPARC systems with an
older version (June 21st) of the trunk. I've since run using the
latest trunk and the bus error went away. I am now going to try this
out with v1.1 to see if I get similar results. Your stack would help
me determine whether this is an OpenMPI issue or possibly some type
of platform problem.

There is another thread with Eric Thibodeau; I am unsure whether it is the same issue as either of our situations.
--td


>
>Message: 3
>Date: Wed, 28 Jun 2006 14:30:12 +0200
>From: openmpi-user <openmpi-u...@fraka-mp.de>
>Subject: Re: [OMPI users] OpenMPI 1.1: Signal:10
>    info.si_errno:0(Unknown, error: 0), si_code:1(BUS_ADRALN) (Terry D.
>    Dontje)
>To: us...@open-mpi.org
>Message-ID: <44a27654.9060...@fraka-mp.de>
>Content-Type: text/plain; charset="iso-8859-1"
>
>Hi Terry,
>
>unfortunately I haven't got a stack trace.
>
>OS: Mac OS X 10.4.7 Server on the Xgrid-server and Mac OS X 10.4.7
>Client on every node (G4 and G5). For testing-purposes I've installed
>OpenMPI 1.1 on a Dual-G4-node and on a Dual-G5-node with my Xgrid
>consisting of only either the Dual-G4- or the Dual-G5-node. No matter
>which configuration, I ran into the bus error.
>
>Yours,
>Frank
>
>



------------------------------

Attachment: _mca_pls_xgrid_set_node_name.dot
Description: Binary data
