Re: [OMPI users] Ompi failing on mx only

2007-01-09 Thread Grobe, Gary L. (JSC-EV)[ESCG]
> What I need is the backtrace of the process that generates the segfault. Second, in order to understand the backtrace, it's better to run a debug version of Open MPI. Without the debug version we only see the address where the fault occurs, without access to the line number ...

Re: [OMPI users] Ompi failing on mx only

2007-01-08 Thread George Bosilca
On Jan 8, 2007, at 9:34 PM, Reese Faucette wrote: Right, that's the maximum number of open MX channels, i.e. processes that can run on the node using MX. With MX (1.2.0c I think), I get weird messages if I run a second mpirun quickly after the first one failed. The Myrinet guys, I'm quite sure, c

Re: [OMPI users] Ompi failing on mx only

2007-01-08 Thread Reese Faucette
Right, that's the maximum number of open MX channels, i.e. processes that can run on the node using MX. With MX (1.2.0c I think), I get weird messages if I run a second mpirun quickly after the first one failed. The Myrinet guys, I'm quite sure, can explain why and how. Somehow, when an application

Re: [OMPI users] Ompi failing on mx only

2007-01-08 Thread George Bosilca
On Jan 8, 2007, at 9:11 PM, Reese Faucette wrote: Second thing. From one of your previous emails, I see that MX is configured with 4 instances per node. You're running with exactly 4 processes on the first 2 nodes. Weird things might happen ... 4 processes per node will be just fine. This is n

Re: [OMPI users] Ompi failing on mx only

2007-01-08 Thread Reese Faucette
Second thing. From one of your previous emails, I see that MX is configured with 4 instances per node. You're running with exactly 4 processes on the first 2 nodes. Weird things might happen ... 4 processes per node will be just fine. This is not like GM where the 4 includes some "reserved" port

Re: [OMPI users] Ompi failing on mx only

2007-01-08 Thread George Bosilca
Not really. This is the backtrace of the process that gets killed because mpirun detects that the other one died ... What I need is the backtrace of the process that generates the segfault. Second, in order to understand the backtrace, it's better to run a debug version of Open MPI. Without

Re: [OMPI users] Ompi failing on mx only

2007-01-08 Thread Grobe, Gary L. (JSC-EV)[ESCG]
> >> PS: Is there any way you can attach to the processes with gdb ? I would like to see the backtrace as showed by gdb in order to be able to figure out what's wrong there.
> > I found out that all processes on the 2nd node crash so I just put a 30 second wait before MPI_Init in or

Re: [OMPI users] Ompi failing on mx only

2007-01-08 Thread Adrian Knoth
On Mon, Jan 08, 2007 at 03:07:57PM -0500, Jeff Squyres wrote:
> if you're running in an ssh environment, you generally have 2 choices to attach serial debuggers:
>
> 1. Put a loop in your app that pauses until you can attach a debugger. Perhaps something like this:
>
> { int i = 0; prin
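
The quoted loop is cut off in this excerpt, so the following is only a minimal sketch of the "pause until a debugger attaches" idea, not the code from the original message. The function name `wait_for_debugger`, its parameters, and the message text are all assumptions made for illustration. The `max_polls` bound is an addition so the helper can also be exercised without a debugger.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* exposes gethostname() under strict -std modes */
#endif

#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Open MPI): print where this process
 * is running, then poll until *released becomes non-zero.
 *
 * released     - flag a debugger is expected to set to a non-zero value
 * poll_seconds - sleep time between checks of the flag
 * max_polls    - give up after this many polls; a negative value waits forever
 *
 * Returns the number of polls performed before returning.
 */
int wait_for_debugger(volatile int *released, unsigned poll_seconds, int max_polls)
{
    char hostname[256];
    int polls = 0;

    assert(released != NULL);

    if (gethostname(hostname, sizeof(hostname)) != 0) {
        snprintf(hostname, sizeof(hostname), "unknown-host");
    }
    hostname[sizeof(hostname) - 1] = '\0';  /* gethostname may not terminate on truncation */

    /* Tell the user which PID to attach to, and on which node. */
    printf("PID %d on %s ready for attach\n", (int)getpid(), hostname);
    fflush(stdout);

    /* volatile forces a fresh read of the flag on every iteration. */
    while (*released == 0 && (max_polls < 0 || polls < max_polls)) {
        sleep(poll_seconds);
        polls++;
    }
    return polls;
}
```

In an application this would be called early in main, for example with a local `volatile int released = 0;` and `wait_for_debugger(&released, 5, -1);`. After attaching with `gdb -p <pid>` and selecting the `wait_for_debugger` frame, `set var *released = 1` lets the process continue.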

Re: [OMPI users] Ompi failing on mx only

2007-01-08 Thread Grobe, Gary L. (JSC-EV)[ESCG]
> >> PS: Is there any way you can attach to the processes with gdb ? I would like to see the backtrace as showed by gdb in order to be able to figure out what's wrong there.
> > When I can get more detailed dbg, I'll send. Though I'm not clear on what executable is being

Re: [OMPI users] Ompi failing on mx only

2007-01-08 Thread Jeff Squyres
On Jan 8, 2007, at 2:52 PM, Grobe, Gary L. (JSC-EV)[ESCG] wrote: > I was wondering if someone could send me the HACKING file so I can do a bit more with debugging on the snapshots. Our web proxy has webdav methods turned off (request methods fail) so that I can't get to the latest of the svn r

Re: [OMPI users] Ompi failing on mx only

2007-01-08 Thread Grobe, Gary L. (JSC-EV)[ESCG]
I was wondering if someone could send me the HACKING file so I can do a bit more with debugging on the snapshots. Our web proxy has webdav methods turned off (request methods fail) so that I can't get to the latest of the svn repos. > Second thing. From one of your previous emails, I see that MX

Re: [OMPI users] Ompi failing on mx only

2007-01-05 Thread Grobe, Gary L. (JSC-EV)[ESCG]
This is just an FYI of the Jan 5th snapshot. I'll send a backtrace of the processes as soon as I get a b3 running. Between my filtered webdav svn access problems and the latest nightly snapshots, my builds are currently failing where the same config

Re: [OMPI users] Ompi failing on mx only

2007-01-05 Thread Grobe, Gary L. (JSC-EV)[ESCG]
This is just an FYI of the Jan 5th snapshot. I'll send a backtrace of the processes as soon as I get a b3 running. Between my filtered webdav svn access problems and the latest nightly snapshots, my builds are currently failing where the same config lines worked on previous snapshots ... $./confi

Re: [OMPI users] Ompi failing on mx only

2007-01-04 Thread George Bosilca
- mpi.org] On Behalf Of Brian W. Barrett Sent: Tuesday, January 02, 2007 4:11 PM To: Open MPI Users Subject: Re: [OMPI users] Ompi failing on mx only Sorry to jump into the discussion late. The mx btl does not support communication between processes on the same node by itself, so you have

Re: [OMPI users] Ompi failing on mx only

2007-01-04 Thread Jeff Squyres
9 additional processes aborted (not shown) -----Original Message----- From: users-boun...@open-mpi.org [mailto:users-bounces@open-mpi.org] On Behalf Of Brian W. Barrett Sent: Tuesday, January 02, 2007 4:11 PM To: Open MPI Users Subject: Re: [OMPI users] Ompi failing on mx only Sorry to jump int

Re: [OMPI users] Ompi failing on mx only

2007-01-04 Thread Grobe, Gary L. (JSC-EV)[ESCG]
.@open-mpi.org] On Behalf Of Brian W. Barrett Sent: Tuesday, January 02, 2007 4:11 PM To: Open MPI Users Subject: Re: [OMPI users] Ompi failing on mx only Sorry to jump into the discussion late. The mx btl does not support communication between processes on the same node by itself, so you have to

Re: [OMPI users] Ompi failing on mx only

2007-01-03 Thread Reese Faucette
$ mpirun --prefix /usr/local/openmpi-1.2b2 --hostfile ./h1-3 -np 1 --mca btl mx,sm,self ./cpi
[node-1:09704] mca: base: component_find: unable to open mtl mx: file not found (ignored)
[node-1:09704] mca: base: component_find: unable to open btl mx: file not found (ignored)

This in particular is

Re: [OMPI users] Ompi failing on mx only

2007-01-03 Thread Grobe, Gary L. (JSC-EV)[ESCG]
3.1415926544231341, Error is 0.08333410 wall clock time = 0.000331 -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Brian W. Barrett Sent: Tuesday, January 02, 2007 4:11 PM To: Open MPI Users Subject: Re: [OMPI users] Ompi failing on mx only

Re: [OMPI users] Ompi failing on mx only

2007-01-02 Thread Grobe, Gary L. (JSC-EV)[ESCG]
ry 02, 2007 4:08 PM To: Open MPI Users Subject: Re: [OMPI users] Ompi failing on mx only > I've attached the ompi_info from node-1 and node-2. thanks, but i need "mx_info", not "ompi_info" ;-) > But now that you mention mapper, I take it

Re: [OMPI users] Ompi failing on mx only

2007-01-02 Thread Grobe, Gary L. (JSC-EV)[ESCG]
] On Behalf Of Reese Faucette Sent: Tuesday, January 02, 2007 4:08 PM To: Open MPI Users Subject: Re: [OMPI users] Ompi failing on mx only > I've attached the ompi_info from node-1 and node-2. thanks, but i need "mx_info", not "ompi_info" ;-) >

Re: [OMPI users] Ompi failing on mx only

2007-01-02 Thread Reese Faucette
As for the MTL, there is a bug in the MX MTL for v1.2 that has been fixed, but after 1.2b2 ... oops, i was stupidly assuming he already had that fix. yes, this is an important fix... -reese

Re: [OMPI users] Ompi failing on mx only

2007-01-02 Thread Brian W. Barrett
users-boun...@open-mpi.org [mailto:users-bounces@open-mpi.org] On Behalf Of Reese Faucette Sent: Tuesday, January 02, 2007 2:52 PM To: Open MPI Users Subject: Re: [OMPI users] Ompi failing on mx only Hi, Gary- This looks like a config problem, and not a code problem yet. Could you send the

Re: [OMPI users] Ompi failing on mx only

2007-01-02 Thread Reese Faucette
> I've attached the ompi_info from node-1 and node-2. thanks, but i need "mx_info", not "ompi_info" ;-) > But now that you mention mapper, I take it that's what SEGV_MAPERR might be referring to. this is an ompi red herring; it has nothing to do with Myrinet mapping, even

Re: [OMPI users] Ompi failing on mx only

2007-01-02 Thread Grobe, Gary L. (JSC-EV)[ESCG]
3 un-ACKed alerts Mapping is complete, last map generated by node-20 Database generation not yet complete. From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Reese Faucette Sent: Tuesday, January 02, 2007 2:52 PM To: Open MPI Use

Re: [OMPI users] Ompi failing on mx only

2007-01-02 Thread Reese Faucette
Hi, Gary- This looks like a config problem, and not a code problem yet. Could you send the output of mx_info from node-1 and from node-2? Also, forgive me counter-asking a possibly dumb OMPI question, but is "-x LD_LIBRARY_PATH" really what you want, as opposed to "-x LD

[OMPI users] Ompi failing on mx only

2007-01-02 Thread Grobe, Gary L. (JSC-EV)[ESCG]
I was initially using 1.1.2 and moved to 1.2b2 because of a hang on MPI_Bcast() which 1.2b2 reportedly fixes, and it seems to have done so. My compute nodes are 2 dual core xeons on myrinet with mx. The problem is trying to get ompi running on mx only. My machine file is as follows ... node-1 slots=4