Aurelien's advice is good -- check and see exactly what the debugger is telling you. You might want to load the core file in the debugger and see exactly where it failed -- it may or may not be an MPI issue.

Also -- Aurelien didn't directly say it, but don't worry about the OMPI_DECLSPEC stuff. You'll see earlier in mpi.h that OMPI_DECLSPEC is #define'd to be empty (it's for Windows compatibility).
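On non-Windows platforms the relevant part of mpi.h boils down to roughly this (paraphrased -- not the literal Open MPI source; the Windows branch below is just illustrative):

   /* On Windows the macro marks symbols for DLL import/export;
      everywhere else it is defined to nothing. */
   #if defined(_WIN32)
   #  define OMPI_DECLSPEC __declspec(dllimport)
   #else
   #  define OMPI_DECLSPEC
   #endif

   /* so this prototype ... */
   OMPI_DECLSPEC int MPI_Comm_size(MPI_Comm comm, int *size);
   /* ... is seen by the compiler as simply: */
   int MPI_Comm_size(MPI_Comm comm, int *size);

There is nothing to enable; the macro has no effect on how you call the function.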

Keep in mind that although different MPI implementations provide source code compatibility for MPI applications, they are not binary-portable.
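For example, the opaque handle types themselves are declared differently. Very roughly (the exact typedefs vary by version), MPICH-1.x and Open MPI do something like:

   /* MPICH-1.x mpi.h: handles are small integers */
   typedef int MPI_Comm;

   /* Open MPI mpi.h: handles are pointers to internal structures */
   typedef struct ompi_communicator_t *MPI_Comm;

An object compiled against one mpi.h therefore hands the other library the wrong kind of handle, which is exactly the sort of thing that segfaults on the first MPI call.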

So if you compile an MPI application with MPICH's wrapper compilers, it will not run properly under Open MPI's mpirun (and vice versa). You must compile your entire application with Open MPI's wrapper compilers and then run it with Open MPI's mpirun.
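If you want a quick sanity check that the Open MPI installation itself is healthy before digging into your application, a trivial program like this (compiled with Open MPI's mpicc and launched with its mpirun) should run cleanly:

   #include <stdio.h>
   #include <mpi.h>

   int main(int argc, char **argv)
   {
       int rank, size;

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
       MPI_Comm_size(MPI_COMM_WORLD, &size);
       printf("Hello from rank %d of %d\n", rank, size);
       MPI_Finalize();
       return 0;
   }

If that works but your application still crashes, the problem is almost certainly in the application or in how it was built, not in Open MPI itself.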


On Sep 21, 2008, at 12:35 PM, Aurélien Bouteiller wrote:

Are you sure that you have matching versions of the MPI library and mpi.h file? Open MPI and MPICH have different internal types for the opaque MPI objects (such as MPI_Comm). If your mpi.h and MPI library don't match, you'll pass those handles to the library as integers while it is expecting pointers, which will obviously segfault very badly. Please make sure that you actually use the mpi.h from Open MPI (by using Open MPI's mpicc) to compile your program when using Open MPI. Also make sure that you don't have another version of libmpi in your LD_LIBRARY_PATH that could be picked up instead of the one you compiled against.

Aurelien

Le 21 sept. 08 à 04:38, Shafagh Jafer a écrit :


Ok. I noticed that whenever my code uses an MPI function that has "OMPI_DECLSPEC" in front of it in mpi.h, I get this segfault error. Could someone please tell me what "OMPI_DECLSPEC" is? Is it a macro that I need to enable? For example, in MPICH the MPI_Comm_size function in mpi.h looks like the following:

int MPI_Comm_size(MPI_Comm, int *);

but the same function in OMPI appears as follows:
OMPI_DECLSPEC int MPI_Comm_size(MPI_Comm comm, int *size);

--- On Sat, 9/20/08, Shafagh Jafer <barf...@yahoo.com> wrote:
From: Shafagh Jafer <barf...@yahoo.com>
Subject: Re: [OMPI users] Segmentation Fault--libc.so.6(__libc_start_main...
To: "Open MPI Users" <us...@open-mpi.org>
Date: Saturday, September 20, 2008, 9:50 PM

My code was working perfectly when I used MPICH; now I have replaced it with OMPI. Could that be the problem? Do I need to change any part of my source code when migrating from MPICH-1.2.6 to OpenMPI-1.2.7? Please let me know.

--- On Sat, 9/20/08, Aurélien Bouteiller <boute...@eecs.utk.edu> wrote:
From: Aurélien Bouteiller <boute...@eecs.utk.edu>
Subject: Re: [OMPI users] Segmentation Fault--libc.so.6(__libc_start_main...
To: "Open MPI Users" <us...@open-mpi.org>
Date: Saturday, September 20, 2008, 6:54 AM

Shafagh,

You have a segfault in your own code. Open MPI detects it, forwards the
error to you, and pretty-prints it, but Open MPI is not the source of
the bug. From the stack trace, I suggest you debug the physicalGetId
function with gdb.

Aurelien

Le 19 sept. 08 à 22:22, Shafagh Jafer a écrit :

> Hi everyone,
> I need urgent help please :-(
> I am getting the following error when I run my program. The OpenMPI
> compilation went fine, but now I don't understand
> the source of this error:
> ============================================
> [node01:29264] *** Process received signal ***
> [node01:29264] Signal: Segmentation fault (11)
> [node01:29264] Signal code: Address not mapped (1)
> [node01:29264] Failing at address: 0xcf
> [node01:29264] [ 0] /lib/tls/libpthread.so.0 [0x7ccf80]
> [node01:29264] [ 1] /nfs/sjafer/phd/openMPI/latest_cd++_timewarp/cd++ (physicalGetId__C10CommPhyMPI+0x14) [0x8305880]
> [node01:29264] [ 2] /nfs/sjafer/phd/openMPI/latest_cd++_timewarp/cd++ (physicalCommGetId__Fv+0x43) [0x82ff81b]
> [node01:29264] [ 3] /nfs/sjafer/phd/openMPI/latest_cd++_timewarp/cd++ (openComm__16StandAloneLoader+0x1f) [0x80fdf43]
> [node01:29264] [ 4] /nfs/sjafer/phd/openMPI/latest_cd++_timewarp/cd++ (run__21ParallelMainSimulator+0x1640) [0x81ea53c]
> [node01:29264] [ 5] /nfs/sjafer/phd/openMPI/latest_cd++_timewarp/cd++ (main+0xde) [0x80a58ce]
> [node01:29264] [ 6] /lib/tls/libc.so.6(__libc_start_main+0xda) [0xe3d79a]
> [node01:29264] [ 7] /nfs/sjafer/phd/openMPI/latest_cd++_timewarp/cd++ (sinh+0x4d) [0x80a2221]
> [node01:29264] *** End of error message ***
> mpirun noticed that job rank 0 with PID 29264 on node node01 exited on signal 11 (Segmentation fault).
> ===========================================
>



--
* Dr. Aurélien Bouteiller
* Sr. Research Associate at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 350
* Knoxville, TN 37996
* 865 974 6321











--
Jeff Squyres
Cisco Systems

