Our scenario is that we are running python, then importing a module
written in Fortran.
We run via:
mpiexec -n 8 -x PYTHONPATH -x SIDL_DLL_PATH python tokHsmNP8.py
where the script calls into Fortran to call MPI_Init.
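For concreteness, the Python side does roughly the following. This is a schematic only: the real call goes through our SIDL-generated bindings, so the call syntax is illustrative, though the module and routine names match the stack trace below.

    # Schematic of the MPI-init step in tokHsmNP8.py; the actual call
    # goes through our bindings, so the syntax here is illustrative.
    import uedgeC             # the Fortran module, built as uedgeC.so
    uedgeC.uedge_mpiinit()    # Fortran wrapper that calls MPI_INIT(ierr)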
On 8 procs (but not on 1) we get hangs in the code (on some machines but
not others!). It is hard to tell precisely where, because the hang is
inside a PETSc method.
Running with valgrind
mpiexec -n 8 -x PYTHONPATH -x SIDL_DLL_PATH valgrind python tokHsmNP8.py
gives a crash, with some salient output:
==936==
==936== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
==936==    at 0x39336DAA79: syscall (in /lib64/libc-2.10.1.so)
==936==    by 0x10BCBD58: opal_paffinity_linux_plpa_api_probe_init (in /usr/local/openmpi-1.3.2-notorque/lib/libopen-pal.so.0.0.0)
==936==    by 0x10BCE054: opal_paffinity_linux_plpa_init (in /usr/local/openmpi-1.3.2-notorque/lib/libopen-pal.so.0.0.0)
==936==    by 0x10BCC9F9: opal_paffinity_linux_plpa_have_topology_information (in /usr/local/openmpi-1.3.2-notorque/lib/libopen-pal.so.0.0.0)
==936==    by 0x10BCBBFF: linux_module_init (in /usr/local/openmpi-1.3.2-notorque/lib/libopen-pal.so.0.0.0)
==936==    by 0x10BC99C3: opal_paffinity_base_select (in /usr/local/openmpi-1.3.2-notorque/lib/libopen-pal.so.0.0.0)
==936==    by 0x10B9DB83: opal_init (in /usr/local/openmpi-1.3.2-notorque/lib/libopen-pal.so.0.0.0)
==936==    by 0x10920C6C: orte_init (in /usr/local/openmpi-1.3.2-notorque/lib/libopen-rte.so.0.0.0)
==936==    by 0x10579D06: ompi_mpi_init (in /usr/local/openmpi-1.3.2-notorque/lib/libmpi.so.0.0.0)
==936==    by 0x10599175: PMPI_Init (in /usr/local/openmpi-1.3.2-notorque/lib/libmpi.so.0.0.0)
==936==    by 0x10E2BDF4: mpi_init (in /usr/local/openmpi-1.3.2-notorque/lib/libmpi_f77.so.0.0.0)
==936==    by 0xDF30A1F: uedge_mpiinit_ (in /home/research/cary/projects/facetsall-iter/physics/uedge/par/build/uedgeC.so)
==936== Address 0x0 is not stack'd, malloc'd or (recently) free'd
This makes me think that our call to mpi_init is wrong. The MPI_Init
page at
http://www.mcs.anl.gov/research/projects/mpi/www/www3/MPI_Init.html
says:
Because the Fortran and C versions of MPI_Init are different, there is
a restriction on who can call MPI_Init. The version (Fortran or C)
must match the main program. That is, if the main program is in C,
then the C version of MPI_Init must be called. If the main program is
in Fortran, the Fortran version must be called.
(MPI_Init in that passage links to
http://www.mpi-forum.org/docs/mpi-11-html/node151.html#node151.)
Should I infer from this that, since Python is a C program, one must
call the C version of MPI_Init (with argc and argv)? Or, since the
module is written mostly in Fortran, with MPI calls only of the
Fortran variety, can I initialize with the Fortran MPI_Init?
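If the C version is in fact required, I suppose we could initialize
from the Python side before importing the module. A minimal, untested
sketch, assuming libmpi.so.0 is loadable with ctypes and that the
MPI-2 allowance of MPI_Init(NULL, NULL) applies:

    # Untested sketch: call the C MPI_Init via ctypes before any
    # Fortran code runs. RTLD_GLOBAL keeps the MPI symbols visible
    # to extension modules loaded afterwards.
    import ctypes
    mpi = ctypes.CDLL("libmpi.so.0", mode=ctypes.RTLD_GLOBAL)
    mpi.MPI_Init(None, None)  # ctypes passes None as NULL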
Thanks.....John Cary