Luca,

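Your email mentions Open MPI 1.6.5,
but the gdb output points to Open MPI 1.8.1 (the debuginfo-install hint
lists openmpi-1.8.1-1.el6.x86_64, and the frames resolve to
/usr/lib64/openmpi/lib).

Could the root cause be a mix of versions that does not occur with the root
account?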

Which Open MPI version are you expecting?

You can run
pmap <pid>
while your binary is running and/or under gdb to confirm which Open MPI
library is really used.
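
If the pmap output is too long to scan by eye, here is a rough sketch of the
same check in Python. It assumes Linux with /proc mounted, and it reads
/proc/<pid>/maps, which holds the same mapping information that pmap reports.
The helper name mpi_libraries and the library-name prefixes it filters on are
my own choices for illustration, not anything shipped with Open MPI.

```python
import sys


def mpi_libraries(maps_text):
    """Return the sorted list of mapped files that look like Open MPI libraries.

    maps_text is the content of /proc/<pid>/maps. Each line has the form
    "address perms offset dev inode [pathname]"; anonymous mappings have
    no pathname and are skipped.
    """
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # no backing file for this mapping
        path = fields[5].strip()
        name = path.rsplit("/", 1)[-1]
        # Assumed prefixes: libmpi*, libopen-pal*, libopen-rte*
        if name.startswith(("libmpi", "libopen-pal", "libopen-rte")):
            paths.add(path)
    return sorted(paths)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the pid of the running (or gdb-stopped) binary.
    with open("/proc/%s/maps" % sys.argv[1]) as maps_file:
        for lib in mpi_libraries(maps_file.read()):
            print(lib)
```

Save it under any name you like and pass the pid as the only argument. If the
paths printed are all under /usr/lib64/openmpi/lib, or come from two
different install trees, that would support the mixed-versions theory.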

Cheers,

Gilles

On Wed, Dec 10, 2014 at 7:21 PM, Luca Fini <lf...@arcetri.astro.it> wrote:

> I've a problem running a well tested MPI based application.
>
> The program has been used for years with no problems. Suddenly the
> executable which was run many times with no problems crashed with
> SIGSEGV. The very same executable if run with root privileges works
> OK. The same happens with other executables and across various
> recompilation attempts.
>
> We could not find any relevant difference in the O.S. since a few days
> ago when the program worked also under unprivileged user ID. Actually
> about in the same span of time we changed the GID of the user
> experiencing the fault, but we think this is not relevant because the
> same SIGSEGV happens to another user which was not modified. Moreover
> we cannot see how that change can affect the running executable (we
> checked all file permissions in the directory tree where the program
> is used).
>
> Running the program under GDB we get the trace reported below. The
> segfault happens at the very beginning during MPI initialization.
>
> We can use the program with sudo, but I'd like to find out what
> happened to go back to "normal" usage.
>
> I'd appreciate any hint on the issue.
>
> Many thanks,
>
>                            Luca Fini
>
> ==============================
> Here follows a few environment details:
>
> Program started with: mpirun -debug -debugger gdb  -np 1
>
> /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/M51b2_OT_2POINT_RH_v1_mod/PREP_PGD
>
> OPEN-MPI 1.6.5
>
> Linux 2.6.32-431.29.2.2.6.32-431.29.2.el6.x86_64
>
> Intel fortran Compiler: 2011.7.256
>
> =========================
> Here follows the stack trace:
>
> Starting program:
>
> /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/M51b2_OT_2POINT_RH_v1_mod/PREP_PGD
>
> /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/M51b2_OT_2POINT_RH_v1_mod/PREP_PGD
> [Thread debugging using libthread_db enabled]
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x00002aaaaaf652c7 in mca_base_component_find (directory=0x0,
> type=0x3b914a7fb5 "rte", static_components=0x3b916cb040,
> requested_component_names=0x0, include_mode=128, found_components=0x1,
> open_dso_components=16)
>     at mca_base_component_find.c:162
> 162        OBJ_CONSTRUCT(found_components, opal_list_t);
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.12-1.149.el6.x86_64 libgcc-4.4.7-11.el6.x86_64
> libgfortran-4.4.7-11.el6.x86_64 libtool-ltdl-2.2.6-15.5.el6.x86_64
> openmpi-1.8.1-1.el6.x86_64
> (gdb) where
> #0  0x00002aaaaaf652c7 in mca_base_component_find (directory=0x0,
> type=0x3b914a7fb5 "rte", static_components=0x3b916cb040,
> requested_component_names=0x0, include_mode=128, found_components=0x1,
> open_dso_components=16)
>     at mca_base_component_find.c:162
> #1  0x0000003b90c4870a in mca_base_framework_components_register ()
> from /usr/lib64/openmpi/lib/libopen-pal.so.6
> #2  0x0000003b90c48c06 in mca_base_framework_register () from
> /usr/lib64/openmpi/lib/libopen-pal.so.6
> #3  0x0000003b90c48def in mca_base_framework_open () from
> /usr/lib64/openmpi/lib/libopen-pal.so.6
> #4  0x0000003b914407e7 in ompi_mpi_init () from
> /usr/lib64/openmpi/lib/libmpi.so.1
> #5  0x0000003b91463200 in PMPI_Init () from
> /usr/lib64/openmpi/lib/libmpi.so.1
> #6  0x00002aaaaacd9295 in mpi_init_f (ierr=0x7fffffffd268) at pinit_f.c:75
> #7  0x00000000005bb159 in MODE_MNH_WORLD::init_nmnh_comm_world
> (kinfo_ll=Cannot access memory at address 0x0
> ) at
> /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/MASTER/spll_mode_mnh_world.f90:45
> #8  0x00000000005939d3 in MODE_IO_LL::initio_ll () at
>
> /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/MASTER/spll_mode_io_ll.f90:107
> #9  0x000000000049d02f in prep_pgd () at
>
> /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/MASTER/spll_prep_pgd.f90:130
> #10 0x000000000049cf8c in main ()
>
> --
> Luca Fini.  INAF - Oss. Astrofisico di Arcetri
> L.go E.Fermi, 5. 50125 Firenze. Italy
> Tel: +39 055 2752 307     Fax: +39 055 2752 292
> Skype: l.fini
> Web: http://www.arcetri.inaf.it/~lfini
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/12/25945.php
>
