Luca,

you might want to double-check the environment:

  env | grep ^OMPI

and the per-user config:

  ls $HOME/.openmpi
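The two checks above can be combined into one shell function. This is a minimal sketch that assumes the per-user sources Open MPI consults are:

- OMPI_* environment variables (MCA parameters can be set as OMPI_MCA_<name>);
- files under $HOME/.openmpi, notably mca-params.conf.

A leftover entry in either place would affect only that account. The function name check_ompi_user_config is hypothetical.

```shell
#!/bin/sh
# Sketch: show Open MPI settings that apply only to the current user.
# Assumption: per-user MCA settings come from OMPI_* environment variables
# and from $HOME/.openmpi/mca-params.conf.
check_ompi_user_config() {
    echo "== OMPI_* environment variables =="
    env | grep '^OMPI' || echo "(none)"

    echo "== per-user config directory: $HOME/.openmpi =="
    if [ -d "$HOME/.openmpi" ]; then
        ls -la "$HOME/.openmpi"
        conf="$HOME/.openmpi/mca-params.conf"
        if [ -f "$conf" ]; then
            echo "== active lines in $conf =="
            # Print only settings: skip comment lines and blank lines
            grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$conf" \
                || echo "(no active lines)"
        fi
    else
        echo "(not present)"
    fi
}

check_ompi_user_config
```

Running it once from the failing account and once from the new account makes any per-user difference easy to spot.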
Cheers,

Gilles

On 2014/12/11 17:40, Luca Fini wrote:
> Many thanks for the replies.
>
> The mismatch in OpenMPI version is my fault: while writing the request
> for help I looked at the name of the directory where OpenMPI was built
> (I did not build it myself) and did not notice that the name of the
> directory did not reflect the version actually compiled.
>
> I had already checked the ulimits defined for the account where the
> SIGSEGV happens and they seem OK.
>
> Moreover I have a further result: I created a brand-new account with
> default privileges and tried to run the program under that one, and it
> works!
>
> I'm still trying to pin down the differences between the two
> unprivileged accounts.
>
> Cheers,
> l.
>
> On Wed, Dec 10, 2014 at 6:12 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>> Hi Luca
>>
>> Another possibility that comes to mind,
>> besides the mixed versions mentioned by Gilles,
>> is the OS limits.
>> Limits may vary according to the user and user privileges.
>>
>> Large programs tend to require a big stack size (even unlimited),
>> and typically segfault when the stack is not large enough.
>> The max number of open files is yet another hurdle.
>> And if you're using InfiniBand, the max locked memory size should be
>> unlimited.
>> Check /etc/security/limits.conf and "ulimit -a".
>>
>> I hope this helps,
>> Gus Correa
>>
>> On 12/10/2014 08:28 AM, Gilles Gouaillardet wrote:
>>> Luca,
>>>
>>> your email mentions openmpi 1.6.5
>>> but gdb output points to openmpi 1.8.1.
>>>
>>> could the root cause be a mix of versions that does not occur with
>>> the root account?
>>>
>>> which openmpi version are you expecting?
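One quick way to answer the version question is this sketch. It prints:

- the mpirun found on PATH and its reported version;
- LD_LIBRARY_PATH;
- the Open MPI libraries (libmpi, libopen-pal, libopen-rte) that ldd resolves for the executable.

It is a static check that complements pmap on a running process. The function name report_mpi_libs and the default ./PREP_PGD path are placeholders.

```shell
#!/bin/sh
# Sketch: compare the Open MPI found on PATH with the libraries that a
# given executable will load. report_mpi_libs is a hypothetical helper.
report_mpi_libs() {
    exe="$1"
    echo "== mpirun on PATH =="
    command -v mpirun || echo "(mpirun not on PATH)"
    if command -v mpirun >/dev/null 2>&1; then
        mpirun --version 2>&1 | head -n 1
    fi

    echo "== LD_LIBRARY_PATH =="
    echo "${LD_LIBRARY_PATH:-(unset)}"

    echo "== MPI libraries resolved for $exe =="
    # ldd prints "libname => /path/to/lib (address)" for each dependency
    ldd "$exe" 2>&1 | grep -E 'libmpi|libopen-(pal|rte)' \
        || echo "(no Open MPI libraries listed)"
}

# Placeholder default: pass the real PREP_PGD path as the first argument
report_mpi_libs "${1:-./PREP_PGD}"
```

Run it with the full PREP_PGD path from each account. Library paths that differ between accounts, or that differ from the mpirun version, would point to a mix of installations.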
>>>
>>> you can run
>>> pmap <pid>
>>> when your binary is running and/or under gdb to confirm the openmpi
>>> library that is really used.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Wed, Dec 10, 2014 at 7:21 PM, Luca Fini <lf...@arcetri.astro.it> wrote:
>>>
>>> I have a problem running a well-tested MPI-based application.
>>>
>>> The program has been used for years with no problems. Suddenly the
>>> executable, which had run many times with no problems, crashed with
>>> SIGSEGV. The very same executable works OK if run with root
>>> privileges. The same happens with other executables and across various
>>> recompilation attempts.
>>>
>>> We could not find any relevant difference in the O.S. compared with a
>>> few days ago, when the program also worked under an unprivileged user
>>> ID. At about the same time we changed the GID of the user
>>> experiencing the fault, but we think this is not relevant because the
>>> same SIGSEGV happens to another user who was not modified. Moreover
>>> we cannot see how that change could affect the running executable (we
>>> checked all file permissions in the directory tree where the program
>>> is used).
>>>
>>> Running the program under GDB we get the trace reported below. The
>>> segfault happens at the very beginning, during MPI initialization.
>>>
>>> We can use the program with sudo, but I'd like to find out what
>>> happened so that we can go back to "normal" usage.
>>>
>>> I'd appreciate any hint on the issue.
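The program works through sudo and under a fresh account. One hypothesis worth testing, an assumption the thread does not confirm, is that the contexts differ in one of two ways:

- in environment, since sudo normally resets it and drops variables such as LD_LIBRARY_PATH;
- in the resource limits Gus mentions.

This sketch runs env and ulimit -a in the current shell and again through a "runner" command, then diffs the results. compare_contexts is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: diff the environment and resource limits of the current user
# against the same commands run through another runner (e.g. sudo).
# Usage: compare_contexts RUNNER [ARGS...]
compare_contexts() {
    workdir=$(mktemp -d) || return 1

    # Capture both contexts; sort env so that ordering does not matter
    env | sort > "$workdir/env.self"
    "$@" env | sort > "$workdir/env.other"
    sh -c 'ulimit -a' > "$workdir/limits.self"
    "$@" sh -c 'ulimit -a' > "$workdir/limits.other"

    echo "== environment differences (< current user, > via: $*) =="
    diff "$workdir/env.self" "$workdir/env.other"
    echo "== limit differences (< current user, > via: $*) =="
    diff "$workdir/limits.self" "$workdir/limits.other"

    rm -rf "$workdir"
}
```

As the failing user, run "compare_contexts sudo" (you may be prompted for a password). Lines marked < exist only in the user's context; lines marked > exist only under sudo.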
>>>
>>> Many thanks,
>>>
>>> Luca Fini
>>>
>>> ==============================
>>> Here are a few environment details:
>>>
>>> Program started with: mpirun -debug -debugger gdb -np 1
>>>   /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/M51b2_OT_2POINT_RH_v1_mod/PREP_PGD
>>>
>>> OPEN-MPI 1.6.5
>>>
>>> Linux 2.6.32-431.29.2.2.6.32-431.29.2.el6.x86_64
>>>
>>> Intel Fortran Compiler: 2011.7.256
>>>
>>> =========================
>>> Here is the stack trace:
>>>
>>> Starting program:
>>> /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/M51b2_OT_2POINT_RH_v1_mod/PREP_PGD
>>> /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/M51b2_OT_2POINT_RH_v1_mod/PREP_PGD
>>> [Thread debugging using libthread_db enabled]
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x00002aaaaaf652c7 in mca_base_component_find (directory=0x0,
>>>     type=0x3b914a7fb5 "rte", static_components=0x3b916cb040,
>>>     requested_component_names=0x0, include_mode=128, found_components=0x1,
>>>     open_dso_components=16)
>>>     at mca_base_component_find.c:162
>>> 162         OBJ_CONSTRUCT(found_components, opal_list_t);
>>> Missing separate debuginfos, use: debuginfo-install
>>>     glibc-2.12-1.149.el6.x86_64 libgcc-4.4.7-11.el6.x86_64
>>>     libgfortran-4.4.7-11.el6.x86_64 libtool-ltdl-2.2.6-15.5.el6.x86_64
>>>     openmpi-1.8.1-1.el6.x86_64
>>> (gdb) where
>>> #0  0x00002aaaaaf652c7 in mca_base_component_find (directory=0x0,
>>>     type=0x3b914a7fb5 "rte", static_components=0x3b916cb040,
>>>     requested_component_names=0x0, include_mode=128, found_components=0x1,
>>>     open_dso_components=16)
>>>     at mca_base_component_find.c:162
>>> #1  0x0000003b90c4870a in mca_base_framework_components_register ()
>>>     from /usr/lib64/openmpi/lib/libopen-pal.so.6
>>> #2  0x0000003b90c48c06 in mca_base_framework_register ()
>>>     from /usr/lib64/openmpi/lib/libopen-pal.so.6
>>> #3  0x0000003b90c48def in mca_base_framework_open ()
>>>     from /usr/lib64/openmpi/lib/libopen-pal.so.6
>>> #4  0x0000003b914407e7 in ompi_mpi_init ()
>>>     from /usr/lib64/openmpi/lib/libmpi.so.1
>>> #5  0x0000003b91463200 in PMPI_Init ()
>>>     from /usr/lib64/openmpi/lib/libmpi.so.1
>>> #6  0x00002aaaaacd9295 in mpi_init_f (ierr=0x7fffffffd268)
>>>     at pinit_f.c:75
>>> #7  0x00000000005bb159 in MODE_MNH_WORLD::init_nmnh_comm_world
>>>     (kinfo_ll=Cannot access memory at address 0x0
>>> ) at
>>>     /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/MASTER/spll_mode_mnh_world.f90:45
>>> #8  0x00000000005939d3 in MODE_IO_LL::initio_ll () at
>>>     /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/MASTER/spll_mode_io_ll.f90:107
>>> #9  0x000000000049d02f in prep_pgd () at
>>>     /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/MASTER/spll_prep_pgd.f90:130
>>> #10 0x000000000049cf8c in main ()
>>>
>>> --
>>> Luca Fini. INAF - Oss. Astrofisico di Arcetri
>>> L.go E.Fermi, 5. 50125 Firenze. Italy
>>> Tel: +39 055 2752 307   Fax: +39 055 2752 292
>>> Skype: l.fini
>>> Web: http://www.arcetri.inaf.it/~lfini
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2014/12/25945.php
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2014/12/25946.php
>>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2014/12/25950.php