Luca,

you might want to double-check the environment:
env | grep ^OMPI
and the per-user config:
ls $HOME/.openmpi
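
Since one account works and the other does not, it may be easiest to dump both checks to a file under each account and diff the results. A rough sketch only: the function name dump_ompi_state is made up for illustration, and I am assuming that per-user MCA parameters, if any, sit in the usual $HOME/.openmpi/mca-params.conf.

```shell
#!/bin/sh
# Sketch only: dump_ompi_state is a hypothetical helper, not part of Open MPI.
# It prints the Open MPI related settings of the current account.
dump_ompi_state() {
    echo "== OMPI environment variables =="
    # grep exits non-zero when nothing matches; ignore that case
    env | grep '^OMPI' | sort || true
    echo "== per-user config directory =="
    if [ -d "$HOME/.openmpi" ]; then
        ls -A "$HOME/.openmpi"
        # assumed location of the per-user MCA parameter file
        if [ -f "$HOME/.openmpi/mca-params.conf" ]; then
            echo "== mca-params.conf =="
            cat "$HOME/.openmpi/mca-params.conf"
        fi
    else
        echo "(no $HOME/.openmpi directory)"
    fi
}
```

Run dump_ompi_state once under the failing account and once under the new one, redirect each output to a file of your choice, and diff the two files.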

Cheers,

Gilles

On 2014/12/11 17:40, Luca Fini wrote:
> Many thanks for the replies.
>
> The mismatch in OpenMPI version is my fault: while writing the request
> for help I looked at the name of the directory where OpenMPI was built
> (I did not build it myself) and did not notice that the name of the
> directory did not reflect the version actually compiled.
>
> I had already checked the ulimits defined for the account where the
> SIGSEGV happens and they seem OK.
>
> Moreover I have a further result: I created a brand new account with
> default privileges and tried to run the program under that one, and it
> works!
>
> I'm still trying to spot the differences between the two
> unprivileged accounts.
>
> Cheers,
>                            l.
>
> On Wed, Dec 10, 2014 at 6:12 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>> Hi Luca
>>
>> Another possibility that comes to mind,
>> besides mixed versions mentioned by Gilles,
>> is the OS limits.
>> Limits may vary according to the user and user privileges.
>>
>> Large programs tend to require a big stack size (even unlimited),
>> and typically segfault when the stack is not large enough.
>> The max number of open files is yet another hurdle.
>> And if you're using InfiniBand, the max locked memory size should be
>> unlimited.
>> Check /etc/security/limits.conf and "ulimit -a".
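
The three limits mentioned above can be printed side by side for the current account, which makes it easier to compare the failing user with the working one. This is only a sketch: show_limits is a made-up name, and the units in the labels are what bash reports (other shells may differ).

```shell
#!/bin/sh
# Sketch only: show_limits is a hypothetical helper.
# It prints the soft limits discussed above for the current shell.
show_limits() {
    echo "stack size (kB):        $(ulimit -s)"
    echo "open files:             $(ulimit -n)"
    echo "max locked memory (kB): $(ulimit -l)"
}
```

Run show_limits under each account and compare the three lines; any value that differs is worth checking against /etc/security/limits.conf.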
>>
>> I hope this helps,
>> Gus Correa
>>
>> On 12/10/2014 08:28 AM, Gilles Gouaillardet wrote:
>>> Luca,
>>>
>>> your email mentions openmpi 1.6.5
>>> but gdb output points to openmpi 1.8.1.
>>>
>>> could the root cause be a mix of versions that does not occur with the root
>>> account?
>>>
>>> which openmpi version are you expecting ?
>>>
>>> you can run
>>> pmap <pid>
>>> when your binary is running and/or under gdb to confirm the openmpi
>>> library that is really used
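
Depending on the pmap version, only the library file name may be shown rather than its full path. As an alternative (not the pmap approach itself), on Linux the full paths can be read from /proc/<pid>/maps. A minimal sketch, where filter_mpi_libs is a made-up name and the list of library name prefixes is my assumption:

```shell
#!/bin/sh
# Sketch only: filter_mpi_libs is a hypothetical helper.
# It reads /proc/<pid>/maps style lines on stdin (path in the last column)
# and prints each distinct Open MPI library path once.
filter_mpi_libs() {
    awk '$NF ~ "/(libmpi|libopen-pal|libopen-rte)[^/]*$" { print $NF }' | sort -u
}
```

With the program stopped under gdb, feed it the maps file of the process, for example filter_mpi_libs < /proc/<pid>/maps, and check that every path printed belongs to the installation you expect.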
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Wed, Dec 10, 2014 at 7:21 PM, Luca Fini <lf...@arcetri.astro.it> wrote:
>>>
>>>     I've a problem running a well tested MPI based application.
>>>
>>>     The program has been used for years with no problems. Suddenly the
>>>     executable which was run many times with no problems crashed with
>>>     SIGSEGV. The very same executable if run with root privileges works
>>>     OK. The same happens with other executables and across various
>>>     recompilation attempts.
>>>
>>>     We could not find any relevant difference in the O.S. since a few days
>>>     ago, when the program also worked under an unprivileged user ID. Actually,
>>>     at about the same time we changed the GID of the user
>>>     experiencing the fault, but we think this is not relevant because the
>>>     same SIGSEGV happens to another user who was not modified. Moreover,
>>>     we cannot see how that change could affect the running executable (we
>>>     checked all file permissions in the directory tree where the program
>>>     is used).
>>>
>>>     Running the program under GDB we get the trace reported below. The
>>>     segfault happens at the very beginning during MPI initialization.
>>>
>>>     We can use the program with sudo, but I'd like to find out what
>>>     happened to go back to "normal" usage.
>>>
>>>     I'd appreciate any hint on the issue.
>>>
>>>     Many thanks,
>>>
>>>                                 Luca Fini
>>>
>>>     ==============================
>>>     Here follow a few environment details:
>>>
>>>     Program started with: mpirun -debug -debugger gdb  -np 1
>>>
>>> /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/M51b2_OT_2POINT_RH_v1_mod/PREP_PGD
>>>
>>>     OPEN-MPI 1.6.5
>>>
>>>     Linux 2.6.32-431.29.2.2.6.32-431.29.2.el6.x86_64
>>>
>>>     Intel fortran Compiler: 2011.7.256
>>>
>>>     =========================
>>>     Here follows the stack trace:
>>>
>>>     Starting program:
>>>
>>> /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/M51b2_OT_2POINT_RH_v1_mod/PREP_PGD
>>>
>>> /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/M51b2_OT_2POINT_RH_v1_mod/PREP_PGD
>>>     [Thread debugging using libthread_db enabled]
>>>
>>>     Program received signal SIGSEGV, Segmentation fault.
>>>     0x00002aaaaaf652c7 in mca_base_component_find (directory=0x0,
>>>     type=0x3b914a7fb5 "rte", static_components=0x3b916cb040,
>>>     requested_component_names=0x0, include_mode=128, found_components=0x1,
>>>     open_dso_components=16)
>>>          at mca_base_component_find.c:162
>>>     162        OBJ_CONSTRUCT(found_components, opal_list_t);
>>>     Missing separate debuginfos, use: debuginfo-install
>>>     glibc-2.12-1.149.el6.x86_64 libgcc-4.4.7-11.el6.x86_64
>>>     libgfortran-4.4.7-11.el6.x86_64 libtool-ltdl-2.2.6-15.5.el6.x86_64
>>>     openmpi-1.8.1-1.el6.x86_64
>>>     (gdb) where
>>>     #0  0x00002aaaaaf652c7 in mca_base_component_find (directory=0x0,
>>>     type=0x3b914a7fb5 "rte", static_components=0x3b916cb040,
>>>     requested_component_names=0x0, include_mode=128, found_components=0x1,
>>>     open_dso_components=16)
>>>          at mca_base_component_find.c:162
>>>     #1  0x0000003b90c4870a in mca_base_framework_components_register ()
>>>     from /usr/lib64/openmpi/lib/libopen-pal.so.6
>>>     #2  0x0000003b90c48c06 in mca_base_framework_register () from
>>>     /usr/lib64/openmpi/lib/libopen-pal.so.6
>>>     #3  0x0000003b90c48def in mca_base_framework_open () from
>>>     /usr/lib64/openmpi/lib/libopen-pal.so.6
>>>     #4  0x0000003b914407e7 in ompi_mpi_init () from
>>>     /usr/lib64/openmpi/lib/libmpi.so.1
>>>     #5  0x0000003b91463200 in PMPI_Init () from
>>>     /usr/lib64/openmpi/lib/libmpi.so.1
>>>     #6  0x00002aaaaacd9295 in mpi_init_f (ierr=0x7fffffffd268) at
>>>     pinit_f.c:75
>>>     #7  0x00000000005bb159 in MODE_MNH_WORLD::init_nmnh_comm_world
>>>     (kinfo_ll=Cannot access memory at address 0x0
>>>     ) at
>>>
>>> /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/MASTER/spll_mode_mnh_world.f90:45
>>>     #8  0x00000000005939d3 in MODE_IO_LL::initio_ll () at
>>>
>>> /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/MASTER/spll_mode_io_ll.f90:107
>>>     #9  0x000000000049d02f in prep_pgd () at
>>>
>>> /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/MASTER/spll_prep_pgd.f90:130
>>>     #10 0x000000000049cf8c in main ()
>>>
>>>     --
>>>     Luca Fini.  INAF - Oss. Astrofisico di Arcetri
>>>     L.go E.Fermi, 5. 50125 Firenze. Italy
>>>     Tel: +39 055 2752 307     Fax: +39 055 2752 292
>>>     Skype: l.fini
>>>     Web: http://www.arcetri.inaf.it/~lfini
>>>     _______________________________________________
>>>     users mailing list
>>>     us...@open-mpi.org <mailto:us...@open-mpi.org>
>>>     Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>     Link to this post:
>>>     http://www.open-mpi.org/community/lists/users/2014/12/25945.php
>>>
>>>
>
>
