[OMPI users] mpi and gromacs

2018-07-11 Thread Mahmood Naderan
Hi
Although not directly related to OMPI, I would like to know if anybody uses
GROMACS with MPI support. The binary is gmx_mpi and it has some options for
threading. However, I am also able to run it by putting mpirun before
gmx_mpi.


So, it is possible to run

gmx_mpi 

and

mpirun -np 4 gmx_mpi

Is the second command OK? It seems to involve two layers of MPI calls, which
may degrade performance.

Any thoughts?
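For what it’s worth, running a single MPI-enabled binary under mpirun is not two layers of MPI: mpirun launches N copies of the process, which then cooperate as ranks, while the threading options add a second, non-MPI level of parallelism inside each rank. A hedged sketch of the usual accounting (the -ntomp flag is GROMACS’s OpenMP thread count; the numbers are only illustrative):

```shell
# e.g. "mpirun -np 4 gmx_mpi mdrun -ntomp 2" would give 4 MPI ranks,
# each running 2 OpenMP threads: one level of MPI, one of threading.
ranks=4
threads_per_rank=2
echo "total workers: $((ranks * threads_per_rank))"   # prints: total workers: 8
```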

Regards,
Mahmood
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] Seg fault in opal_progress

2018-07-11 Thread Noam Bernstein
> On Jul 10, 2018, at 5:15 PM, Noam Bernstein  
> wrote:
> 
> 
> 
> What are useful steps I can do to debug?  Recompile with --enable-debug?  Are 
> there any other versions that are worth trying?  I don’t recall this error 
> happening before we switched to 3.1.0.
> 
>   thanks,
>   Noam


It appears that the problem is there with OpenMPI 3.1.1, but not 2.1.3. Of 
course I can’t be 100% sure, since it’s non-deterministic, but 3 runs died 
after 0-3 iterations with 3.1.1, while 3 runs with 2.1.3 completed 10 
iterations each.


Noam


Noam Bernstein, Ph.D.
Center for Materials Physics and Technology
U.S. Naval Research Laboratory
T +1 202 404 8628  F +1 202 404 7546
https://www.nrl.navy.mil 

Re: [OMPI users] Seg fault in opal_progress

2018-07-11 Thread Noam Bernstein

> On Jul 11, 2018, at 9:58 AM, Noam Bernstein  
> wrote:
> 
>> On Jul 10, 2018, at 5:15 PM, Noam Bernstein  wrote:
>> 
>> 
>> 
>> What are useful steps I can do to debug?  Recompile with —enable-debug?  Are 
>> there any other versions that are worth trying?  I don’t recall this error 
>> happening before we switched to 3.1.0.
>> 
>>  thanks,
>>  Noam
> 
> 
> It appears that the problem is there with OpenMPI 3.1.1, but not 2.1.3. Of 
> course I can’t be 100% sure, since it’s non deterministic, but 3 runs died 
> after 0-3 iterations with 3.1.1, and did 3 runs with 10 iterations each with 
> 2.1.3.

After more extensive testing it’s clear that it still happens with 2.1.3, but 
much less frequently.  I’m going to try to get more detailed info with version 
3.1.1, where it’s easier to reproduce.

Noam





Re: [OMPI users] Seg fault in opal_progress

2018-07-11 Thread Jeff Squyres (jsquyres) via users
Ok, that would be great -- thanks.

Recompiling Open MPI with --enable-debug will turn on several debugging/sanity 
checks inside Open MPI, and it will also enable debugging symbols.  Hence, if 
you can get a failure with a debug Open MPI build, it might give you a core 
file that can be used to get a more detailed stack trace, poke around and see 
if there's a NULL pointer somewhere, etc.
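The core-file route described above can be sketched as follows (the executable name is the one from the stack trace later in this thread; gdb availability is assumed):

```shell
# Allow core dumps in this shell, then re-run the failing case from it.
ulimit -c unlimited
ulimit -c            # should now report "unlimited"
# After a crash drops a core file, a symbolic backtrace would be e.g.:
#   gdb ./vasp.gamma_para.i core -ex bt -ex quit
```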


> On Jul 11, 2018, at 11:03 AM, Noam Bernstein  
> wrote:
> 
>> 
>> On Jul 11, 2018, at 9:58 AM, Noam Bernstein  
>> wrote:
>> 
>>> On Jul 10, 2018, at 5:15 PM, Noam Bernstein  
>>> wrote:
>>> 
>>> 
>>> 
>>> What are useful steps I can do to debug?  Recompile with —enable-debug?  
>>> Are there any other versions that are worth trying?  I don’t recall this 
>>> error happening before we switched to 3.1.0.
>>> 
>>> thanks,
>>> Noam
>> 
>> It appears that the problem is there with OpenMPI 3.1.1, but not 2.1.3. Of 
>> course I can’t be 100% sure, since it’s non deterministic, but 3 runs died 
>> after 0-3 iterations with 3.1.1, and did 3 runs with 10 iterations each with 
>> 2.1.3.
> 
> After more extensive testing it’s clear that it still happens with 2.1.3, but 
> much less frequently.  I’m going to try to get more detailed info with 
> version 3.1.1, where it’s easier to reproduce.
> 
>   Noam
> 
> 
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com


Re: [OMPI users] Seg fault in opal_progress

2018-07-11 Thread Noam Bernstein
> On Jul 11, 2018, at 11:29 AM, Jeff Squyres (jsquyres) via users 
>  wrote:
> 
> Ok, that would be great -- thanks.
> 
> Recompiling Open MPI with --enable-debug will turn on several 
> debugging/sanity checks inside Open MPI, and it will also enable debugging 
> symbols.  Hence, If you can get a failure when a debug Open MPI build, it 
> might give you a core file that can be used to get a more detailed stack 
> trace, poke around and see if there's a NULL pointer somewhere, …etc.

I haven’t tried to get a core file yet, but it’s not producing any more info 
in the runtime stack trace, despite configuring with --enable-debug:

Image              PC                Routine            Line        Source
vasp.gamma_para.i  02DCE8C1  Unknown               Unknown  Unknown
vasp.gamma_para.i  02DCC9FB  Unknown               Unknown  Unknown
vasp.gamma_para.i  02D409E4  Unknown               Unknown  Unknown
vasp.gamma_para.i  02D407F6  Unknown               Unknown  Unknown
vasp.gamma_para.i  02CDCED9  Unknown               Unknown  Unknown
vasp.gamma_para.i  02CE3DB6  Unknown               Unknown  Unknown
libpthread-2.12.s  003F8E60F7E0  Unknown               Unknown  Unknown
mca_btl_vader.so   2B1AFA5FAC30  Unknown               Unknown  Unknown
mca_btl_vader.so   2B1AFA5FD00D  Unknown               Unknown  Unknown
libopen-pal.so.40  2B1AE884327C  opal_progress         Unknown  Unknown
mca_pml_ob1.so     2B1AFB855DCE  Unknown               Unknown  Unknown
mca_pml_ob1.so     2B1AFB858305  mca_pml_ob1_send      Unknown  Unknown
libmpi.so.40.10.1  2B1AE823A5DA  ompi_coll_base_al     Unknown  Unknown
mca_coll_tuned.so  2B1AFC6F0842  ompi_coll_tuned_a     Unknown  Unknown
libmpi.so.40.10.1  2B1AE81B66F5  PMPI_Allreduce        Unknown  Unknown
libmpi_mpifh.so.4  2B1AE7F2259B  mpi_allreduce_        Unknown  Unknown
vasp.gamma_para.i  0042D1ED  m_sum_d_                 1300  mpi.F
vasp.gamma_para.i  0089947D  nonl_mp_vnlacc_.R        1754  nonl.F
vasp.gamma_para.i  00972C51  hamil_mp_hamiltmu         825  hamil.F
vasp.gamma_para.i  01BD2608  david_mp_eddav_.R         419  davidson.F
vasp.gamma_para.i  01D2179E  elmin_.R                  424  electron.F
vasp.gamma_para.i  02B92452  vamp_IP_electroni        4783  main.F
vasp.gamma_para.i  02B6E173  MAIN__                   2800  main.F
vasp.gamma_para.i  0041325E  Unknown               Unknown  Unknown
libc-2.12.so       003F8E21ED1D  __libc_start_main     Unknown  Unknown
vasp.gamma_para.i  00413169  Unknown               Unknown  Unknown

This is the configure line that was supposedly used to create the library:
./configure 
--prefix=/usr/local/openmpi/3.1.1_debug/x86_64/ib/intel/11.1.080 
--with-tm=/usr/local/torque --enable-mpirun-prefix-by-default --with-verbs=/usr 
--with-verbs-libdir=/usr/lib64 --enable-debug

Is there any way I can confirm that the version of the openmpi library I think 
I’m using really was compiled with debugging?


Noam



Re: [OMPI users] Seg fault in opal_progress

2018-07-11 Thread Noam Bernstein
> On Jul 11, 2018, at 11:29 AM, Jeff Squyres (jsquyres) via users 
>  wrote:
>>> 
>> 
>> After more extensive testing it’s clear that it still happens with 2.1.3, 
>> but much less frequently.  I’m going to try to get more detailed info with 
>> version 3.1.1, where it’s easier to reproduce.

objdump --debugging produces output consistent with no debugging symbols in the 
library .so files:
tin 1061 : objdump --debugging 
/usr/local/openmpi/3.1.1_debug/x86_64/ib/intel/11.1.080/lib/libmpi.so.40

/usr/local/openmpi/3.1.1_debug/x86_64/ib/intel/11.1.080/lib/libmpi.so.40: 
file format elf64-x86-64


Noam

Re: [OMPI users] Seg fault in opal_progress

2018-07-11 Thread Jeff Squyres (jsquyres) via users
$ ompi_info | grep -i debug
  Configure command line: '--prefix=/home/jsquyres/bogus' '--with-usnic' 
'--with-libfabric=/home/jsquyres/libfabric-current/install' 
'--enable-mpirun-prefix-by-default' '--enable-debug' '--enable-mem-debug' 
'--enable-mem-profile' '--disable-mpi-fortran' '--enable-debug' 
'--enable-mem-debug' '--enable-picky'
  Internal debug support: yes
Memory debugging support: yes
   C/R Enabled Debugging: no

That should tell you whether you have debug support or not.


> On Jul 11, 2018, at 5:25 PM, Noam Bernstein  
> wrote:
> 
>> On Jul 11, 2018, at 11:29 AM, Jeff Squyres (jsquyres) via users 
>>  wrote:
 
>>> 
>>> After more extensive testing it’s clear that it still happens with 2.1.3, 
>>> but much less frequently.  I’m going to try to get more detailed info with 
>>> version 3.1.1, where it’s easier to reproduce.
> 
> objdump --debugging produces output consistent with no debugging symbols in 
> the library .so files:
> tin 1061 : objdump --debugging 
> /usr/local/openmpi/3.1.1_debug/x86_64/ib/intel/11.1.080/lib/libmpi.so.40
> 
> /usr/local/openmpi/3.1.1_debug/x86_64/ib/intel/11.1.080/lib/libmpi.so.40: 
> file format elf64-x86-64
> 
> 
>   Noam


-- 
Jeff Squyres
jsquy...@cisco.com


Re: [OMPI users] Seg fault in opal_progress

2018-07-11 Thread Nathan Hjelm via users

It might also be worth testing a master snapshot to see if that fixes the 
issue. There are a couple of fixes being backported from master to v3.0.x and 
v3.1.x now.

-Nathan

On Jul 11, 2018, at 03:16 PM, Noam Bernstein  
wrote:

On Jul 11, 2018, at 11:29 AM, Jeff Squyres (jsquyres) via users 
 wrote:
Ok, that would be great -- thanks.

Recompiling Open MPI with --enable-debug will turn on several debugging/sanity 
checks inside Open MPI, and it will also enable debugging symbols.  Hence, If 
you can get a failure when a debug Open MPI build, it might give you a core 
file that can be used to get a more detailed stack trace, poke around and see 
if there's a NULL pointer somewhere, …etc.

I haven’t tried to get a core file yes, but it’s not producing any more info 
from the runtime stack trace, despite configure with —enable-debug:

Image              PC                Routine            Line        Source
vasp.gamma_para.i  02DCE8C1  Unknown               Unknown  Unknown
vasp.gamma_para.i  02DCC9FB  Unknown               Unknown  Unknown
vasp.gamma_para.i  02D409E4  Unknown               Unknown  Unknown
vasp.gamma_para.i  02D407F6  Unknown               Unknown  Unknown
vasp.gamma_para.i  02CDCED9  Unknown               Unknown  Unknown
vasp.gamma_para.i  02CE3DB6  Unknown               Unknown  Unknown
libpthread-2.12.s  003F8E60F7E0  Unknown               Unknown  Unknown
mca_btl_vader.so   2B1AFA5FAC30  Unknown               Unknown  Unknown
mca_btl_vader.so   2B1AFA5FD00D  Unknown               Unknown  Unknown
libopen-pal.so.40  2B1AE884327C  opal_progress         Unknown  Unknown
mca_pml_ob1.so     2B1AFB855DCE  Unknown               Unknown  Unknown
mca_pml_ob1.so     2B1AFB858305  mca_pml_ob1_send      Unknown  Unknown
libmpi.so.40.10.1  2B1AE823A5DA  ompi_coll_base_al     Unknown  Unknown
mca_coll_tuned.so  2B1AFC6F0842  ompi_coll_tuned_a     Unknown  Unknown
libmpi.so.40.10.1  2B1AE81B66F5  PMPI_Allreduce        Unknown  Unknown
libmpi_mpifh.so.4  2B1AE7F2259B  mpi_allreduce_        Unknown  Unknown
vasp.gamma_para.i  0042D1ED  m_sum_d_                 1300  mpi.F
vasp.gamma_para.i  0089947D  nonl_mp_vnlacc_.R        1754  nonl.F
vasp.gamma_para.i  00972C51  hamil_mp_hamiltmu         825  hamil.F
vasp.gamma_para.i  01BD2608  david_mp_eddav_.R         419  davidson.F
vasp.gamma_para.i  01D2179E  elmin_.R                  424  electron.F
vasp.gamma_para.i  02B92452  vamp_IP_electroni        4783  main.F
vasp.gamma_para.i  02B6E173  MAIN__                   2800  main.F
vasp.gamma_para.i  0041325E  Unknown               Unknown  Unknown
libc-2.12.so       003F8E21ED1D  __libc_start_main     Unknown  Unknown
vasp.gamma_para.i  00413169  Unknown               Unknown  Unknown

This is the configure line that was supposedly used to create the library:
  ./configure --prefix=/usr/local/openmpi/3.1.1_debug/x86_64/ib/intel/11.1.080 
--with-tm=/usr/local/torque --enable-mpirun-prefix-by-default --with-verbs=/usr 
--with-verbs-libdir=/usr/lib64 --enable-debug

Is there any way I can confirm that the version of the openmpi library I think 
I’m using really was compiled with debugging?

Noam






Re: [OMPI users] Seg fault in opal_progress

2018-07-11 Thread Ben Menadue
Hi,

Perhaps related — we’re seeing this one with 3.1.1. I’ll see if I can run the 
application against our --enable-debug build.

Cheers,
Ben

[raijin7:1943 :0:1943] Caught signal 11 (Segmentation fault: address not mapped 
to object at address 0x45)

/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-0/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_recvfrag.c:
 [ append_frag_to_ordered_list() ]
 ...
 118  * account for this rollover or the matching will fail.
 119  * Extract the items from the list to order them safely */
 120 if( hdr->hdr_seq < prior->hdr.hdr_match.hdr_seq ) {
==>   121 uint16_t d1, d2 = prior->hdr.hdr_match.hdr_seq - hdr->hdr_seq;
 122 do {
 123 d1 = d2;
 124 prior = 
(mca_pml_ob1_recv_frag_t*)(prior->super.super.opal_list_prev);

 backtrace 
0 0x00012d5f append_frag_to_ordered_list()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-0/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_recvfrag.c:121
1 0x00013a06 mca_pml_ob1_recv_frag_callback_match()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-0/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_recvfrag.c:390
2 0x44ef mca_btl_vader_check_fboxes()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-0/opal/mca/btl/vader/../../../../../../../opal/mca/btl/vader/btl_vader_fbox.h:208
3 0x602f mca_btl_vader_component_progress()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-0/opal/mca/btl/vader/../../../../../../../opal/mca/btl/vader/btl_vader_component.c:689
4 0x0002b554 opal_progress()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-0/opal/../../../../opal/runtime/opal_progress.c:228
5 0x000331cc ompi_sync_wait_mt()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-0/opal/../../../../opal/threads/wait_sync.c:85
6 0x0004a989 ompi_request_wait_completion()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-0/ompi/../../../../ompi/request/request.h:403
7 0x0004aa1d ompi_request_default_wait()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-0/ompi/../../../../ompi/request/req_wait.c:42
8 0x000d3486 ompi_coll_base_sendrecv_actual()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-0/ompi/mca/coll/../../../../../../ompi/mca/coll/base/coll_base_util.c:59
9 0x000d0d2b ompi_coll_base_sendrecv()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-0/ompi/mca/coll/../../../../../../ompi/mca/coll/base/coll_base_util.h:67
10 0x000d14c7 ompi_coll_base_allgather_intra_recursivedoubling()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-0/ompi/mca/coll/../../../../../../ompi/mca/coll/base/coll_base_allgather.c:329
11 0x56dc ompi_coll_tuned_allgather_intra_dec_fixed()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-0/ompi/mca/coll/tuned/../../../../../../../ompi/mca/coll/tuned/coll_tuned_decision_fixed.c:551
12 0x0006185d PMPI_Allgather()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-0/ompi/mpi/c/profile/pallgather.c:122
13 0x0004362c ompi_allgather_f()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/intel/debug-0/ompi/mpi/fortran/mpif-h/profile/pallgather_f.c:86
14 0x005ed3cb comms_allgather_integer_0()  
/short/z00/aab900/onetep/src/comms_mod.F90:14795
15 0x01309fe1 multigrid_bc_for_dlmg()  
/short/z00/aab900/onetep/src/multigrid_methods_mod.F90:270
16 0x01309fe1 multigrid_initialise()  
/short/z00/aab900/onetep/src/multigrid_methods_mod.F90:174
17 0x00f0c885 hartree_via_multigrid()  
/short/z00/aab900/onetep/src/hartree_mod.F90:181
18 0x00a0c62a electronic_init_pot()  
/short/z00/aab900/onetep/src/electronic_init_mod.F90:1123
19 0x00a14d62 electronic_init_denskern()  
/short/z00/aab900/onetep/src/electronic_init_mod.F90:334
20 0x00a50136 energy_and_force_calculate()  
/short/z00/aab900/onetep/src/energy_and_force_mod.F90:1702
21 0x014f46e7 onetep()  /short/z00/aab900/onetep/src/onetep.F90:277
22 0x0041465e main()  ???:0
23 0x0001ed1d __libc_start_main()  ???:0
24 0x00414569 _start()  ???:0
===
---
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
---
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source
onetep.nci         01DCC6DE  Unknown               Unknown  Unknown
libpthread-2.12.s  2B6
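The `append_frag_to_ordered_list()` snippet in the trace above is handling 16-bit sequence-number rollover; the wraparound arithmetic it guards against can be sketched with plain shell arithmetic (the values here are made up):

```shell
# A uint16_t sequence counter that wrapped: the new value 5 compares
# "less than" 65530 numerically, but is really 11 steps after it.
old_seq=65530
new_seq=5
echo $(( (new_seq - old_seq) & 0xFFFF ))   # prints 11
```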

Re: [OMPI users] Seg fault in opal_progress

2018-07-11 Thread Ben Menadue
Here’s what happens using a debug build:

[raijin7:5] ompi_comm_peer_lookup: invalid peer index (2)
[raijin7:5:0:5] Caught signal 11 (Segmentation fault: address not 
mapped to object at address 0x8)

/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_comm.h:
 [ mca_pml_ob1_peer_lookup() ]
 ...
  75 mca_pml_ob1_comm_proc_t* proc = 
OBJ_NEW(mca_pml_ob1_comm_proc_t);
  76 proc->ompi_proc = ompi_comm_peer_lookup (comm, rank);
  77 OBJ_RETAIN(proc->ompi_proc);
==>78 opal_atomic_wmb ();
  79 pml_comm->procs[rank] = proc;
  80 }
  81 OPAL_THREAD_UNLOCK(&pml_comm->proc_lock);

 backtrace 
0 0x00017505 mca_pml_ob1_peer_lookup()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_comm.h:78
1 0x00019119 mca_pml_ob1_recv_frag_callback_match()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_recvfrag.c:361
2 0x52d7 mca_btl_vader_check_fboxes()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-1/opal/mca/btl/vader/../../../../../../../opal/mca/btl/vader/btl_vader_fbox.h:208
3 0x77fd mca_btl_vader_component_progress()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-1/opal/mca/btl/vader/../../../../../../../opal/mca/btl/vader/btl_vader_component.c:689
4 0x0002ff90 opal_progress()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-1/opal/../../../../opal/runtime/opal_progress.c:228
5 0x0003b168 ompi_sync_wait_mt()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-1/opal/../../../../opal/threads/wait_sync.c:85
6 0x0005cd64 ompi_request_wait_completion()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-1/ompi/../../../../ompi/request/request.h:403
7 0x0005ce28 ompi_request_default_wait()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-1/ompi/../../../../ompi/request/req_wait.c:42
8 0x001142d9 ompi_coll_base_sendrecv_zero()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-1/ompi/mca/coll/../../../../../../ompi/mca/coll/base/coll_base_barrier.c:64
9 0x00114763 ompi_coll_base_barrier_intra_recursivedoubling()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-1/ompi/mca/coll/../../../../../../ompi/mca/coll/base/coll_base_barrier.c:215
10 0x4cad ompi_coll_tuned_barrier_intra_dec_fixed()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-1/ompi/mca/coll/tuned/../../../../../../../ompi/mca/coll/tuned/coll_tuned_decision_fixed.c:212
11 0x000831ac PMPI_Barrier()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-1/ompi/mpi/c/profile/pbarrier.c:63
12 0x00044041 ompi_barrier_f()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/intel/debug-1/ompi/mpi/fortran/mpif-h/profile/pbarrier_f.c:76
13 0x005c79de comms_barrier()  
/short/z00/aab900/onetep/src/comms_mod.F90:1543
14 0x005c79de comms_bcast_logical_0()  
/short/z00/aab900/onetep/src/comms_mod.F90:10756
15 0x01c21509 utils_devel_code_logical()  
/short/z00/aab900/onetep/src/utils_mod.F90:2646
16 0x01309ddb multigrid_bc_for_dlmg()  
/short/z00/aab900/onetep/src/multigrid_methods_mod.F90:260
17 0x01309ddb multigrid_initialise()  
/short/z00/aab900/onetep/src/multigrid_methods_mod.F90:174
18 0x00f0c885 hartree_via_multigrid()  
/short/z00/aab900/onetep/src/hartree_mod.F90:181
19 0x00a0c62a electronic_init_pot()  
/short/z00/aab900/onetep/src/electronic_init_mod.F90:1123
20 0x00a14d62 electronic_init_denskern()  
/short/z00/aab900/onetep/src/electronic_init_mod.F90:334
21 0x00a50136 energy_and_force_calculate()  
/short/z00/aab900/onetep/src/energy_and_force_mod.F90:1702
22 0x014f46e7 onetep()  /short/z00/aab900/onetep/src/onetep.F90:277
23 0x0041465e main()  ???:0
24 0x0001ed1d __libc_start_main()  ???:0
25 0x00414569 _start()  ???:0
===
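A side note on the faulting address: a segfault at 0x8 is the classic signature of touching a field at a small offset inside a NULL struct pointer, which would be consistent with ompi_comm_peer_lookup() returning NULL here and OBJ_RETAIN(proc->ompi_proc) dereferencing it (the exact field offset is an assumption):

```shell
# NULL base plus an assumed 8-byte field offset reproduces the 0x8 address.
null_base=0
field_offset=8
printf '0x%x\n' $(( null_base + field_offset ))   # prints 0x8
```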



> On 12 Jul 2018, at 1:36 pm, Ben Menadue  wrote:
> 
> Hi,
> 
> Perhaps related — we’re seeing this one with 3.1.1. I’ll see if I can get the 
> application run against our --enable-debug build.
> 
> Cheers,
> Ben
> 
> [raijin7:1943 :0:1943] Caught signal 11 (Segmentation fault: address not 
> mapped to object at address 0x45)
> 
> /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.1/build/gcc/debug-0/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_recvfrag.c:
>  [ append_frag_to_ordered_list() ]
>  ...
>  118  * account for this rollover or the matching will fail.
>