Re: [OMPI users] Using POSIX shared memory as send buffer

2015-09-29 Thread Marcin Krotkiewski

Thanks, Dave.

I have verified the memory locality and IB card locality, all's fine.

Quite accidentally I have found that there is a huge penalty if I mmap 
the shm with PROT_READ only. Using PROT_READ | PROT_WRITE yields good 
results, although I must look at this further. I'll report when I am 
certain, in case somebody finds this useful.
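
For illustration, the two mappings I am comparing look roughly like this 
(a minimal, self-contained sketch; the segment name and size are 
placeholders, not the ones from my actual benchmark; compile with -lrt on 
older glibc):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const size_t len = 1 << 20;                 /* placeholder buffer size */
    int fd = shm_open("/example_sendbuf", O_RDWR | O_CREAT, 0600);
    if (fd < 0 || ftruncate(fd, len) != 0) { perror("shm_open/ftruncate"); return 1; }

    /* The variant that shows the penalty: map the segment read-only. */
    void *ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

    /* The variant that performs well: map the same segment read-write. */
    void *rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (ro == MAP_FAILED || rw == MAP_FAILED) { perror("mmap"); return 1; }
    printf("read-only mapping at %p, read-write mapping at %p\n", ro, rw);
    /* either pointer would then be handed to MPI_Isend as the send buffer */

    munmap(ro, len); munmap(rw, len);
    close(fd);
    shm_unlink("/example_sendbuf");
    return 0;
}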


Is this an OS feature, or is OpenMPI somehow working differently? I 
don't suspect you guys write to the send buffer, right? Even if you did, 
there would be a segfault. So I guess it could be the OS preventing any 
writes to that mapping that introduces the overhead?


Marcin



On 09/28/2015 09:44 PM, Dave Goodell (dgoodell) wrote:

On Sep 27, 2015, at 1:38 PM, marcin.krotkiewski  
wrote:

Hello, everyone

I am struggling a bit with IB performance when sending data from a POSIX shared 
memory region (/dev/shm). The memory is shared among many MPI processes within 
the same compute node. Essentially, I see somewhat erratic performance, but it 
seems that my code is roughly twice as slow as when using a usual, malloced 
send buffer.

It may have to do with NUMA effects and the way you're allocating/touching your shared 
memory vs. your private (malloced) memory.  If you have a multi-NUMA-domain system (i.e., 
any 2+ socket server, and even some single-socket servers) then you are likely to run 
into this sort of issue.  The PCI bus on which your IB HCA communicates is almost 
certainly closer to one NUMA domain than the others, and performance will usually be 
worse if you are sending/receiving from/to a "remote" NUMA domain.

"lstopo" and other tools can sometimes help you get a handle on the situation, though I don't 
know if it knows how to show memory affinity.  I think you can find memory affinity for a process via 
"/proc/<pid>/numa_maps".  There's lots of info about NUMA affinity here: 
https://queue.acm.org/detail.cfm?id=2513149
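
If it helps, below is a minimal sketch of checking which NUMA node actually 
backs a given buffer, using libnuma's get_mempolicy (link with -lnuma; the 
buffer has to be touched first so a physical page exists; the malloced buffer 
here is just a stand-in for your send buffer):

#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t len = 4096;
    char *buf = malloc(len);
    memset(buf, 0, len);            /* first touch places the page on a node */

    int node = -1;
    /* MPOL_F_NODE | MPOL_F_ADDR: return the node holding the page at 'buf' */
    if (get_mempolicy(&node, NULL, 0, buf, MPOL_F_NODE | MPOL_F_ADDR) != 0) {
        perror("get_mempolicy");
        return 1;
    }
    printf("buffer at %p resides on NUMA node %d\n", (void *)buf, node);

    free(buf);
    return 0;
}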

-Dave

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/09/27702.php




Re: [OMPI users] send_request error with allocate

2015-09-29 Thread Diego Avesani
Dear Jeff, Dear all,
the code is very long; here is an excerpt. I hope this helps.

What do you think?

SUBROUTINE MATOPQN
USE VARS_COMMON,ONLY:COMM_CART,send_messageR,recv_messageL,nMsg
USE MPI
INTEGER :: send_request(nMsg), recv_request(nMsg)
INTEGER ::
send_status_list(MPI_STATUS_SIZE,nMsg),recv_status_list(MPI_STATUS_SIZE,nMsg)

 !send message to right CPU
IF(MPIdata%rank.NE.MPIdata%nCPU-1)THEN
MsgLength = MPIdata%jmaxN
DO icount=1,MPIdata%jmaxN
iNode = MPIdata%nodeList2right(icount)
send_messageR(icount) = RIS_2(iNode)
ENDDO

CALL MPI_ISEND(send_messageR, MsgLength, MPI_DOUBLE_COMPLEX,
MPIdata%rank+1, MPIdata%rank+1, MPI_COMM_WORLD,
send_request(MPIdata%rank+1), MPIdata%iErr)

ENDIF
!


!recive message FROM left CPU
IF(MPIdata%rank.NE.0)THEN
MsgLength = MPIdata%jmaxN

CALL MPI_IRECV(recv_messageL, MsgLength, MPI_DOUBLE_COMPLEX,
MPIdata%rank-1, MPIdata%rank, MPI_COMM_WORLD, recv_request(MPIdata%rank),
MPIdata%iErr)

write(*,*) MPIdata%rank-1
ENDIF
!
!
CALL MPI_WAITALL(nMsg,send_request,send_status_list,MPIdata%iErr)
CALL MPI_WAITALL(nMsg,recv_request,recv_status_list,MPIdata%iErr)

Diego


On 29 September 2015 at 00:15, Jeff Squyres (jsquyres) 
wrote:

> Can you send a small reproducer program?
>
> > On Sep 28, 2015, at 4:45 PM, Diego Avesani 
> wrote:
> >
> > Dear all,
> >
> > I have to use a send_request array in an MPI_WAITALL.
> > Here is the strange thing:
> >
> > If I use, at the beginning of the SUBROUTINE:
> >
> > INTEGER :: send_request(3), recv_request(3)
> >
> > I have no problem, but if I use
> >
> > USE COMONVARS,ONLY : nMsg
> > with nMsg=3
> >
> > and after that I declare
> >
> > INTEGER :: send_request(nMsg), recv_request(nMsg), I get the following
> error:
> >
> > [Lap] *** An error occurred in MPI_Waitall
> > [Lap] *** reported by process [139726485585921,0]
> > [Lap] *** on communicator MPI_COMM_WORLD
> > [Lap] *** MPI_ERR_REQUEST: invalid request
> > [Lap] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now
> abort,
> > [Lap] ***and potentially your MPI job)
> > forrtl: error (78): process killed (SIGTERM)
> >
> > Could someone please explain to me where I am wrong?
> >
> > Thanks
> >
> > Diego
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/09/27703.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/09/27704.php
>


Re: [OMPI users] send_request error with allocate

2015-09-29 Thread Diego Avesani
dear Jeff, dear all,
I have noticed that if I initialize the variables, I no longer get the
error:
!
  ALLOCATE(SEND_REQUEST(nMsg),RECV_REQUEST(nMsg))
  SEND_REQUEST=0
  RECV_REQUEST=0
!

Could you please explain to me why?
Thanks


Diego


On 29 September 2015 at 16:08, Diego Avesani 
wrote:

> Dear Jeff, Dear all,
> the code is very long, here something. I hope that this could help.
>
> What do you think?
>
> SUBROUTINE MATOPQN
> USE VARS_COMMON,ONLY:COMM_CART,send_messageR,recv_messageL,nMsg
> USE MPI
> INTEGER :: send_request(nMsg), recv_request(nMsg)
> INTEGER ::
> send_status_list(MPI_STATUS_SIZE,nMsg),recv_status_list(MPI_STATUS_SIZE,nMsg)
>
>  !send message to right CPU
> IF(MPIdata%rank.NE.MPIdata%nCPU-1)THEN
> MsgLength = MPIdata%jmaxN
> DO icount=1,MPIdata%jmaxN
> iNode = MPIdata%nodeList2right(icount)
> send_messageR(icount) = RIS_2(iNode)
> ENDDO
>
> CALL MPI_ISEND(send_messageR, MsgLength, MPI_DOUBLE_COMPLEX,
> MPIdata%rank+1, MPIdata%rank+1, MPI_COMM_WORLD,
> send_request(MPIdata%rank+1), MPIdata%iErr)
>
> ENDIF
> !
>
>
> !recive message FROM left CPU
> IF(MPIdata%rank.NE.0)THEN
> MsgLength = MPIdata%jmaxN
>
> CALL MPI_IRECV(recv_messageL, MsgLength, MPI_DOUBLE_COMPLEX,
> MPIdata%rank-1, MPIdata%rank, MPI_COMM_WORLD, recv_request(MPIdata%rank),
> MPIdata%iErr)
>
> write(*,*) MPIdata%rank-1
> ENDIF
> !
> !
> CALL MPI_WAITALL(nMsg,send_request,send_status_list,MPIdata%iErr)
> CALL MPI_WAITALL(nMsg,recv_request,recv_status_list,MPIdata%iErr)
>
> Diego
>
>
> On 29 September 2015 at 00:15, Jeff Squyres (jsquyres)  > wrote:
>
>> Can you send a small reproducer program?
>>
>> > On Sep 28, 2015, at 4:45 PM, Diego Avesani 
>> wrote:
>> >
>> > Dear all,
>> >
>> > I have to use a send_request in a MPI_WAITALL.
>> > Here the strange things:
>> >
>> > If I use at the begging of the SUBROUTINE:
>> >
>> > INTEGER :: send_request(3), recv_request(3)
>> >
>> > I have no problem, but if I use
>> >
>> > USE COMONVARS,ONLY : nMsg
>> > with nMsg=3
>> >
>> > and after that I declare
>> >
>> > INTEGER :: send_request(nMsg), recv_request(nMsg), I get the following
>> error:
>> >
>> > [Lap] *** An error occurred in MPI_Waitall
>> > [Lap] *** reported by process [139726485585921,0]
>> > [Lap] *** on communicator MPI_COMM_WORLD
>> > [Lap] *** MPI_ERR_REQUEST: invalid request
>> > [Lap] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now
>> abort,
>> > [Lap] ***and potentially your MPI job)
>> > forrtl: error (78): process killed (SIGTERM)
>> >
>> > Someone could please explain to me where I am wrong?
>> >
>> > Thanks
>> >
>> > Diego
>> >
>> > ___
>> > users mailing list
>> > us...@open-mpi.org
>> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > Link to this post:
>> http://www.open-mpi.org/community/lists/users/2015/09/27703.php
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2015/09/27704.php
>>
>
>


Re: [OMPI users] send_request error with allocate

2015-09-29 Thread Jeff Squyres (jsquyres)
This code does not appear to compile -- there's no main program, for example.

Can you make a small, self-contained example program that shows the problem?


> On Sep 29, 2015, at 10:08 AM, Diego Avesani  wrote:
> 
> Dear Jeff, Dear all,
> the code is very long, here something. I hope that this could help.
> 
> What do you think?
> 
> SUBROUTINE MATOPQN
> USE VARS_COMMON,ONLY:COMM_CART,send_messageR,recv_messageL,nMsg
> USE MPI
> INTEGER :: send_request(nMsg), recv_request(nMsg)
> INTEGER :: 
> send_status_list(MPI_STATUS_SIZE,nMsg),recv_status_list(MPI_STATUS_SIZE,nMsg)
> 
>  !send message to right CPU
> IF(MPIdata%rank.NE.MPIdata%nCPU-1)THEN
> MsgLength = MPIdata%jmaxN
> DO icount=1,MPIdata%jmaxN
> iNode = MPIdata%nodeList2right(icount)
> send_messageR(icount) = RIS_2(iNode)
> ENDDO
> 
> CALL MPI_ISEND(send_messageR, MsgLength, MPI_DOUBLE_COMPLEX, 
> MPIdata%rank+1, MPIdata%rank+1, MPI_COMM_WORLD, send_request(MPIdata%rank+1), 
> MPIdata%iErr)
> 
> ENDIF
> !
> 
> 
> !recive message FROM left CPU
> IF(MPIdata%rank.NE.0)THEN
> MsgLength = MPIdata%jmaxN
> 
> CALL MPI_IRECV(recv_messageL, MsgLength, MPI_DOUBLE_COMPLEX, 
> MPIdata%rank-1, MPIdata%rank, MPI_COMM_WORLD, recv_request(MPIdata%rank), 
> MPIdata%iErr)
> 
> write(*,*) MPIdata%rank-1
> ENDIF
> !
> !
> CALL MPI_WAITALL(nMsg,send_request,send_status_list,MPIdata%iErr)
> CALL MPI_WAITALL(nMsg,recv_request,recv_status_list,MPIdata%iErr)
> 
> Diego
> 
> 
> On 29 September 2015 at 00:15, Jeff Squyres (jsquyres)  
> wrote:
> Can you send a small reproducer program?
> 
> > On Sep 28, 2015, at 4:45 PM, Diego Avesani  wrote:
> >
> > Dear all,
> >
> > I have to use a send_request in a MPI_WAITALL.
> > Here the strange things:
> >
> > If I use at the begging of the SUBROUTINE:
> >
> > INTEGER :: send_request(3), recv_request(3)
> >
> > I have no problem, but if I use
> >
> > USE COMONVARS,ONLY : nMsg
> > with nMsg=3
> >
> > and after that I declare
> >
> > INTEGER :: send_request(nMsg), recv_request(nMsg), I get the following 
> > error:
> >
> > [Lap] *** An error occurred in MPI_Waitall
> > [Lap] *** reported by process [139726485585921,0]
> > [Lap] *** on communicator MPI_COMM_WORLD
> > [Lap] *** MPI_ERR_REQUEST: invalid request
> > [Lap] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now 
> > abort,
> > [Lap] ***and potentially your MPI job)
> > forrtl: error (78): process killed (SIGTERM)
> >
> > Someone could please explain to me where I am wrong?
> >
> > Thanks
> >
> > Diego
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post: 
> > http://www.open-mpi.org/community/lists/users/2015/09/27703.php
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/09/27704.php
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/09/27706.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] send_request error with allocate

2015-09-29 Thread Diego Avesani
ok,
let me try

Diego


On 29 September 2015 at 16:23, Jeff Squyres (jsquyres) 
wrote:

> This code does not appear to compile -- there's no main program, for
> example.
>
> Can you make a small, self-contained example program that shows the
> problem?
>
>
> > On Sep 29, 2015, at 10:08 AM, Diego Avesani 
> wrote:
> >
> > Dear Jeff, Dear all,
> > the code is very long, here something. I hope that this could help.
> >
> > What do you think?
> >
> > SUBROUTINE MATOPQN
> > USE VARS_COMMON,ONLY:COMM_CART,send_messageR,recv_messageL,nMsg
> > USE MPI
> > INTEGER :: send_request(nMsg), recv_request(nMsg)
> > INTEGER ::
> send_status_list(MPI_STATUS_SIZE,nMsg),recv_status_list(MPI_STATUS_SIZE,nMsg)
> >
> >  !send message to right CPU
> > IF(MPIdata%rank.NE.MPIdata%nCPU-1)THEN
> > MsgLength = MPIdata%jmaxN
> > DO icount=1,MPIdata%jmaxN
> > iNode = MPIdata%nodeList2right(icount)
> > send_messageR(icount) = RIS_2(iNode)
> > ENDDO
> >
> > CALL MPI_ISEND(send_messageR, MsgLength, MPI_DOUBLE_COMPLEX,
> MPIdata%rank+1, MPIdata%rank+1, MPI_COMM_WORLD,
> send_request(MPIdata%rank+1), MPIdata%iErr)
> >
> > ENDIF
> > !
> >
> >
> > !recive message FROM left CPU
> > IF(MPIdata%rank.NE.0)THEN
> > MsgLength = MPIdata%jmaxN
> >
> > CALL MPI_IRECV(recv_messageL, MsgLength, MPI_DOUBLE_COMPLEX,
> MPIdata%rank-1, MPIdata%rank, MPI_COMM_WORLD, recv_request(MPIdata%rank),
> MPIdata%iErr)
> >
> > write(*,*) MPIdata%rank-1
> > ENDIF
> > !
> > !
> > CALL MPI_WAITALL(nMsg,send_request,send_status_list,MPIdata%iErr)
> > CALL MPI_WAITALL(nMsg,recv_request,recv_status_list,MPIdata%iErr)
> >
> > Diego
> >
> >
> > On 29 September 2015 at 00:15, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
> > Can you send a small reproducer program?
> >
> > > On Sep 28, 2015, at 4:45 PM, Diego Avesani 
> wrote:
> > >
> > > Dear all,
> > >
> > > I have to use a send_request in a MPI_WAITALL.
> > > Here the strange things:
> > >
> > > If I use at the begging of the SUBROUTINE:
> > >
> > > INTEGER :: send_request(3), recv_request(3)
> > >
> > > I have no problem, but if I use
> > >
> > > USE COMONVARS,ONLY : nMsg
> > > with nMsg=3
> > >
> > > and after that I declare
> > >
> > > INTEGER :: send_request(nMsg), recv_request(nMsg), I get the following
> error:
> > >
> > > [Lap] *** An error occurred in MPI_Waitall
> > > [Lap] *** reported by process [139726485585921,0]
> > > [Lap] *** on communicator MPI_COMM_WORLD
> > > [Lap] *** MPI_ERR_REQUEST: invalid request
> > > [Lap] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will
> now abort,
> > > [Lap] ***and potentially your MPI job)
> > > forrtl: error (78): process killed (SIGTERM)
> > >
> > > Someone could please explain to me where I am wrong?
> > >
> > > Thanks
> > >
> > > Diego
> > >
> > > ___
> > > users mailing list
> > > us...@open-mpi.org
> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/09/27703.php
> >
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/09/27704.php
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/09/27706.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/09/27708.php
>


Re: [OMPI users] send_request error with allocate

2015-09-29 Thread Gilles Gouaillardet
Diego,

if you invoke MPI_Waitall on three requests and some of them have not been
initialized (manually, or via MPI_Isend or MPI_Irecv), then the behavior of
your program is undefined.

if you want to use an array of requests (because it makes the program
simpler) but you know that not all of them are actually used, then you have
to initialize the unused ones with MPI_REQUEST_NULL
(it may happen to be zero on Open MPI, but you cannot take that for granted)
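
here is a minimal sketch of that pattern (in C for brevity; in your Fortran
code the equivalent, after USE MPI, is simply send_request = MPI_REQUEST_NULL
and recv_request = MPI_REQUEST_NULL before the waits):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    enum { nMsg = 3 };
    MPI_Request send_request[nMsg], recv_request[nMsg];
    for (int i = 0; i < nMsg; i++) {
        /* the null handle is opaque: use MPI_REQUEST_NULL, not 0 */
        send_request[i] = MPI_REQUEST_NULL;
        recv_request[i] = MPI_REQUEST_NULL;
    }

    double sbuf = (double)rank, rbuf = -1.0;
    if (rank + 1 < size)        /* only some ranks actually post a send */
        MPI_Isend(&sbuf, 1, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD,
                  &send_request[0]);
    if (rank > 0)               /* only some ranks actually post a receive */
        MPI_Irecv(&rbuf, 1, MPI_DOUBLE, rank - 1, 0, MPI_COMM_WORLD,
                  &recv_request[0]);

    /* MPI_Waitall skips the MPI_REQUEST_NULL entries, so this is safe
       even though not every slot was used */
    MPI_Waitall(nMsg, send_request, MPI_STATUSES_IGNORE);
    MPI_Waitall(nMsg, recv_request, MPI_STATUSES_IGNORE);

    MPI_Finalize();
    return 0;
}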

Cheers,

Gilles

On Tuesday, September 29, 2015, Diego Avesani 
wrote:

> dear Jeff, dear all,
> I have notice that if I initialize the variables, I do not have the error
> anymore:
> !
>   ALLOCATE(SEND_REQUEST(nMsg),RECV_REQUEST(nMsg))
>   SEND_REQUEST=0
>   RECV_REQUEST=0
> !
>
> Could you please explain me why?
> Thanks
>
>
> Diego
>
>
> On 29 September 2015 at 16:08, Diego Avesani  > wrote:
>
>> Dear Jeff, Dear all,
>> the code is very long, here something. I hope that this could help.
>>
>> What do you think?
>>
>> SUBROUTINE MATOPQN
>> USE VARS_COMMON,ONLY:COMM_CART,send_messageR,recv_messageL,nMsg
>> USE MPI
>> INTEGER :: send_request(nMsg), recv_request(nMsg)
>> INTEGER ::
>> send_status_list(MPI_STATUS_SIZE,nMsg),recv_status_list(MPI_STATUS_SIZE,nMsg)
>>
>>  !send message to right CPU
>> IF(MPIdata%rank.NE.MPIdata%nCPU-1)THEN
>> MsgLength = MPIdata%jmaxN
>> DO icount=1,MPIdata%jmaxN
>> iNode = MPIdata%nodeList2right(icount)
>> send_messageR(icount) = RIS_2(iNode)
>> ENDDO
>>
>> CALL MPI_ISEND(send_messageR, MsgLength, MPI_DOUBLE_COMPLEX,
>> MPIdata%rank+1, MPIdata%rank+1, MPI_COMM_WORLD,
>> send_request(MPIdata%rank+1), MPIdata%iErr)
>>
>> ENDIF
>> !
>>
>>
>> !recive message FROM left CPU
>> IF(MPIdata%rank.NE.0)THEN
>> MsgLength = MPIdata%jmaxN
>>
>> CALL MPI_IRECV(recv_messageL, MsgLength, MPI_DOUBLE_COMPLEX,
>> MPIdata%rank-1, MPIdata%rank, MPI_COMM_WORLD, recv_request(MPIdata%rank),
>> MPIdata%iErr)
>>
>> write(*,*) MPIdata%rank-1
>> ENDIF
>> !
>> !
>> CALL MPI_WAITALL(nMsg,send_request,send_status_list,MPIdata%iErr)
>> CALL MPI_WAITALL(nMsg,recv_request,recv_status_list,MPIdata%iErr)
>>
>> Diego
>>
>>
>> On 29 September 2015 at 00:15, Jeff Squyres (jsquyres) <
>> jsquy...@cisco.com >
>> wrote:
>>
>>> Can you send a small reproducer program?
>>>
>>> > On Sep 28, 2015, at 4:45 PM, Diego Avesani >> > wrote:
>>> >
>>> > Dear all,
>>> >
>>> > I have to use a send_request in a MPI_WAITALL.
>>> > Here the strange things:
>>> >
>>> > If I use at the begging of the SUBROUTINE:
>>> >
>>> > INTEGER :: send_request(3), recv_request(3)
>>> >
>>> > I have no problem, but if I use
>>> >
>>> > USE COMONVARS,ONLY : nMsg
>>> > with nMsg=3
>>> >
>>> > and after that I declare
>>> >
>>> > INTEGER :: send_request(nMsg), recv_request(nMsg), I get the following
>>> error:
>>> >
>>> > [Lap] *** An error occurred in MPI_Waitall
>>> > [Lap] *** reported by process [139726485585921,0]
>>> > [Lap] *** on communicator MPI_COMM_WORLD
>>> > [Lap] *** MPI_ERR_REQUEST: invalid request
>>> > [Lap] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will
>>> now abort,
>>> > [Lap] ***and potentially your MPI job)
>>> > forrtl: error (78): process killed (SIGTERM)
>>> >
>>> > Someone could please explain to me where I am wrong?
>>> >
>>> > Thanks
>>> >
>>> > Diego
>>> >
>>> > ___
>>> > users mailing list
>>> > us...@open-mpi.org
>>> 
>>> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> > Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2015/09/27703.php
>>>
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com 
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org 
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2015/09/27704.php
>>>
>>
>>
>


Re: [OMPI users] Using POSIX shared memory as send buffer

2015-09-29 Thread Nathan Hjelm

We register the memory with the NIC for both read and write access. This
may be the source of the slowdown. We recently added internal support to
allow the point-to-point layer to specify the access flags but the
openib btl does not yet make use of the new support. I plan to make the
necessary changes before the 2.0.0 release. I should have them complete
later this week. I can send you a note when they are ready if you would
like to try it and see if it addresses the problem.
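
For reference, here is a standalone sketch of what those access flags look
like at the verbs level (an illustration of the mechanism only, not the
actual openib btl code; link with -libverbs):

#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    int n = 0;
    struct ibv_device **devs = ibv_get_device_list(&n);
    if (!devs || n == 0) { fprintf(stderr, "no IB devices found\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ctx ? ibv_alloc_pd(ctx) : NULL;
    if (!pd) { fprintf(stderr, "could not open device / allocate PD\n"); return 1; }

    size_t len = 1 << 20;
    void *buf = malloc(len);
    memset(buf, 0, len);

    /* current behavior described above: register for read and write access */
    struct ibv_mr *mr_rw = ibv_reg_mr(pd, buf, len,
                                      IBV_ACCESS_LOCAL_WRITE |
                                      IBV_ACCESS_REMOTE_READ);

    /* a send-only buffer needs no write permission; access == 0 means
       local read access only, which is enough for a PROT_READ mapping */
    struct ibv_mr *mr_ro = ibv_reg_mr(pd, buf, len, 0);

    printf("read-write registration: %s, read-only registration: %s\n",
           mr_rw ? "ok" : "failed", mr_ro ? "ok" : "failed");

    if (mr_rw) ibv_dereg_mr(mr_rw);
    if (mr_ro) ibv_dereg_mr(mr_ro);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    free(buf);
    return 0;
}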

-Nathan

On Tue, Sep 29, 2015 at 10:51:38AM +0200, Marcin Krotkiewski wrote:
> Thanks, Dave.
> 
> I have verified the memory locality and IB card locality, all's fine.
> 
> Quite accidentally I have found that there is a huge penalty if I mmap the
> shm with PROT_READ only. Using PROT_READ | PROT_WRITE yields good results,
> although I must look at this further. I'll report when I am certain, in case
> sb finds this useful.
> 
> Is this an OS feature, or is OpenMPI somehow working differently? I don't
> suspect you guys write to the send buffer, right? Even if you would there
> would be a segfault. So I guess this could be OS preventing any writes to
> the pointer that introduced the overhead?
> 
> Marcin
> 
> 
> 
> On 09/28/2015 09:44 PM, Dave Goodell (dgoodell) wrote:
> >On Sep 27, 2015, at 1:38 PM, marcin.krotkiewski 
> > wrote:
> >>Hello, everyone
> >>
> >>I am struggling a bit with IB performance when sending data from a POSIX 
> >>shared memory region (/dev/shm). The memory is shared among many MPI 
> >>processes within the same compute node. Essentially, I see a bit hectic 
> >>performance, but it seems that my code it is roughly twice slower than when 
> >>using a usual, malloced send buffer.
> >It may have to do with NUMA effects and the way you're allocating/touching 
> >your shared memory vs. your private (malloced) memory.  If you have a 
> >multi-NUMA-domain system (i.e., any 2+ socket server, and even some 
> >single-socket servers) then you are likely to run into this sort of issue.  
> >The PCI bus on which your IB HCA communicates is almost certainly closer to 
> >one NUMA domain than the others, and performance will usually be worse if 
> >you are sending/receiving from/to a "remote" NUMA domain.
> >
> >"lstopo" and other tools can sometimes help you get a handle on the 
> >situation, though I don't know if it knows how to show memory affinity.  I 
> >think you can find memory affinity for a process via 
> >"/proc//numa_maps".  There's lots of info about NUMA affinity here: 
> >https://queue.acm.org/detail.cfm?id=2513149
> >
> >-Dave
> >
> >___
> >users mailing list
> >us...@open-mpi.org
> >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >Link to this post: 
> >http://www.open-mpi.org/community/lists/users/2015/09/27702.php
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/09/27705.php




[OMPI users] worse latency in 1.8 c.f. 1.6

2015-09-29 Thread Dave Love
I've just compared IB p2p latency between version 1.6.5 and 1.8.8.  I'm
surprised to find that 1.8 is rather worse, as below.  Assuming that's
not expected, are there any suggestions for debugging it?

This is with FDR Mellanox, between two Sandybridge nodes on the same
blade chassis switch.  The results are similar for IMB pingpong and
osu_latency, and reproducible.  I'm running both cases the same way as
far as I can tell (e.g. core binding with 1.6 and not turning it off
with 1.8), just rebuilding the test against each OMPI version.

The initial osu_latency figures for 1.6 are:

  # OSU MPI Latency Test v5.0
  # Size  Latency (us)
  0   1.16
  1   1.24
  2   1.23
  4   1.23
  8   1.26
  16  1.27
  32  1.30
  64  1.36

and for 1.8:

  # OSU MPI Latency Test v5.0
  # Size  Latency (us)
  0   1.48
  1   1.46
  2   1.42
  4   1.43
  8   1.46
  16  1.47
  32  1.48
  64  1.54



Re: [OMPI users] C/R Enabled Debugging

2015-09-29 Thread Dave Love
[Meanwhile, much later, as I thought I'd sent this...]

Ralph Castain  writes:

> Hi Zhang
>
> We have seen little interest in binary level CR over the years, which
> is the primary reason the support has lapsed.

That might be a bit chicken and egg!

> The approach just doesn’t scale very well.

Presumably that depends, and it definitely seems reasonable at our
scale.  (mvapich seems to take it seriously.)

> Once the graduate student who wrote it
> received his degree, there simply wasn’t enough user-level interest to
> motivate the developer members to maintain it.
>
> In the interim, we’ve seen considerable interest in application-level
> CR in its place. You might checkout the SCR library from LLNL as an
> example of what people are doing in that space:

Does it support ORTE?  When I last looked, it said only SLURM, but maybe
that doesn't include mvapich with other starters.  Also it assumes local
storage (or the associated in-memory filesystem), in case that's an
issue.

Is SCR not actually used for system-level checkpoints in mvapich?  I
assumed it was from what I'd read.

> https://computation.llnl.gov/project/scr/
> 
>
> We did have someone (another graduate student) recently work with the
> community to attempt to restore the binary-level CR support, but he
> didn’t get a chance to complete it prior to graduating. So we are
> removing the leftover code from the 2.x release series until someone
> comes along with enough interest to repair it.

How much knowledge and effort would that take?  Presumably knowing what
broke it would give some indication.

> Assuming that hasn’t happened before sometime next year, I might take
> a shot at it then - but I won’t have any time to work on it before
> next spring at the earliest, and as I said, it isn’t clear there is a
> significant user base for binary-level CR with the shift to
> application-level systems.

I'm sure it varies, but I don't see much useful checkpointing support,
and/or users willing to use it, here.

[Quite often it would be more useful to migrate part of a job, rather
than restart the whole thing, though that obviously requires support
from the resource manager.]



Re: [OMPI users] Using OpenMPI (1.8, 1.10) with Mellanox MXM, ulimits ?

2015-09-29 Thread Mike Dubman
unfortunately, there is no one-size-fits-all here.

mxm provides the best performance for IB.

different applications may require different OMPI, mxm, and OS tunables, and
some performance engineering.

On Mon, Sep 28, 2015 at 9:49 PM, Grigory Shamov  wrote:

> Hi Nathan,
> Hi Mike,
>
> Thanks for the quick replies!
>
> My problem is that I don't know what my applications are. I mean, I know them,
> but we are a general-purpose cluster, running in production for quite a
> while, and we have everybody, from quantum chemists to machine learners
> to bioinformaticians. So a system-wide change might harm some of them, and
> doing per-app benchmarking/tuning looks a bit daunting.
>
> The default behaviour our users are used to is to have unlimited values
> for all memory limits. We set it that way a few years ago, in response
> to user complaints that applications wouldn't start (we set the ulimits
> in Torque).
>
> Is it known (I know every application is different) how much it costs,
> performance-wise, to have MXM with the recommended ulimits vs. unlimited
> ulimits, vs. not using MXM at all?
>
> --
> Grigory Shamov
>
> Westgrid/ComputeCanada Site Lead
> University of Manitoba
> E2-588 EITC Building,
> (204) 474-9625
>
>
>
>
>
>
> On 15-09-28 12:58 PM, "users on behalf of Nathan Hjelm"
>  wrote:
>
> >
> >I would like to add that you may want to play with the value and see
> >what works for your applications. Most applications should be using
> >malloc or similar functions to allocate large memory regions in the heap
> >and not on the stack.
> >
> >-Nathan
> >
> >On Mon, Sep 28, 2015 at 08:01:09PM +0300, Mike Dubman wrote:
> >>Hello Grigory,
> >>We observed ~10% performance degradation with heap size set to
> >>unlimited
> >>for CFD applications.
> >>You can measure your application performance with default and
> >>unlimited
> >>"limits" and select the best setting.
> >>Kind Regards.
> >>M
> >>On Mon, Sep 28, 2015 at 7:36 PM, Grigory Shamov
> >> wrote:
> >>
> >>  Hi All,
> >>
> >>  We have built OpenMPI (1.8.8., 1.10.0) against Mellanox OFED 2.4
> >>and
> >>  corresponding MXM. When it runs now, it gives the following
> >>warning, per
> >>  process:
> >>
> >>  [1443457390.911053] [myhist:5891 :0] mxm.c:185  MXM  WARN
> >>The
> >>  'ulimit -s' on the system is set to 'unlimited'. This may have
> >>negative
> >>  performance implications. Please set the heap size to the default
> >>value
> >>  (10240)
> >>
> >>  We have ulimits for heap (as well as most of the other limits) set
> >>  unlimited because of applications that might possibly need a lot
> >>of RAM.
> >>
> >>  The question is if we should do as MXM wants, or ignore it? Has
> >>anyone
> >>  an
> >>  experience running recent OpenMPI with MXM enabled, and what kind
> >>of
> >>  ulimits do you have? Any suggestions/comments appreciated, thanks!
> >>
> >>  --
> >>  Grigory Shamov
> >>
> >>  Westgrid/ComputeCanada Site Lead
> >>  University of Manitoba
> >>  E2-588 EITC Building,
> >>  (204) 474-9625
> >>
> >>  ___
> >>  users mailing list
> >>  us...@open-mpi.org
> >>  Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>  Link to this post:
> >>  http://www.open-mpi.org/community/lists/users/2015/09/27697.php
> >>
> >>--
> >>Kind Regards,
> >>M.
> >
> >> ___
> >> users mailing list
> >> us...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >> Link to this post:
> >>http://www.open-mpi.org/community/lists/users/2015/09/27698.php
> >
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/09/27701.php
>



-- 

Kind Regards,

M.


Re: [OMPI users] worse latency in 1.8 c.f. 1.6

2015-09-29 Thread Mike Dubman
what is your command line and setup? (ofed version, distro)

This is what was just measured w/ fdr on haswell with v1.8.8 and mxm and UD

+ mpirun -np 2 -bind-to core -display-map -mca rmaps_base_mapping_policy
dist:span -x MXM_RDMA_PORTS=mlx5_3:1 -mca rmaps_dist_device mlx5_3:1  -x
MXM_TLS=self,shm,ud osu_latency
 Data for JOB [65499,1] offset 0

    JOB MAP   

 Data for node: clx-orion-001   Num slots: 28   Max slots: 0    Num procs: 1
Process OMPI jobid: [65499,1] App: 0 Process rank: 0

 Data for node: clx-orion-002   Num slots: 28   Max slots: 0    Num procs: 1
Process OMPI jobid: [65499,1] App: 0 Process rank: 1

 =
# OSU MPI Latency Test v4.4.1
# Size  Latency (us)
0   1.18
1   1.16
2   1.19
4   1.20
8   1.19
16  1.19
32  1.21
64  1.27


and w/ ob1, openib btl:

mpirun -np 2 -bind-to core -display-map -mca rmaps_base_mapping_policy
dist:span  -mca rmaps_dist_device mlx5_3:1  -mca btl_if_include mlx5_3:1
-mca pml ob1 -mca btl openib,self osu_latency

# OSU MPI Latency Test v4.4.1
# Size  Latency (us)
0   1.13
1   1.17
2   1.17
4   1.17
8   1.22
16  1.23
32  1.25
64  1.28


On Tue, Sep 29, 2015 at 6:49 PM, Dave Love  wrote:

> I've just compared IB p2p latency between version 1.6.5 and 1.8.8.  I'm
> surprised to find that 1.8 is rather worse, as below.  Assuming that's
> not expected, are there any suggestions for debugging it?
>
> This is with FDR Mellanox, between two Sandybridge nodes on the same
> blade chassis switch.  The results are similar for IMB pingpong and
> osu_latency, and reproducible.  I'm running both cases the same way as
> far as I can tell (e.g. core binding with 1.6 and not turning it off
> with 1.8) just rebuilding the test against between OMPI versions.
>
> The initial osu_latency figures for 1.6 are:
>
>   # OSU MPI Latency Test v5.0
>   # Size  Latency (us)
>   0   1.16
>   1   1.24
>   2   1.23
>   4   1.23
>   8   1.26
>   16  1.27
>   32  1.30
>   64  1.36
>
> and for 1.8:
>
>   # OSU MPI Latency Test v5.0
>   # Size  Latency (us)
>   0   1.48
>   1   1.46
>   2   1.42
>   4   1.43
>   8   1.46
>   16  1.47
>   32  1.48
>   64  1.54
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/09/27712.php
>



-- 

Kind Regards,

M.


Re: [OMPI users] Using POSIX shared memory as send buffer

2015-09-29 Thread marcin.krotkiewski


I've now run a few more tests, and I think I can say with reasonable 
confidence that the read-only mmap is the problem. Let me know if you have a 
possible fix - I will gladly test it.


Marcin


On 09/29/2015 04:59 PM, Nathan Hjelm wrote:

We register the memory with the NIC for both read and write access. This
may be the source of the slowdown. We recently added internal support to
allow the point-to-point layer to specify the access flags but the
openib btl does not yet make use of the new support. I plan to make the
necessary changes before the 2.0.0 release. I should have them complete
later this week. I can send you a note when they are ready if you would
like to try it and see if it addresses the problem.

-Nathan

On Tue, Sep 29, 2015 at 10:51:38AM +0200, Marcin Krotkiewski wrote:

Thanks, Dave.

I have verified the memory locality and IB card locality, all's fine.

Quite accidentally I have found that there is a huge penalty if I mmap the
shm with PROT_READ only. Using PROT_READ | PROT_WRITE yields good results,
although I must look at this further. I'll report when I am certain, in case
sb finds this useful.

Is this an OS feature, or is OpenMPI somehow working differently? I don't
suspect you guys write to the send buffer, right? Even if you would there
would be a segfault. So I guess this could be OS preventing any writes to
the pointer that introduced the overhead?

Marcin



On 09/28/2015 09:44 PM, Dave Goodell (dgoodell) wrote:

On Sep 27, 2015, at 1:38 PM, marcin.krotkiewski  
wrote:

Hello, everyone

I am struggling a bit with IB performance when sending data from a POSIX shared 
memory region (/dev/shm). The memory is shared among many MPI processes within 
the same compute node. Essentially, I see a bit hectic performance, but it 
seems that my code it is roughly twice slower than when using a usual, malloced 
send buffer.

It may have to do with NUMA effects and the way you're allocating/touching your shared 
memory vs. your private (malloced) memory.  If you have a multi-NUMA-domain system (i.e., 
any 2+ socket server, and even some single-socket servers) then you are likely to run 
into this sort of issue.  The PCI bus on which your IB HCA communicates is almost 
certainly closer to one NUMA domain than the others, and performance will usually be 
worse if you are sending/receiving from/to a "remote" NUMA domain.

"lstopo" and other tools can sometimes help you get a handle on the situation, though I don't 
know if it knows how to show memory affinity.  I think you can find memory affinity for a process via 
"/proc//numa_maps".  There's lots of info about NUMA affinity here: 
https://queue.acm.org/detail.cfm?id=2513149

-Dave

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/09/27702.php

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/09/27705.php


___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/09/27711.php




Re: [OMPI users] Using POSIX shared memory as send buffer

2015-09-29 Thread Nathan Hjelm

I have a branch with the changes available at:

https://github.com/hjelmn/ompi.git

in the mpool_update branch. If you prefer you can apply this patch to
either a 2.x or a master tarball.

https://github.com/hjelmn/ompi/commit/8839dbfae85ba8f443b2857f9bbefdc36c4ebc1a.patch

Let me know if this resolves the performance issues.

-Nathan

On Tue, Sep 29, 2015 at 09:57:54PM +0200, marcin.krotkiewski wrote:
>I've now run a few more tests and I think I can reasonably confidently say
>that the read only mmap is a problem. Let me know if you have a possible
>fix - I will gladly test it.
> 
>Marcin
> 
>On 09/29/2015 04:59 PM, Nathan Hjelm wrote:
> 
>  We register the memory with the NIC for both read and write access. This
>  may be the source of the slowdown. We recently added internal support to
>  allow the point-to-point layer to specify the access flags but the
>  openib btl does not yet make use of the new support. I plan to make the
>  necessary changes before the 2.0.0 release. I should have them complete
>  later this week. I can send you a note when they are ready if you would
>  like to try it and see if it addresses the problem.
> 
>  -Nathan
> 
>  On Tue, Sep 29, 2015 at 10:51:38AM +0200, Marcin Krotkiewski wrote:
> 
>  Thanks, Dave.
> 
>  I have verified the memory locality and IB card locality, all's fine.
> 
>  Quite accidentally I have found that there is a huge penalty if I mmap the
>  shm with PROT_READ only. Using PROT_READ | PROT_WRITE yields good results,
>  although I must look at this further. I'll report when I am certain, in case
>  sb finds this useful.
> 
>  Is this an OS feature, or is OpenMPI somehow working differently? I don't
>  suspect you guys write to the send buffer, right? Even if you would there
>  would be a segfault. So I guess this could be OS preventing any writes to
>  the pointer that introduced the overhead?
> 
>  Marcin
> 
> 
> 
>  On 09/28/2015 09:44 PM, Dave Goodell (dgoodell) wrote:
> 
>  On Sep 27, 2015, at 1:38 PM, marcin.krotkiewski 
>  wrote:
> 
>  Hello, everyone
> 
>  I am struggling a bit with IB performance when sending data from a POSIX 
> shared memory region (/dev/shm). The memory is shared among many MPI 
> processes within the same compute node. Essentially, I see a bit hectic 
> performance, but it seems that my code it is roughly twice slower than when 
> using a usual, malloced send buffer.
> 
>  It may have to do with NUMA effects and the way you're allocating/touching 
> your shared memory vs. your private (malloced) memory.  If you have a 
> multi-NUMA-domain system (i.e., any 2+ socket server, and even some 
> single-socket servers) then you are likely to run into this sort of issue.  
> The PCI bus on which your IB HCA communicates is almost certainly closer to 
> one NUMA domain than the others, and performance will usually be worse if you 
> are sending/receiving from/to a "remote" NUMA domain.
> 
>  "lstopo" and other tools can sometimes help you get a handle on the 
> situation, though I don't know if it knows how to show memory affinity.  I 
> think you can find memory affinity for a process via "/proc//numa_maps". 
>  There's lots of info about NUMA affinity here: 
> https://queue.acm.org/detail.cfm?id=2513149
> 
>  -Dave
> 
>  ___
>  users mailing list
>  us...@open-mpi.org
>  Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>  Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/09/27702.php
> 
>  ___
>  users mailing list
>  us...@open-mpi.org
>  Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>  Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/09/27705.php
> 
>  ___
>  users mailing list
>  us...@open-mpi.org
>  Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>  Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/09/27711.php

> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/09/27716.php





Re: [OMPI users] Using POSIX shared memory as send buffer

2015-09-29 Thread Nathan Hjelm

There was a bug in that patch that affected IB systems. Updated patch:

https://github.com/hjelmn/ompi/commit/c53df23c0bcf8d1c531e04d22b96c8c19f9b3fd1.patch

-Nathan

On Tue, Sep 29, 2015 at 03:35:21PM -0600, Nathan Hjelm wrote:
> 
> I have a branch with the changes available at:
> 
> https://github.com/hjelmn/ompi.git
> 
> in the mpool_update branch. If you prefer you can apply this patch to
> either a 2.x or a master tarball.
> 
> https://github.com/hjelmn/ompi/commit/8839dbfae85ba8f443b2857f9bbefdc36c4ebc1a.patch
> 
> Let me know if this resolves the performance issues.
> 
> -Nathan
> 
> On Tue, Sep 29, 2015 at 09:57:54PM +0200, marcin.krotkiewski wrote:
> >I've now run a few more tests and I think I can reasonably confidently 
> > say
> >that the read only mmap is a problem. Let me know if you have a possible
> >fix - I will gladly test it.
> > 
> >Marcin
> > 
> >On 09/29/2015 04:59 PM, Nathan Hjelm wrote:
> > 
> >  We register the memory with the NIC for both read and write access. This
> >  may be the source of the slowdown. We recently added internal support to
> >  allow the point-to-point layer to specify the access flags but the
> >  openib btl does not yet make use of the new support. I plan to make the
> >  necessary changes before the 2.0.0 release. I should have them complete
> >  later this week. I can send you a note when they are ready if you would
> >  like to try it and see if it addresses the problem.
> > 
> >  -Nathan
> > 
> >  On Tue, Sep 29, 2015 at 10:51:38AM +0200, Marcin Krotkiewski wrote:
> > 
> >  Thanks, Dave.
> > 
> >  I have verified the memory locality and IB card locality, all's fine.
> > 
> >  Quite accidentally I have found that there is a huge penalty if I mmap the
> >  shm with PROT_READ only. Using PROT_READ | PROT_WRITE yields good results,
> >  although I must look at this further. I'll report when I am certain, in 
> > case
> >  sb finds this useful.
> > 
> >  Is this an OS feature, or is OpenMPI somehow working differently? I don't
> >  suspect you guys write to the send buffer, right? Even if you would there
> >  would be a segfault. So I guess this could be OS preventing any writes to
> >  the pointer that introduced the overhead?
> > 
> >  Marcin
> > 
> > 
> > 
> >  On 09/28/2015 09:44 PM, Dave Goodell (dgoodell) wrote:
> > 
> >  On Sep 27, 2015, at 1:38 PM, marcin.krotkiewski 
> >  wrote:
> > 
> >  Hello, everyone
> > 
> >  I am struggling a bit with IB performance when sending data from a POSIX 
> > shared memory region (/dev/shm). The memory is shared among many MPI 
> > processes within the same compute node. Essentially, I see a bit hectic 
> > performance, but it seems that my code it is roughly twice slower than when 
> > using a usual, malloced send buffer.
> > 
> >  It may have to do with NUMA effects and the way you're allocating/touching 
> > your shared memory vs. your private (malloced) memory.  If you have a 
> > multi-NUMA-domain system (i.e., any 2+ socket server, and even some 
> > single-socket servers) then you are likely to run into this sort of issue.  
> > The PCI bus on which your IB HCA communicates is almost certainly closer to 
> > one NUMA domain than the others, and performance will usually be worse if 
> > you are sending/receiving from/to a "remote" NUMA domain.
> > 
> >  "lstopo" and other tools can sometimes help you get a handle on the 
> > situation, though I don't know if it knows how to show memory affinity.  I 
> > think you can find memory affinity for a process via 
> > "/proc//numa_maps".  There's lots of info about NUMA affinity here: 
> > https://queue.acm.org/detail.cfm?id=2513149
> > 
> >  -Dave
> > 
> >  ___
> >  users mailing list
> >  us...@open-mpi.org
> >  Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >  Link to this post: 
> > http://www.open-mpi.org/community/lists/users/2015/09/27702.php
> > 
> >  ___
> >  users mailing list
> >  us...@open-mpi.org
> >  Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >  Link to this post: 
> > http://www.open-mpi.org/community/lists/users/2015/09/27705.php
> > 
> >  ___
> >  users mailing list
> >  us...@open-mpi.org
> >  Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >  Link to this post: 
> > http://www.open-mpi.org/community/lists/users/2015/09/27711.php
> 
> > ___
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post: 
> > http://www.open-mpi.org/community/lists/users/2015/09/27716.php
> 



> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/09/27717.php


