[OMPI users] SGE integration broken in 2.0.0

2016-08-11 Thread Reuti
Hi, In the file orte/mca/plm/rsh/plm_rsh_component I see an if-statement, which seems to prevent the tight integration with SGE to start: if (NULL == mca_plm_rsh_component.agent) { Why is it there (it wasn't in 1.10.3)? If I just remove it I get: [node17:25001] [[27678,0],0] plm:rsh: fina

Re: [OMPI users] SGE integration broken in 2.0.0

2016-08-11 Thread Reuti
> On 11.08.2016 at 13:28, Reuti wrote: > > Hi, > > In the file orte/mca/plm/rsh/plm_rsh_component I see an if-statement, which > seems to prevent the tight integration with SGE to start: > >if (NULL == mca_plm_rsh_component.agent) { > > Why is it there (it wasn't in 1.10.3)? > > If I j

Re: [OMPI users] runtime performance tuning for Intel OMA interconnect

2016-08-11 Thread Xiaolong Cui
Thanks! I tried it, but it didn't solve my problem. Maybe the reason is not eager/rndv. The reason why I want to always use eager mode is that I want to avoid a sender being slowed down by an unready receiver. I can prevent a sender from slowing down by always using eager mode on InfiniBand, just

Re: [OMPI users] runtime performance tuning for Intel OMA interconnect

2016-08-11 Thread Xiaolong Cui
Sorry, forgot the attachments. On Thu, Aug 11, 2016 at 5:06 PM, Xiaolong Cui wrote: > Thanks! I tried it, but it didn't solve my problem. Maybe the reason is > not eager/rndv. > > The reason why I want to always use eager mode is that I want to avoid a > sender being slowed down by an unready re

Re: [OMPI users] runtime performance tuning for Intel OMA interconnect

2016-08-11 Thread Cabral, Matias A
Michael, In general terms and assuming you are running all message sizes in PIO Eager Mode, the communication is going to be affected by the CPU load. In other words, the bigger the message, the more CPU cycles are needed to copy the buffer. Additionally, I have to say I’m not very certain how MPI_Send()

Re: [OMPI users] EXTERNAL: Re: Question on run-time error "ORTE was unable to reliably start"

2016-08-11 Thread Blosch, Edwin L
I had another observation of the problem, with a little more insight. I can confirm that the job has been running several hours before dying with the 'ORTE was unable to reliably start' message. Somehow it is possible. I had used the following options to try and get some more diagnostics:

Re: [OMPI users] EXTERNAL: Re: Question on run-time error "ORTE was unable to reliably start"

2016-08-11 Thread Gilles Gouaillardet
Hi, this is very puzzling... Is your application using MPI_Comm_spawn and friends? If not, is orted on node k2n01 *really* dead, or does the head node incorrectly believe orted died? You might want to add the following configuration in your ~/.ssh/config TCPKeepAlive=yes ServerAlive

Re: [OMPI users] runtime performance tuning for Intel OMA interconnect

2016-08-11 Thread Xiaolong Cui
What you said totally makes sense. I think I will start using MPI_Isend(). Thank you very much for your help! Michael On Thu, Aug 11, 2016 at 6:36 PM, Cabral, Matias A wrote: > Michael, > > In general terms and assuming you are running all message sizes in PIO > Eager Mode, the communication