Hi,
In the file orte/mca/plm/rsh/plm_rsh_component I see an if-statement which
seems to prevent the tight integration with SGE from starting:
if (NULL == mca_plm_rsh_component.agent) {
Why is it there (it wasn't in 1.10.3)?
If I just remove it I get:
[node17:25001] [[27678,0],0] plm:rsh: fina
> On 11.08.2016 at 13:28, Reuti wrote:
>
> Hi,
>
> In the file orte/mca/plm/rsh/plm_rsh_component I see an if-statement which
> seems to prevent the tight integration with SGE from starting:
>
> if (NULL == mca_plm_rsh_component.agent) {
>
> Why is it there (it wasn't in 1.10.3)?
>
> If I j
Thanks! I tried it, but it didn't solve my problem. Maybe the reason is not
eager/rndv.
The reason why I want to always use eager mode is that I want to avoid a
sender being slowed down by an unready receiver. I can prevent a sender
from slowing down by always using eager mode on InfiniBand, just
Sorry, forgot the attachments.
On Thu, Aug 11, 2016 at 5:06 PM, Xiaolong Cui wrote:
> Thanks! I tried it, but it didn't solve my problem. Maybe the reason is
> not eager/rndv.
>
> The reason why I want to always use eager mode is that I want to avoid a
> sender being slowed down by an unready re
Michael,
In general terms, and assuming you are running all message sizes in PIO Eager
Mode, the communication is going to be affected by the CPU load. In other
words, the bigger the message, the more CPU cycles it takes to copy the buffer.
Additionally, I have to say I’m not very certain how MPI_Send()
I had another observation of the problem, with a little more insight. I can
confirm that the job had been running for several hours before dying with the 'ORTE
was unable to reliably start' message. Somehow it is possible. I had used
the following options to try and get some more diagnostics:
Hi,
this is very puzzling ...
is your application using MPI_Comm_spawn and friends ?
If not, is orted on node k2n01 *really* dead? Or does the head node
incorrectly believe orted died?
you might want to add the following configuration in your ~/.ssh/config
TCPKeepAlive=yes
ServerAlive
What you said totally makes sense. I think I will start using MPI_Isend().
Thanks very much for your help!
Michael
On Thu, Aug 11, 2016 at 6:36 PM, Cabral, Matias A wrote:
> Michael,
>
>
>
> In general terms, and assuming you are running all message sizes in PIO
> Eager Mode, the communication