Ralph Castain wrote:
> On Sep 23, 2011, at 1:21 PM, Guilherme V wrote:
>
> I'm using version 1.4.3 and I forgot to tell that I have made a change in
> the orterun.c line 792:
>
> if (ORTE_JOB_STATE_TERMINATED != exit_state) {
> exit(0); /* pat
d you do to keep it running after node failure with tcp?
> On Sep 23, 2011, at 12:34 PM, Guilherme V wrote:
>> Hi,
>> I want to know if anybody is having problems with fault tolerant job
>> using infiniband. When I run my job with tcp if anything happens with one
>> node, my job keeps runn
Hi,
I want to know if anybody is having problems with fault-tolerant jobs using
InfiniBand. When I run my job over TCP and something happens to one node, my
job keeps running, but if I switch the job to InfiniBand and something
happens to the InfiniBand fabric (e.g., cable problems), my job fails.
Anybody