Re: [OMPI users] Fault Tolerant with openib

2011-09-27 Thread Guilherme V
Ralph Castain wrote:
> On Sep 23, 2011, at 1:21 PM, Guilherme V wrote:
> > I'm using version 1.4.3, and I forgot to mention that I made a change at line 792 of orterun.c:
> >
> > if (ORTE_JOB_STATE_TERMINATED != exit_state) {
> > exit(0); /* pat

Re: [OMPI users] Fault Tolerant with openib

2011-09-23 Thread Guilherme V
d you do to keep it running after node failure with TCP?
> On Sep 23, 2011, at 12:34 PM, Guilherme V wrote:
>> Hi,
>> I want to know whether anybody is having problems with fault-tolerant jobs using InfiniBand. When I run my job over TCP and something happens to one node, my job keeps runn

[OMPI users] Fault Tolerant with openib

2011-09-23 Thread Guilherme V
Hi, I want to know whether anybody is having problems with fault-tolerant jobs using InfiniBand. When I run my job over TCP and something happens to one node, my job keeps running, but if I switch the job to InfiniBand and something goes wrong with the InfiniBand fabric (e.g., cable problems), my job fails. Anybody