date:20151107

Re: [OMPI users] Failure detection

2015-11-07 Thread Cristian Camilo Ruiz Sanabria

Thanks for answering. I tested again, this time on a real cluster where I can reboot the machines at will. I ran a test on 32 machines, with one MPI process per machine, and rebooted one of the machines during the execution. I found the same behavior: OpenMPI de

Re: [OMPI users] Failure detection

2015-11-07 Thread Ralph Castain

No, that certainly isn’t the normal behavior. I suspect it has to do with the nature of the VM TCP connection, though there is something very strange about your output. The BTL message indicates that an MPI job is already running. Yet your subsequent ORTE error message indicates we are still try

[OMPI users] Failure detection

2015-11-07 Thread Cristian RUIZ

Hello, I was studying how OpenMPI reacts to failures. I have a virtual infrastructure where failures can be emulated by turning off a given VM. Depending on how the VM is turned off, 'mpirun' is notified either because it receives a signal or because some timeout is reached. In bo
