I honestly have no idea…
> On May 16, 2016, at 10:39 AM, Zabiziz Zaz <zabi...@gmail.com> wrote:
>
> Ok.
> Could you please tell me the latest version that is supported?
>
> Regards,
> Guilherme.
>
> On Mon, May 16, 2016 at 12:30 PM, Ralph Castain <r...@open-mpi.org> wrote:
> We used to do so, but don’t currently support that model. Folks are working
> on restoring it. There is no timetable, though I don’t think it will be too
> much longer before it is in master. I can’t say when it will hit a release.
>
>> On May 16, 2016, at 8:25 AM, Zabiziz Zaz <zabi...@gmail.com> wrote:
>>
>> Hi Llolsten,
>> The problem is not a firewall issue. The simplest way to reproduce the
>> problem is to reboot a node in the middle of the job. Is it possible to
>> configure Open MPI not to terminate the job if one node is rebooted in the
>> middle of the job?
>>
>> Thanks again for your help.
>>
>> Regards,
>> Guilherme
>>
>> On Mon, May 16, 2016 at 12:11 PM, Llolsten Kaonga <l...@soft-forge.com> wrote:
>> Hello Guilherme,
>>
>> This may be off, but try running your mpirun command with the option
>> “--tag-output”. If you see a “broken pipe” error, then your issue may be
>> firewall related. You could then check the thread “Re: [OMPI users] mpirun
>> command won't run unless the firewalld daemon is disabled” to see how Gilles
>> or Jeff suggest getting around this.
>>
>> I thank you.
>>
>> --
>>
>> Llolsten
>>
>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Zabiziz Zaz
>> Sent: Monday, May 16, 2016 10:46 AM
>> To: us...@open-mpi.org
>> Subject: [OMPI users] ORTE has lost communication
>>
>> Hi,
>>
>> I'm using openmpi-1.10.2 and sometimes I'm receiving the message below:
>>
>> --------------------------------------------------------------------------
>> ORTE has lost communication with its daemon located on node:
>>
>> hostname: xxxx
>>
>> This is usually due to either a failure of the TCP network
>> connection to the node, or possibly an internal failure of
>> the daemon itself. We cannot recover from this failure, and
>> therefore will terminate the job.
>> --------------------------------------------------------------------------
>>
>> My applications are fault tolerant, and the jobs usually take weeks to
>> finish. Sometimes a hardware problem occurs on one node; for example, the
>> node shuts down. I don't want MPI to terminate the job: my jobs usually run
>> on hundreds of nodes, and I don't care if one node loses communication.
>>
>> Is it possible to change this behavior of Open MPI? I tried setting
>> orte_abort_on_non_zero_status to 0, but it didn't work.
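
For anyone following along, here is a rough sketch of the two usual ways an MCA parameter such as the one above gets set: on the mpirun command line with "--mca name value", or through an environment variable named OMPI_MCA_<name>. The helper names (mca_env, build_mpirun_command), the application path ./my_app, and the process count are made up for illustration, and nothing is actually launched. As reported above, setting this parameter did not stop the job from being terminated after a daemon was lost, so treat this only as an illustration of the mechanics.

```python
import os
import shlex


def mca_env(params):
    """Map MCA parameters to their OMPI_MCA_<name> environment variables."""
    return {f"OMPI_MCA_{name}": str(value) for name, value in params.items()}


def build_mpirun_command(app, num_procs, params=None, tag_output=False):
    """Build an mpirun argument list (hypothetical helper; runs nothing)."""
    cmd = ["mpirun", "-np", str(num_procs)]
    if tag_output:
        # Option suggested earlier in this thread for spotting "broken pipe".
        cmd.append("--tag-output")
    for name, value in (params or {}).items():
        cmd += ["--mca", name, str(value)]
    cmd.append(app)
    return cmd


# Parameter name taken from the message above; app and count are placeholders.
params = {"orte_abort_on_non_zero_status": 0}
cmd = build_mpirun_command("./my_app", 4, params, tag_output=True)
print(shlex.join(cmd))

# Alternative: leave the command line alone and pass the setting via the
# environment of the process that would start mpirun.
env = {**os.environ, **mca_env(params)}
```

Either form could then be handed to subprocess.run on a machine that has Open MPI installed; the sketch stops short of that so it stays runnable anywhere.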
>>
>> Thanks for your help.
>>
>> Regards,
>> Guilherme.
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2016/05/29214.php