Hi Ralph,

Thanks for the reply!
I have tried, but couldn't get 1.8.8 or 1.10 (I tried 1.10.0 back then) to
work with our rather old Torque 2.5.13 with cpusets. Under some
circumstances (depending on the process/node layout given by Torque), it
fails to bind cores, with messages like:

  Error message:  hwloc_set_cpubind returned "Error" for bitmap "0"
  Location:       ../../../../../openmpi-1.10.0/orte/mca/odls/default/odls_default_module.c:551



-- 
Grigory Shamov
HPC Analyst,

Westgrid/ComputeCanada Site Lead
University of Manitoba
E2-588 EITC Building,
(204) 474-9625





On 15-11-26 6:42 PM, "users on behalf of Ralph Castain"
<users-boun...@open-mpi.org on behalf of r...@open-mpi.org> wrote:

>You might want to upgrade to 1.10.1, or at least to 1.8.8, as 1.6.5 is
>pretty old.
>
>> On Nov 26, 2015, at 1:49 PM, Grigory Shamov
>><grigory.sha...@umanitoba.ca> wrote:
>> 
>> Hi All,
>> 
>> For a parallel MPI job, we sometimes (not always) get the following
>> message:
>> 
>> [n047:25850] [[36630,0],1] -> [[36630,0],0] (node: n230) oob-tcp: Number
>> of attempts to create TCP connection has been exceeded.  Can not
>> communicate with peer
>> [n047:25850] [[36630,0],1] ORTE_ERROR_LOG: Unreachable in file
>> ../../../../../openmpi-1.6.5/orte/mca/grpcomm/bad/grpcomm_bad_module.c
>> at line 412
>> [n047:25850] [[36630,0],1] -> [[36630,0],0] (node: n230) oob-tcp: Number
>> of attempts to create TCP connection has been exceeded.  Can not
>> communicate with peer
>> 
>> These appear in the middle of a running job; we use OpenMPI 1.6.5 and
>> OFED 2.4 on CentOS 6.
>> 
>> -- 
>> Grigory Shamov
>> HPC Analyst,
>> Westgrid/ComputeCanada Site Lead
>> University of Manitoba
>> E2-588 EITC Building,
>> (204) 474-9625
>> 
>> 
>> 
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>>http://www.open-mpi.org/community/lists/users/2015/11/28113.php
>
>_______________________________________________
>users mailing list
>us...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>Link to this post:
>http://www.open-mpi.org/community/lists/users/2015/11/28114.php
