Thanks Ralph and Jeff, I understand.

On 2013-02-05 03:34, Jeff Squyres (jsquyres) wrote:
To be clear: this is a common misconception.

Open MPI does not determine which network to use for MPI communication by the 
hostname(s) you use to launch your application.  Specifically: the hostnames 
that you list in the hostfile, command line, or whatever your resource manager 
provides are *not* used to determine which networks to use for MPI 
communication.

Open MPI only uses hostnames to identify unique servers (so that we can launch 
processes on them).  We use different controls -- outlined by Ralph -- to 
determine which network(s) to use for MPI communication.

Hope that helps.


On Feb 2, 2013, at 6:43 AM, Ralph Castain <r...@open-mpi.org> wrote:

I'm afraid this doesn't make much sense to me. LSF has dispatched node1 and 
node2 - correct? It sounds like you have also given those names aliases that 
refer to their IB ports - generally a very bad practice, but let's set that 
aside for now.

If they are the same physical nodes, then the node name makes no difference - 
OMPI will see both TCP and IB on the node and use them. You can control which 
interfaces get used by simply telling OMPI on its command line:

mpirun -mca btl tcp,sm,self ...  will use shared memory and TCP

mpirun -mca btl openib,sm,self ...  will use IB and shared memory

Using host names to try and control which network gets used isn't going to work 
- the software is too smart to be fooled that way.
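
As a rough sketch of how a job script could apply this while leaving the hostfile from the resource manager untouched: the helper below maps a network choice to the matching BTL list and prints the resulting mpirun command line. The function names btl_for_network and build_mpirun_cmd are hypothetical, made up for illustration; only the "-mca btl" values come from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: map a network choice to the BTL list to pass to mpirun.
# The two lists are the ones quoted in this thread.
btl_for_network() {
    case "$1" in
        ib)  echo "openib,sm,self" ;;   # InfiniBand plus shared memory
        tcp) echo "tcp,sm,self" ;;      # TCP plus shared memory
        *)   echo "unknown network: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print (do not run) the mpirun command line.
# First argument is the network choice; the rest is passed through unchanged.
build_mpirun_cmd() {
    btl=$(btl_for_network "$1") || return 1
    shift
    echo "mpirun -mca btl $btl $*"
}
```

For example, build_mpirun_cmd ib -machinefile "$HOSTFILE" ./a.out prints an mpirun line that selects IB while the hostfile keeps the node names LSF dispatched.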


On Feb 2, 2013, at 6:33 AM, HM Li <li...@163.com> wrote:

Can you help me?

bnode1,bnode2 and node1,node2 are hostnames of the same nodes, corresponding 
to the InfiniBand and Ethernet networks respectively.
node1 and node2 are the names declared in lsf.cluster.name.
To use the IB network, I modified the LSF mpijob script so that it rewrites 
the HOSTFILE of nodes dispatched by LSF, changing each node name to its bnode 
name.
I then run my jobs with "mpiexec -machinefile $HOSTFILE $COMMANDLINE".
But the job exits with this message:
-------------------------------------------------------------
A hostfile was provided that contains at least one node not
present in the allocation:

   hostfile:  /home/nic/hmli/.lsbatch/bhost23263.node1
   node:      bnode2

If you are operating in a resource-managed environment, then only
nodes that are in the allocation can be used in the hostfile. You
may find relative node syntax to be a useful alternative to
specifying absolute node names see the orte_hosts man page for
further information.
-------------------------------------------------------------

I don't want to change the hostname from node to bnode in lsf.cluster.name.

Thank you very much.


_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users