There are two distinct layers of software being discussed here:

- the PML (basically the back-end to MPI_SEND and friends)
- the BTL (byte transfer layer -- the back-end bit movers for the ob1 and dr PMLs; this distinction is important because nothing in the PML design forces the use of BTLs. Indeed, there is at least one current PML, the cm PML, that does not use BTLs as its back-end bit mover)

The ob1 and dr PMLs know nothing about how the back-end bit movers (the BTL components) work -- the BTLs are given considerable freedom to operate within their specific interface contracts.
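To make that division of labor concrete, here is a minimal sketch under invented names (btl_module_t, send_bytes, and pml_send are not the real mca_pml / mca_btl symbols): the PML decides which module carries the bytes, and a module only knows how to push bytes over its own endpoint.

/* Invented types for illustration -- these are not Open MPI's real
   mca_pml / mca_btl interfaces. */
#include <stdio.h>
#include <stddef.h>

/* One BTL module: an "instance" of a byte mover bound to one endpoint. */
typedef struct btl_module {
    const char *endpoint;
    int (*send_bytes)(struct btl_module *self, const void *buf, size_t len);
} btl_module_t;

static int dummy_send(btl_module_t *self, const void *buf, size_t len)
{
    (void)buf;
    printf("sending %zu bytes via %s\n", len, self->endpoint);
    return 0;
}

/* The PML (the layer behind MPI_Send) only picks a module; it never
   looks at how that module actually moves the bytes. */
static int pml_send(btl_module_t **btls, int nbtls,
                    const void *buf, size_t len)
{
    (void)nbtls;                          /* trivial policy: first module */
    return btls[0]->send_bytes(btls[0], buf, len);
}

int main(void)
{
    btl_module_t tcp_eth0 = { "tcp:eth0", dummy_send };
    btl_module_t *btls[] = { &tcp_eth0 };
    char payload[256] = { 0 };
    return pml_send(btls, 1, payload, sizeof payload);
}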

Generally, ob1/dr queries each BTL component when Open MPI starts up. Each BTL responds with whether it wants to run or not. If it does, it gives back one or more modules (think of a module as an "instance" of a component). Typically, these modules correspond to individual NICs / HCAs / network endpoints. For example, if you have 2 ethernet cards, the tcp BTL will create and return 2 modules. ob1/dr will treat these as two paths over which to send data (reachability is computed as well, of course -- ob1/dr will only send data down BTLs through which the target peer is reachable). In general, ob1/dr will round-robin across all available BTL modules when sending large messages (as Gleb has described). See http://www.open-mpi.org/papers/euro-pvmmpi-2006-hpc-protocols/ for a general description of the ob1/dr protocols.
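As a rough sketch of that round-robin striping idea (this is not ob1's actual scheduling code; btl_module_t and stripe_message are invented names), a large message gets broken into fragments that are dealt out across the available modules:

/* Conceptual sketch only -- not ob1's real scheduling code.  The
   names btl_module_t and stripe_message are invented here. */
#include <stdio.h>
#include <stddef.h>

typedef struct {
    const char *name;        /* e.g. "tcp:eth0" or "openib:port1-lid0" */
} btl_module_t;

/* Deal the fragments of a large message out across all reachable BTL
   modules in round-robin order. */
static void stripe_message(btl_module_t *btls, int nbtls,
                           size_t msg_len, size_t frag_len)
{
    size_t sent = 0;
    int next = 0;
    while (sent < msg_len) {
        size_t chunk = (msg_len - sent < frag_len) ? (msg_len - sent)
                                                   : frag_len;
        printf("fragment of %zu bytes -> %s\n", chunk, btls[next].name);
        sent += chunk;
        next = (next + 1) % nbtls;    /* round-robin to the next module */
    }
}

int main(void)
{
    /* Two modules -- two ethernet NICs, or two LIDs on one HCA port;
       ob1/dr treats both cases the same way. */
    btl_module_t btls[] = { { "module-0" }, { "module-1" } };
    stripe_message(btls, 2, 1u << 20, 64u * 1024);  /* 1 MB in 64 KB fragments */
    return 0;
}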

The openib BTL can return multiple modules if multiple LIDs are available. ob1/dr doesn't know that these are not separate physical devices -- it just treats each module as an equivalent mechanism for sending data.
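For context, the number of LIDs per port comes from the subnet manager's LMC setting: a port with base LID B and LMC value m answers to the 2^m LIDs B through B + 2^m - 1 (LMC is a 3-bit field, so at most 128 LIDs). A minimal sketch of that expansion follows; the struct here is hypothetical, not the openib BTL's real one.

/* Hypothetical sketch of the LID expansion; these structs are not the
   openib BTL's real data structures. */
#include <stdio.h>
#include <stdint.h>

typedef struct {
    uint16_t lid;            /* the LID this "module" sources traffic from */
} fake_btl_module_t;

int main(void)
{
    uint16_t base_lid = 0x10;
    unsigned lmc      = 2;            /* this port answers to 2^2 = 4 LIDs */
    unsigned nlids    = 1u << lmc;

    fake_btl_module_t modules[128];   /* LMC is a 3-bit field, so <= 128 LIDs */
    for (unsigned i = 0; i < nlids; i++) {
        modules[i].lid = (uint16_t)(base_lid + i);
        printf("module %u uses LID 0x%04x\n", i, modules[i].lid);
    }
    /* ob1/dr then sees nlids independent modules and round-robins across
       them exactly as it would across separate NICs. */
    return 0;
}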

This is actually somewhat lame as a scheme, and we talked internally about doing something more intelligent. But we decided to hold off and let people (like you!) with real-world apps and networks give this stuff a try and see what really works (and what doesn't work) before trying to implement anything else.

So -- all that explanation aside -- we'd love to hear your feedback with regards to the multi-LID stuff in Open MPI. :-)



On Dec 4, 2006, at 1:27 PM, Chevchenkovic Chevchenkovic wrote:

 Thanks for that.

 Suppose there are multiple interconnects, say ethernet and
infiniband, and a million bytes of data are to be sent. In this
case the data will be sent through infiniband (since it's the fast path
-- please correct me here if I'm wrong).

  If there are multiple such sends, do you mean to say that each send
will go through different BTLs in a round-robin (RR) manner if they are
connected to the same port?

 -chev


On 12/4/06, Gleb Natapov <gl...@voltaire.com> wrote:
On Mon, Dec 04, 2006 at 10:53:26PM +0530, Chevchenkovic Chevchenkovic wrote:
Hi,
 It is not clear from the code in ompi/mca/pml/ob1/ (as mentioned by
you) where exactly the selection of the BTL bound to a particular LID
occurs. Could you please specify the file/function name for the same?
There is no such code there. OB1 knows nothing about LIDs. It does RR
over all available interconnects -- it can do RR between ethernet, IB
and Myrinet, for instance. The BTL presents each LID as a different
virtual HCA to OB1, and OB1 does round-robin between them without even
knowing that they are the same port of the same HCA.

Can you explain what you are trying to achieve?

 -chev


On 12/4/06, Gleb Natapov <gl...@voltaire.com> wrote:
On Mon, Dec 04, 2006 at 01:07:08AM +0530, Chevchenkovic Chevchenkovic wrote:
Also, could you please tell me which part of the Open MPI code needs to
be touched so that I can make some modifications to incorporate
changes regarding LID selection...

It depends on what you want to do. The part that does RR over all
available LIDs is in the OB1 PML (ompi/mca/pml/ob1/), but that code
isn't aware of the fact that it is doing RR over different LIDs rather
than different NICs (yet?).

The code that controls what LIDs will be used is in
ompi/mca/btl/openib/btl_openib_component.c.
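Purely as an illustrative sketch of the kind of policy one might experiment with there (the function name and the selection rule below are invented for this example; they are not taken from btl_openib_component.c), a static spread of peers over the 2^LMC LIDs of a port could look like:

/* Illustrative only -- this selection rule and function are invented;
   they are not taken from btl_openib_component.c. */
#include <stdio.h>
#include <stdint.h>

/* Spread peers statically over the 2^lmc LIDs of the local port. */
static unsigned pick_lid_offset(unsigned peer_rank, unsigned lmc)
{
    unsigned nlids = 1u << lmc;
    return peer_rank % nlids;
}

int main(void)
{
    uint16_t base_lid = 0x10;
    unsigned lmc      = 2;            /* 4 LIDs available on this port */

    for (unsigned peer = 0; peer < 6; peer++) {
        unsigned off = pick_lid_offset(peer, lmc);
        printf("peer %u -> source LID 0x%04x\n",
               peer, (unsigned)(base_lid + off));
    }
    return 0;
}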

On 12/4/06, Chevchenkovic Chevchenkovic <chevchenko...@gmail.com> wrote:
Is it possible to control the LID on which the sends and recvs are
posted, on either end?

On 12/3/06, Gleb Natapov <gl...@voltaire.com> wrote:
On Sun, Dec 03, 2006 at 07:03:33PM +0530, Chevchenkovic Chevchenkovic
wrote:
Hi,
 I had this query. I hope some expert replies to it.
I have 2 nodes connected point-to-point with an infiniband cable. There
are multiple LIDs for each of the end nodes' ports.
When I issue an MPI_Send, are the sends posted on different LIDs on each of the end nodes, or are they posted on the same LID?
 Awaiting your reply,
It depends on which version of Open MPI you are using. If you are using
the trunk or the v1.2 beta, then all available LIDs are used in an RR
fashion. Earlier versions don't support LMC.

--
                  Gleb.

--
                       Gleb.

--
                       Gleb.


--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
