Here is a paper on PML OB1:
http://www.open-mpi.org/papers/euro-pvmmpi-2006-hpc-protocols

There is also some information in this paper:
http://www.open-mpi.org/papers/ipdps-2006

For a very detailed presentation on OB1 go here:
http://www.open-mpi.org/papers/workshop-2006/wed_01_pt2pt.pdf


In general, we have three higher-level Point-to-point Messaging Layers (PMLs):

OB1 - Default high-performance PML for networks that do not provide
      higher-level MPI semantics (read: don't provide matching).

DR  - Network fault-tolerant PML, again for networks that do not
      provide higher-level MPI semantics.

      Both OB1 and DR use the BTL (Byte Transfer Layer) interface, as
      described in the above papers.
      Currently supported BTLs: GM, Mvapi, MX, OpenIB, SM, TCP, UDAPL

CM  - High-performance PML for networks that DO provide higher-level
      MPI semantics.
      CM uses the MTL (Matching Transfer Layer) interface.
      Currently supported MTLs: MX, PSM (InfiniPath), Portals
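
If you are not sure which of these components your build actually
includes, ompi_info will list them; a quick sketch (output varies with
how your Open MPI was configured):

ompi_info | grep pml
ompi_info | grep mtl
ompi_info | grep btl

Each "MCA pml: ..." (or mtl/btl) line in the output corresponds to a
component that was built and can be selected.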


Note that MX is supported as both a BTL and an MTL; I would recommend using the MX MTL, as the performance is much better. If you are using GM you can only use OB1 or DR; I would recommend OB1, since DR is only available in the trunk and is still in development.
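
For example, on an MX machine a run that forces CM (and, just to be
explicit, the MX MTL) might look like the sketch below; the
"-mca mtl mx" part should normally be redundant, since CM picks the
matching MTL on its own:

mpirun -np 2 -mca pml cm -mca mtl mx ./mpi-ping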

To choose a specific PML at runtime, use the MCA parameter facilities; for example:


mpirun -np 2 -mca pml cm ./mpi-ping
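
The same parameter can also be set outside the command line, either as
an environment variable or in an MCA parameter file; a sketch (the
value "cm" is just the example from above):

export OMPI_MCA_pml=cm
mpirun -np 2 ./mpi-ping

or, persistently, by adding a line such as "pml = cm" to
$HOME/.openmpi/mca-params.conf.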





On Nov 27, 2006, at 7:48 AM, Brock Palen wrote:

Well, I'm not finding much good information on what 'pml' is, or
which ones are available, which one is used by default, or how to
switch between them. Is there a paper someplace that describes this?

Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985


On Nov 26, 2006, at 11:10 AM, Galen Shipman wrote:

Oh, I just noticed you are using GM; PML CM is only available for MX...
sorry.
Galen



On Nov 26, 2006, at 9:08 AM, Galen Shipman wrote:

I would suggest trying Open MPI 1.2b1 and PML CM. You can select
PML CM at runtime via:

mpirun -mca pml cm

Have you tried this?

- Galen



On Nov 21, 2006, at 12:28 PM, Scott Atchley wrote:

On Nov 21, 2006, at 1:27 PM, Brock Palen wrote:

I had sent a message two weeks ago about this problem and talked with
Jeff at SC06 about how it might not be an OMPI problem. But working
with Myricom, it now appears that it is a problem in both lam-7.1.2
and openmpi-1.1.2/1.1.1. Basically, the results from an HPL run are
wrong, and it also causes a large number of packets to be dropped by
the fabric.

This problem does not happen when using MPICH-GM; the number of
dropped packets does not go up. There is a ticket open with Myricom
on this. They are a member of the group working on OMPI, but I sent
this out just to bring the list up to date.

If you have any questions, feel free to ask me. The details are in
the archive.

Brock Palen

Hi all,

I am looking into this at Myricom.

So far, I have compiled OMPI version 1.2b1 using the
--with-gm=/path/to/gm flag. I have compiled HPCC (contains HPL) using
OMPI's mpicc. Trying to run hpcc fails with "Myrinet/GM on host fog33
was unable to find any NICs". See mpirun output below.

I run gm_board_info and it finds two NICs.

I run ompi_info and it has the gm btl (see ompi_info below).

I have tried using the --prefix flag to mpirun as well as setting
PATH and LD_LIBRARY_PATH.

What am I missing?

Scott


% ompi_info -param btl gm
    MCA btl: parameter "btl_base_debug" (current value: "0")
             If btl_base_debug is 1 standard debug is output, if > 1
             verbose debug is output
    MCA btl: parameter "btl" (current value: <none>)
             Default selection set of components for the btl framework
             (<none> means "use all components that can be found")
    MCA btl: parameter "btl_base_verbose" (current value: "0")
             Verbosity level for the btl framework (0 = no verbosity)
    MCA btl: parameter "btl_gm_free_list_num" (current value: "8")
    MCA btl: parameter "btl_gm_free_list_max" (current value: "-1")
    MCA btl: parameter "btl_gm_free_list_inc" (current value: "8")
    MCA btl: parameter "btl_gm_debug" (current value: "0")
    MCA btl: parameter "btl_gm_mpool" (current value: "gm")
    MCA btl: parameter "btl_gm_max_ports" (current value: "16")
    MCA btl: parameter "btl_gm_max_boards" (current value: "4")
    MCA btl: parameter "btl_gm_max_modules" (current value: "4")
    MCA btl: parameter "btl_gm_num_high_priority" (current value: "8")
    MCA btl: parameter "btl_gm_num_repost" (current value: "4")
    MCA btl: parameter "btl_gm_port_name" (current value: "OMPI")
    MCA btl: parameter "btl_gm_exclusivity" (current value: "1024")
    MCA btl: parameter "btl_gm_eager_limit" (current value: "32768")
    MCA btl: parameter "btl_gm_min_send_size" (current value: "32768")
    MCA btl: parameter "btl_gm_max_send_size" (current value: "65536")
    MCA btl: parameter "btl_gm_min_rdma_size" (current value: "524288")
    MCA btl: parameter "btl_gm_max_rdma_size" (current value: "131072")
    MCA btl: parameter "btl_gm_flags" (current value: "50")
    MCA btl: parameter "btl_gm_bandwidth" (current value: "250")
    MCA btl: parameter "btl_gm_priority" (current value: "0")
    MCA btl: parameter "btl_base_warn_component_unused" (current value: "1")
             This parameter is used to turn on warning messages when
             certain NICs are not used





% mpirun --prefix $OMPI -np 4 --host fog33,fog33,fog34,fog34 -mca btl self,sm,gm ./hpcc
--------------------------------------------------------------------------
[0,1,1]: Myrinet/GM on host fog33 was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,0]: Myrinet/GM on host fog33 was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.2 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

   PML add procs failed
   --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)



% ls -l $OMPI
total 1
drwx------  2 atchley softies 496 Nov 21 13:01 bin
drwx------  2 atchley softies 168 Nov 21 13:01 etc
drwx------  3 atchley softies 184 Nov 21 13:01 include
drwx------  3 atchley softies 896 Nov 21 13:01 lib
drwx------  4 atchley softies  96 Nov 21 13:01 man
drwx------  3 atchley softies  72 Nov 21 13:00 share


% ls -l $OMPI/bin
total 340
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 mpiCC -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 mpic++ -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 mpicc -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 mpicxx -> opal_wrapper
lrwxrwxrwx  1 atchley softies      7 Nov 21 13:01 mpiexec -> orterun
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 mpif77 -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 mpif90 -> opal_wrapper
lrwxrwxrwx  1 atchley softies      7 Nov 21 13:01 mpirun -> orterun
-rwxr-xr-x  1 atchley softies 138416 Nov 21 13:01 ompi_info
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:00 opalCC -> opal_wrapper
-rwxr-xr-x  1 atchley softies  24119 Nov 21 13:00 opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:00 opalc++ -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:00 opalcc -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 orteCC -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 ortec++ -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 ortecc -> opal_wrapper
-rwxr-xr-x  1 atchley softies  26536 Nov 21 13:01 orted
-rwxr-xr-x  1 atchley softies 154770 Nov 21 13:01 orterun

% ls -l $OMPI/lib
total 1741
-rwxr-xr-x  1 atchley softies   1045 Nov 21 13:01 libmca_common_sm.la
lrwxrwxrwx  1 atchley softies     25 Nov 21 13:01 libmca_common_sm.so -> libmca_common_sm.so.0.0.0
lrwxrwxrwx  1 atchley softies     25 Nov 21 13:01 libmca_common_sm.so.0 -> libmca_common_sm.so.0.0.0
-rwxr-xr-x  1 atchley softies  10074 Nov 21 13:01 libmca_common_sm.so.0.0.0
-rwxr-xr-x  1 atchley softies   1100 Nov 21 13:01 libmpi.la
lrwxrwxrwx  1 atchley softies     15 Nov 21 13:01 libmpi.so -> libmpi.so.0.0.0
lrwxrwxrwx  1 atchley softies     15 Nov 21 13:01 libmpi.so.0 -> libmpi.so.0.0.0
-rwxr-xr-x  1 atchley softies 640672 Nov 21 13:01 libmpi.so.0.0.0
-rwxr-xr-x  1 atchley softies   1005 Nov 21 13:01 libmpi_cxx.la
lrwxrwxrwx  1 atchley softies     19 Nov 21 13:01 libmpi_cxx.so -> libmpi_cxx.so.0.0.0
lrwxrwxrwx  1 atchley softies     19 Nov 21 13:01 libmpi_cxx.so.0 -> libmpi_cxx.so.0.0.0
-rwxr-xr-x  1 atchley softies 142062 Nov 21 13:01 libmpi_cxx.so.0.0.0
-rwxr-xr-x  1 atchley softies   1009 Nov 21 13:01 libmpi_f77.la
lrwxrwxrwx  1 atchley softies     19 Nov 21 13:01 libmpi_f77.so -> libmpi_f77.so.0.0.0
lrwxrwxrwx  1 atchley softies     19 Nov 21 13:01 libmpi_f77.so.0 -> libmpi_f77.so.0.0.0
-rwxr-xr-x  1 atchley softies 283394 Nov 21 13:01 libmpi_f77.so.0.0.0
-rwxr-xr-x  1 atchley softies    996 Nov 21 13:00 libopal.la
lrwxrwxrwx  1 atchley softies     16 Nov 21 13:00 libopal.so -> libopal.so.0.0.0
lrwxrwxrwx  1 atchley softies     16 Nov 21 13:00 libopal.so.0 -> libopal.so.0.0.0
-rwxr-xr-x  1 atchley softies 285769 Nov 21 13:00 libopal.so.0.0.0
-rwxr-xr-x  1 atchley softies   1051 Nov 21 13:00 liborte.la
lrwxrwxrwx  1 atchley softies     16 Nov 21 13:00 liborte.so -> liborte.so.0.0.0
lrwxrwxrwx  1 atchley softies     16 Nov 21 13:00 liborte.so.0 -> liborte.so.0.0.0
-rwxr-xr-x  1 atchley softies 380223 Nov 21 13:00 liborte.so.0.0.0
drwx------  2 atchley softies   4160 Nov 21 13:01 openmpi