On Nov 27, 2006, at 10:56 AM, Galen Shipman wrote:

Note that MX is supported as both a BTL and an MTL; I would recommend
using the MX MTL, as the performance is much better. If you are using
GM you can only use OB1 or DR; I would recommend OB1, as DR is only
available in the trunk and is still in development.

In fact it depends on what you're looking for. If your algorithm is latency-bound, then using the MTL is the right choice. If what you care about is bandwidth, or if the MPI data-types used are not contiguous, then the MX BTL will give you better performance. Reading the three papers about the message layer in Open MPI might give you a better understanding of how everything works inside.
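
As a rough illustration (the component names below assume an MX-enabled
build of Open MPI 1.2, and ./a.out is just a placeholder for your
application), the two paths can be selected at run time like this:

  # latency-bound codes: the CM PML, which drives the MX MTL
  mpirun -np 2 -mca pml cm ./a.out

  # bandwidth-bound codes or non-contiguous data-types: OB1 with the MX BTL
  mpirun -np 2 -mca pml ob1 -mca btl mx,sm,self ./a.out

  # with GM hardware there is no MTL, so OB1 with the gm BTL is the usual choice
  mpirun -np 2 -mca pml ob1 -mca btl gm,sm,self ./a.out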

  Thanks,
    george.



To choose a specific PML at runtime, use the MCA parameter facilities,
for example:


mpirun -np 2 -mca pml cm ./mpi-ping
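
If you would rather not pass the parameter on every command line, the
same setting can also be made through the environment or a per-user
parameter file (a sketch; both are standard Open MPI MCA facilities):

  export OMPI_MCA_pml=cm                          # environment-variable form
  echo "pml = cm" >> ~/.openmpi/mca-params.conf   # per-user parameter file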





On Nov 27, 2006, at 7:48 AM, Brock Palen wrote:

Well, I'm not finding much good information on what 'pml' is, which
ones are available, which one is used by default, or how to switch
between them. Is there a paper someplace that describes this?
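
For what it's worth, a quick way to see which PML components a given
build actually provides (assuming ompi_info from that build is in your
PATH) is:

  ompi_info | grep "MCA pml"     # lists the PML components that were built
  ompi_info -param pml all       # shows their MCA parameters and defaults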

Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985


On Nov 26, 2006, at 11:10 AM, Galen Shipman wrote:

Oh, I just noticed you are using GM; PML CM is only available for MX...
sorry.
Galen



On Nov 26, 2006, at 9:08 AM, Galen Shipman wrote:

I would suggest trying Open MPI 1.2b1 and PML CM. You can select
PML CM at runtime via:

mpirun -mca pml cm

Have you tried this?

- Galen



On Nov 21, 2006, at 12:28 PM, Scott Atchley wrote:

On Nov 21, 2006, at 1:27 PM, Brock Palen wrote:

I had sent a message two weeks ago about this problem and talked with
Jeff at SC06 about how it might not be an OMPI problem. But working
with Myricom it now appears that it is a problem in both lam-7.1.2 and
openmpi-1.1.2/1.1.1. Basically the results from an HPL run are wrong,
and it also causes a large number of packets to be dropped by the
fabric.

This problem does not happen when using mpichgm; the number of dropped
packets does not go up. There is a ticket open with Myricom on this.
They are a member of the group working on OMPI, but I sent this out
just to bring the list up to date.

If you have any questions feel free to ask me. The details are in
the archive.

Brock Palen

Hi all,

I am looking into this at Myricom.

So far, I have compiled OMPI version 1.2b1 using the
--with-gm=/path/to/gm flag. I have compiled HPCC (contains HPL) using
OMPI's mpicc. Trying to run hpcc fails with "Myrinet/GM on host fog33
was unable to find any NICs". See mpirun output below.
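
For reference, a rough sketch of the build steps being described (the
installation prefix and GM path are placeholders, not the actual values
used):

  ./configure --prefix=$OMPI --with-gm=/path/to/gm
  make all install
  # HPCC (which bundles HPL) is then built with $OMPI/bin/mpicc via its own Makefile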

I run gm_board_info and it finds two NICs.

I run ompi_info and it has the gm btl (see ompi_info below).

I have tried using the --prefix flag to mpirun as well as setting
PATH and LD_LIBRARY_PATH.

What am I missing?

Scott


% ompi_info -param btl gm
                 MCA btl: parameter "btl_base_debug" (current value: "0")
                          If btl_base_debug is 1 standard debug is output,
                          if > 1 verbose debug is output
                 MCA btl: parameter "btl" (current value: <none>)
                          Default selection set of components for the btl
                          framework (<none> means "use all components that
                          can be found")
                 MCA btl: parameter "btl_base_verbose" (current value: "0")
                          Verbosity level for the btl framework (0 = no
                          verbosity)
                 MCA btl: parameter "btl_gm_free_list_num" (current value: "8")
                 MCA btl: parameter "btl_gm_free_list_max" (current value: "-1")
                 MCA btl: parameter "btl_gm_free_list_inc" (current value: "8")
                 MCA btl: parameter "btl_gm_debug" (current value: "0")
                 MCA btl: parameter "btl_gm_mpool" (current value: "gm")
                 MCA btl: parameter "btl_gm_max_ports" (current value: "16")
                 MCA btl: parameter "btl_gm_max_boards" (current value: "4")
                 MCA btl: parameter "btl_gm_max_modules" (current value: "4")
                 MCA btl: parameter "btl_gm_num_high_priority" (current value: "8")
                 MCA btl: parameter "btl_gm_num_repost" (current value: "4")
                 MCA btl: parameter "btl_gm_port_name" (current value: "OMPI")
                 MCA btl: parameter "btl_gm_exclusivity" (current value: "1024")
                 MCA btl: parameter "btl_gm_eager_limit" (current value: "32768")
                 MCA btl: parameter "btl_gm_min_send_size" (current value: "32768")
                 MCA btl: parameter "btl_gm_max_send_size" (current value: "65536")
                 MCA btl: parameter "btl_gm_min_rdma_size" (current value: "524288")
                 MCA btl: parameter "btl_gm_max_rdma_size" (current value: "131072")
                 MCA btl: parameter "btl_gm_flags" (current value: "50")
                 MCA btl: parameter "btl_gm_bandwidth" (current value: "250")
                 MCA btl: parameter "btl_gm_priority" (current value: "0")
                 MCA btl: parameter "btl_base_warn_component_unused" (current value: "1")
                          This parameter is used to turn on warning messages
                          when certain NICs are not used





% mpirun --prefix $OMPI -np 4 --host fog33,fog33,fog34,fog34 \
    -mca btl self,sm,gm ./hpcc
--------------------------------------------------------------------------
[0,1,1]: Myrinet/GM on host fog33 was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,0]: Myrinet/GM on host fog33 was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.2 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

   PML add procs failed
   --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)



% ls -l $OMPI
total 1
drwx------  2 atchley softies 496 Nov 21 13:01 bin
drwx------  2 atchley softies 168 Nov 21 13:01 etc
drwx------  3 atchley softies 184 Nov 21 13:01 include
drwx------  3 atchley softies 896 Nov 21 13:01 lib
drwx------  4 atchley softies  96 Nov 21 13:01 man
drwx------  3 atchley softies  72 Nov 21 13:00 share


% ls -l $OMPI/bin
total 340
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 mpiCC -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 mpic++ -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 mpicc -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 mpicxx -> opal_wrapper
lrwxrwxrwx  1 atchley softies      7 Nov 21 13:01 mpiexec -> orterun
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 mpif77 -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 mpif90 -> opal_wrapper
lrwxrwxrwx  1 atchley softies      7 Nov 21 13:01 mpirun -> orterun
-rwxr-xr-x  1 atchley softies 138416 Nov 21 13:01 ompi_info
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:00 opalCC -> opal_wrapper
-rwxr-xr-x  1 atchley softies  24119 Nov 21 13:00 opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:00 opalc++ -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:00 opalcc -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 orteCC -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 ortec++ -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 ortecc -> opal_wrapper
-rwxr-xr-x  1 atchley softies  26536 Nov 21 13:01 orted
-rwxr-xr-x  1 atchley softies 154770 Nov 21 13:01 orterun

% ls -l $OMPI/lib
total 1741
-rwxr-xr-x  1 atchley softies   1045 Nov 21 13:01 libmca_common_sm.la
lrwxrwxrwx  1 atchley softies     25 Nov 21 13:01 libmca_common_sm.so -> libmca_common_sm.so.0.0.0
lrwxrwxrwx  1 atchley softies     25 Nov 21 13:01 libmca_common_sm.so.0 -> libmca_common_sm.so.0.0.0
-rwxr-xr-x  1 atchley softies  10074 Nov 21 13:01 libmca_common_sm.so.0.0.0
-rwxr-xr-x  1 atchley softies   1100 Nov 21 13:01 libmpi.la
lrwxrwxrwx  1 atchley softies     15 Nov 21 13:01 libmpi.so -> libmpi.so.0.0.0
lrwxrwxrwx  1 atchley softies     15 Nov 21 13:01 libmpi.so.0 -> libmpi.so.0.0.0
-rwxr-xr-x  1 atchley softies 640672 Nov 21 13:01 libmpi.so.0.0.0
-rwxr-xr-x  1 atchley softies   1005 Nov 21 13:01 libmpi_cxx.la
lrwxrwxrwx  1 atchley softies     19 Nov 21 13:01 libmpi_cxx.so -> libmpi_cxx.so.0.0.0
lrwxrwxrwx  1 atchley softies     19 Nov 21 13:01 libmpi_cxx.so.0 -> libmpi_cxx.so.0.0.0
-rwxr-xr-x  1 atchley softies 142062 Nov 21 13:01 libmpi_cxx.so.0.0.0
-rwxr-xr-x  1 atchley softies   1009 Nov 21 13:01 libmpi_f77.la
lrwxrwxrwx  1 atchley softies     19 Nov 21 13:01 libmpi_f77.so -> libmpi_f77.so.0.0.0
lrwxrwxrwx  1 atchley softies     19 Nov 21 13:01 libmpi_f77.so.0 -> libmpi_f77.so.0.0.0
-rwxr-xr-x  1 atchley softies 283394 Nov 21 13:01 libmpi_f77.so.0.0.0
-rwxr-xr-x  1 atchley softies    996 Nov 21 13:00 libopal.la
lrwxrwxrwx  1 atchley softies     16 Nov 21 13:00 libopal.so -> libopal.so.0.0.0
lrwxrwxrwx  1 atchley softies     16 Nov 21 13:00 libopal.so.0 -> libopal.so.0.0.0
-rwxr-xr-x  1 atchley softies 285769 Nov 21 13:00 libopal.so.0.0.0
-rwxr-xr-x  1 atchley softies   1051 Nov 21 13:00 liborte.la
lrwxrwxrwx  1 atchley softies     16 Nov 21 13:00 liborte.so -> liborte.so.0.0.0
lrwxrwxrwx  1 atchley softies     16 Nov 21 13:00 liborte.so.0 -> liborte.so.0.0.0
-rwxr-xr-x  1 atchley softies 380223 Nov 21 13:00 liborte.so.0.0.0
drwx------  2 atchley softies   4160 Nov 21 13:01 openmpi