Oh, I just noticed you are using GM; PML CM is only available for MX. Sorry.
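
With GM, the default PML driving the gm BTL is the path instead. Spelled out, that selection would look roughly like the following (a sketch only; the ob1 PML is not named elsewhere in this thread, and the host list is copied from the mpirun command further below):

mpirun -np 4 --host fog33,fog33,fog34,fog34 -mca pml ob1 -mca btl gm,sm,self ./hpcc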
Galen


On Nov 26, 2006, at 9:08 AM, Galen Shipman wrote:

I would suggest trying Open MPI 1.2b1 and PML CM. You can select PML CM at runtime via:

mpirun -mca pml cm

Have you tried this?
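
For reference, a full invocation would look something like this (a sketch only; the MX MTL, process count, and host list are assumptions for illustration, since CM runs on top of an MTL such as mx):

mpirun -np 4 --host fog33,fog33,fog34,fog34 -mca pml cm -mca mtl mx ./hpcc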

- Galen



On Nov 21, 2006, at 12:28 PM, Scott Atchley wrote:

On Nov 21, 2006, at 1:27 PM, Brock Palen wrote:

I sent a message two weeks ago about this problem and talked with
Jeff at SC06 about how it might not be an OMPI problem.  But, working
with Myricom, it now appears that it is a problem in both lam-7.1.2
and openmpi-1.1.2/1.1.1.  Basically, the results from an HPL run are
wrong, and it also causes a large number of packets to be dropped by
the fabric.

This problem does not happen when using MPICH-GM; the number of
dropped packets does not go up.  There is a ticket open with Myricom
on this.  They are a member of the group working on OMPI, but I sent
this out just to bring the list up to date.

If you have any questions, feel free to ask me.  The details are in
the archive.

Brock Palen

Hi all,

I am looking into this at Myricom.

So far, I have compiled OMPI version 1.2b1 using the --with-gm=/path/to/gm
flag. I have compiled HPCC (contains HPL) using OMPI's mpicc. Trying to
run hpcc fails with "Myrinet/GM on host fog33 was unable to find any
NICs". See the mpirun output below.

I run gm_board_info and it finds two NICs.

I run ompi_info and it has the gm btl (see ompi_info below).

I have tried using the --prefix flag to mpirun as well as setting
PATH and LD_LIBRARY_PATH.
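
For example, something like the following (a sketch, assuming a Bourne-style shell; $OMPI is the install prefix shown below):

export PATH=$OMPI/bin:$PATH
export LD_LIBRARY_PATH=$OMPI/lib:$LD_LIBRARY_PATH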

What am I missing?

Scott


% ompi_info -param btl gm
                  MCA btl: parameter "btl_base_debug" (current value:
"0")
                           If btl_base_debug is 1 standard debug is
output, if > 1 verbose debug
                           is output
                  MCA btl: parameter "btl" (current value: <none>)
                           Default selection set of components for
the btl framework (<none>
means "use all components that can be found")
                  MCA btl: parameter "btl_base_verbose" (current
value: "0")
                           Verbosity level for the btl framework (0 =
no verbosity)
                  MCA btl: parameter "btl_gm_free_list_num" (current
value: "8")
                  MCA btl: parameter "btl_gm_free_list_max" (current
value: "-1")
                  MCA btl: parameter "btl_gm_free_list_inc" (current
value: "8")
MCA btl: parameter "btl_gm_debug" (current value: "0")
                  MCA btl: parameter "btl_gm_mpool" (current value:
"gm")
                  MCA btl: parameter "btl_gm_max_ports" (current
value: "16")
                  MCA btl: parameter "btl_gm_max_boards" (current
value: "4")
                  MCA btl: parameter "btl_gm_max_modules" (current
value: "4")
                  MCA btl: parameter
"btl_gm_num_high_priority" (current value: "8")
                  MCA btl: parameter "btl_gm_num_repost" (current
value: "4")
                  MCA btl: parameter "btl_gm_port_name" (current
value: "OMPI")
                  MCA btl: parameter "btl_gm_exclusivity" (current
value: "1024")
                  MCA btl: parameter "btl_gm_eager_limit" (current
value: "32768")
                  MCA btl: parameter "btl_gm_min_send_size" (current
value: "32768")
                  MCA btl: parameter "btl_gm_max_send_size" (current
value: "65536")
                  MCA btl: parameter "btl_gm_min_rdma_size" (current
value: "524288")
                  MCA btl: parameter "btl_gm_max_rdma_size" (current
value: "131072")
                  MCA btl: parameter "btl_gm_flags" (current value:
"50")
                  MCA btl: parameter "btl_gm_bandwidth" (current
value: "250")
                  MCA btl: parameter "btl_gm_priority" (current
value: "0")
                  MCA btl: parameter
"btl_base_warn_component_unused" (current value: "1")
                           This parameter is used to turn on warning
messages when certain NICs
                           are not used





% mpirun --prefix $OMPI -np 4 --host fog33,fog33,fog34,fog34 -mca btl self,sm,gm ./hpcc
--------------------------------------------------------------------------
[0,1,1]: Myrinet/GM on host fog33 was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,0]: Myrinet/GM on host fog33 was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.2 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

   PML add procs failed
   --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
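
One way to get more detail on why the gm BTL cannot find the NICs would be to raise the debug/verbosity parameters listed in the ompi_info output above, e.g. (a sketch, not a command that was actually run in this thread):

% mpirun --prefix $OMPI -np 2 --host fog33,fog34 -mca btl self,sm,gm \
      -mca btl_base_verbose 1 -mca btl_gm_debug 1 ./hpcc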



% ls -l $OMPI
total 1
drwx------  2 atchley softies 496 Nov 21 13:01 bin
drwx------  2 atchley softies 168 Nov 21 13:01 etc
drwx------  3 atchley softies 184 Nov 21 13:01 include
drwx------  3 atchley softies 896 Nov 21 13:01 lib
drwx------  4 atchley softies  96 Nov 21 13:01 man
drwx------  3 atchley softies  72 Nov 21 13:00 share


% ls -l $OMPI/bin
total 340
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 mpiCC -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 mpic++ -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 mpicc -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 mpicxx -> opal_wrapper
lrwxrwxrwx  1 atchley softies      7 Nov 21 13:01 mpiexec -> orterun
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 mpif77 -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 mpif90 -> opal_wrapper
lrwxrwxrwx  1 atchley softies      7 Nov 21 13:01 mpirun -> orterun
-rwxr-xr-x  1 atchley softies 138416 Nov 21 13:01 ompi_info
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:00 opalCC -> opal_wrapper
-rwxr-xr-x  1 atchley softies  24119 Nov 21 13:00 opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:00 opalc++ -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:00 opalcc -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 orteCC -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 ortec++ -> opal_wrapper
lrwxrwxrwx  1 atchley softies     12 Nov 21 13:01 ortecc -> opal_wrapper
-rwxr-xr-x  1 atchley softies  26536 Nov 21 13:01 orted
-rwxr-xr-x  1 atchley softies 154770 Nov 21 13:01 orterun

% ls -l $OMPI/lib
total 1741
-rwxr-xr-x  1 atchley softies   1045 Nov 21 13:01 libmca_common_sm.la
lrwxrwxrwx  1 atchley softies     25 Nov 21 13:01 libmca_common_sm.so -> libmca_common_sm.so.0.0.0
lrwxrwxrwx  1 atchley softies     25 Nov 21 13:01 libmca_common_sm.so.0 -> libmca_common_sm.so.0.0.0
-rwxr-xr-x  1 atchley softies  10074 Nov 21 13:01 libmca_common_sm.so.0.0.0
-rwxr-xr-x  1 atchley softies   1100 Nov 21 13:01 libmpi.la
lrwxrwxrwx  1 atchley softies     15 Nov 21 13:01 libmpi.so -> libmpi.so.0.0.0
lrwxrwxrwx  1 atchley softies     15 Nov 21 13:01 libmpi.so.0 -> libmpi.so.0.0.0
-rwxr-xr-x  1 atchley softies 640672 Nov 21 13:01 libmpi.so.0.0.0
-rwxr-xr-x  1 atchley softies   1005 Nov 21 13:01 libmpi_cxx.la
lrwxrwxrwx  1 atchley softies     19 Nov 21 13:01 libmpi_cxx.so -> libmpi_cxx.so.0.0.0
lrwxrwxrwx  1 atchley softies     19 Nov 21 13:01 libmpi_cxx.so.0 -> libmpi_cxx.so.0.0.0
-rwxr-xr-x  1 atchley softies 142062 Nov 21 13:01 libmpi_cxx.so.0.0.0
-rwxr-xr-x  1 atchley softies   1009 Nov 21 13:01 libmpi_f77.la
lrwxrwxrwx  1 atchley softies     19 Nov 21 13:01 libmpi_f77.so -> libmpi_f77.so.0.0.0
lrwxrwxrwx  1 atchley softies     19 Nov 21 13:01 libmpi_f77.so.0 -> libmpi_f77.so.0.0.0
-rwxr-xr-x  1 atchley softies 283394 Nov 21 13:01 libmpi_f77.so.0.0.0
-rwxr-xr-x  1 atchley softies    996 Nov 21 13:00 libopal.la
lrwxrwxrwx  1 atchley softies     16 Nov 21 13:00 libopal.so -> libopal.so.0.0.0
lrwxrwxrwx  1 atchley softies     16 Nov 21 13:00 libopal.so.0 -> libopal.so.0.0.0
-rwxr-xr-x  1 atchley softies 285769 Nov 21 13:00 libopal.so.0.0.0
-rwxr-xr-x  1 atchley softies   1051 Nov 21 13:00 liborte.la
lrwxrwxrwx  1 atchley softies     16 Nov 21 13:00 liborte.so -> liborte.so.0.0.0
lrwxrwxrwx  1 atchley softies     16 Nov 21 13:00 liborte.so.0 -> liborte.so.0.0.0
-rwxr-xr-x  1 atchley softies 380223 Nov 21 13:00 liborte.so.0.0.0
drwx------  2 atchley softies   4160 Nov 21 13:01 openmpi

