Hi Henk,
By specifying '--mca btl mx,self' you are telling Open MPI not to use
its shared memory support. If you want to use Open MPI's shared memory
support, you must add 'sm' to the list, i.e. '--mca btl mx,sm,self'. If
you would rather use MX's shared memory support, instead use '--mca btl
mx,self --mca btl_mx_shared_mem 1'. However, in most cases I believe
Open MPI's shared memory support is a bit better.
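For example, adapting the command from your mail (same ./cpi test, only
the btl list changes):

  mpirun --mca btl mx,sm,self -np 4 ./cpi

or, to try MX's own shared memory support instead:

  mpirun --mca btl mx,self --mca btl_mx_shared_mem 1 -np 4 ./cpi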
Alternatively, if you don't specify any btls, Open MPI should figure out
what to use automatically.
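That is, just:

  mpirun -np 4 ./cpi

and Open MPI should select a suitable set of BTLs (e.g. mx, sm, and
self) on its own.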
Hope this helps,
Tim
SLIM H.A. wrote:
Hello
I have compiled openmpi-1.2.3 with the --with-mx=<directory> configure
option and the gcc compiler. On testing with 4-8 slots I get error
messages saying the MX ports are busy:
mpirun --mca btl mx,self -np 4 ./cpi
[node001:10071] mca_btl_mx_init: mx_open_endpoint() failed with
status=20
[node001:10074] mca_btl_mx_init: mx_open_endpoint() failed with
status=20
[node001:10073] mca_btl_mx_init: mx_open_endpoint() failed with
status=20
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
... snipped
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
PML add procs failed
--> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
mpirun noticed that job rank 0 with PID 10071 on node node001 exited on
signal 1 (Hangup).
I would not expect MX messages, as the communication should not go
through the MX card, should it? (This is a twin dual-core,
shared-memory node.)
The same happens when testing on 2 nodes, using a hostfile.
I checked the state of the MX card with mx_endpoint_info and mx_info;
the endpoints are healthy and free.
What is missing here?
Thanks
Henk