On Feb 14, 2007, at 12:33 PM, Alex Tumanov wrote:
Hello,
I recently tried running HPLinpack, compiled with OMPI, over the Myrinet MX interconnect. A simple hello world program runs fine, but XHPL fails with an error when it calls MPI_Send:
# mpirun -np 4 -H l0-0,c0-2 --prefix $MPIHOME --mca btl mx,self /opt/hpl/openmpi-hpl/bin/xhpl
[l0-0.local:04707] *** An error occurred in MPI_Send
[l0-0.local:04707] *** on communicator MPI_COMM_WORLD
[l0-0.local:04707] *** MPI_ERR_INTERN: internal error
[l0-0.local:04707] *** MPI_ERRORS_ARE_FATAL (goodbye)
mpirun noticed that job rank 0 with PID 4706 on node "l0-0" exited on signal 15.
3 additional processes aborted (not shown)

If you are running more than one process per node, you may need to add the shared memory BTL ("sm") to mx,self. OMPI also offers another MX path via the PML; performance was better using the PML, but George may be getting the BTL closer. In addition, try with and without MX_RCACHE=1 (or MX_RCACHE=2 for the PML) in your environment.
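For reference, those suggestions would look something like the following (untested sketches; they assume your OMPI build includes the sm BTL and the cm PML / MX MTL components):

  # BTL path, with the shared memory BTL added for ranks on the same node
  mpirun -np 4 -H l0-0,c0-2 --prefix $MPIHOME --mca btl mx,sm,self /opt/hpl/openmpi-hpl/bin/xhpl

  # PML path: select the cm PML so MX is driven through the MTL instead of the BTL
  mpirun -np 4 -H l0-0,c0-2 --prefix $MPIHOME --mca pml cm /opt/hpl/openmpi-hpl/bin/xhpl

  # export the MX registration-cache setting to the remote nodes with -x
  mpirun -np 4 -H l0-0,c0-2 --prefix $MPIHOME -x MX_RCACHE=1 --mca btl mx,sm,self /opt/hpl/openmpi-hpl/bin/xhpl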
# mpirun -np 4 -H l0-0,c0-2 --prefix $MPIHOME --mca btl mx,self ~/atumanov/hello
Hello from Alex' MPI test program
Process 1 on compute-0-2.local out of 4
Hello from Alex' MPI test program
Hello from Alex' MPI test program
Process 0 on l0-0.local out of 4
Process 3 on compute-0-2.local out of 4
Hello from Alex' MPI test program
Process 2 on l0-0.local out of 4
The output from mx_info is as follows:
-------------------------------------------------------------------------------------------------
MX Version: 1.2.0g
We have a new version, 1.2.0h, that we recommend all users upgrade to.
MX Build: r...@blackopt.sw.myri.com:/home/install/rocks/src/roll/myrinet_mx10g/BUILD/mx-1.2.0g  Wed Jan 17 18:51:12 PST 2007
1 Myrinet board installed.
The MX driver is configured to support up to 4 instances and 1024 nodes.
===================================================================
Instance #0: 299.8 MHz LANai, PCI-E x8, 2 MB SRAM
Status: Running, P0: Link up
MAC Address: 00:60:dd:47:7d:73
Product code: 10G-PCIE-8A-C
Part number: 09-03362
Serial number: 314581
Mapper: 00:60:dd:47:7d:73, version = 0x591b1c74, configured
Mapped hosts: 2
                                              ROUTE COUNT
INDEX    MAC ADDRESS     HOST NAME            P0
-----    -----------     ---------            ---
    0) 00:60:dd:47:7d:73 compute-0-2.local:0  D 0,0
    1) 00:60:dd:47:7d:72 l0-0.local:0           1,0
-------------------------------------------------------------------------------------------------
There are several questions. First of all, can I launch OMPI-over-MX jobs from the head node for execution on the two compute nodes, even though the head node itself has no MX hardware?
Any OMPI people have comments?
Secondly, looking at the next-to-last line in the mx_info output, what does the letter 'D' stand for?
It means that while a route to this node was loaded at some point in the past, the most recent batch of route loads came from a map that did not contain this node. This could be caused by the node going down, losing connectivity, or simply having its fma crash or be killed. Note that in the last case the node is still on the fabric and the old routes likely still work; it just has no fma running.
Third, regarding the MX interconnect support that OMPI provides: does it cover only MX-2G, or is MX-10G supported as well?
Both. If you build OMPI with shared library support, you can change
between MX-10G and MX-2G via LD_LIBRARY_PATH.
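For example, something along these lines (the install prefixes below are hypothetical; point LD_LIBRARY_PATH at wherever your MX-2G and MX-10G libraries actually live, and use -x to forward it to the remote nodes):

  # hypothetical locations for the two MX installs -- adjust to your system
  export LD_LIBRARY_PATH=/opt/mx-10g/lib:$LD_LIBRARY_PATH   # or /opt/mx-2g/lib for MX-2G
  mpirun -np 4 -H l0-0,c0-2 --prefix $MPIHOME -x LD_LIBRARY_PATH --mca btl mx,sm,self /opt/hpl/openmpi-hpl/bin/xhpl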
Scott
If anybody has encountered a similar problem and was able to circumvent it, please do let me know.
Many thanks for your time and for bringing the community together.
Sincerely,
Alex.
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users