Dear Tim,

Your and Tim Matox's suggestions yielded the following results:

*1. /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus1,indus2" -mca btl_base_debug 1000 ./hello*

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca btl mx,sm,self -host "indus1,indus2,indus3,indus4" -mca btl_base_debug 1000 ./hello
[indus1:29331] select: initializing btl component mx
[indus1:29331] select: init returned failure
[indus1:29331] select: module mx unloaded
[indus1:29331] select: initializing btl component sm
[indus1:29331] select: init returned success
[indus1:29331] select: initializing btl component self
[indus1:29331] select: init returned success
[indus3:13520] select: initializing btl component mx
[indus3:13520] select: init returned failure
[indus3:13520] select: module mx unloaded
[indus3:13520] select: initializing btl component sm
[indus3:13520] select: init returned success
[indus3:13520] select: initializing btl component self
[indus3:13520] select: init returned success
[indus4:15486] select: initializing btl component mx
[indus4:15486] select: init returned failure
[indus4:15486] select: module mx unloaded
[indus4:15486] select: initializing btl component sm
[indus4:15486] select: init returned success
[indus4:15486] select: initializing btl component self
[indus4:15486] select: init returned success
[indus2:11351] select: initializing btl component mx
[indus2:11351] select: init returned failure
[indus2:11351] select: module mx unloaded
[indus2:11351] select: initializing btl component sm
[indus2:11351] select: init returned success
[indus2:11351] select: initializing btl component self
[indus2:11351] select: init returned success
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

 PML add procs failed
 --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Process 0.1.2 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT--------------------------------------------------------------------------
Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

 ; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

 PML add procs failed
 --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

 PML add procs failed
 --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
PML add procs failed
 --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
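
The interleaved output above is hard to scan, so here is a minimal Python sketch (not part of the original run) that tallies the `select:` lines per host. The function name `summarize_btl_init` and the regular expressions are assumptions written against the exact log format shown above; other Open MPI versions may print different text.

```python
import re
from collections import defaultdict

# Patterns for the "select:" lines printed with -mca btl_base_debug 1000,
# matched against the format seen in the log above (an assumption for other versions).
INIT_RE = re.compile(
    r"^\[(?P<host>[^:\]]+):\d+\] select: initializing btl component (?P<comp>\S+)$"
)
RESULT_RE = re.compile(
    r"^\[(?P<host>[^:\]]+):\d+\] select: init returned (?P<result>success|failure)$"
)


def summarize_btl_init(log_text):
    """Return {host: {component: 'success' | 'failure'}} from the debug output."""
    pending = {}                 # host -> component whose init result is still awaited
    summary = defaultdict(dict)
    for raw in log_text.splitlines():
        line = raw.strip()
        m = INIT_RE.match(line)
        if m:
            pending[m["host"]] = m["comp"]
            continue
        m = RESULT_RE.match(line)
        if m and m["host"] in pending:
            summary[m["host"]][pending.pop(m["host"])] = m["result"]
    return dict(summary)


# The indus1 lines copied from the output above.
sample = """\
[indus1:29331] select: initializing btl component mx
[indus1:29331] select: init returned failure
[indus1:29331] select: module mx unloaded
[indus1:29331] select: initializing btl component sm
[indus1:29331] select: init returned success
[indus1:29331] select: initializing btl component self
[indus1:29331] select: init returned success
"""

for host, components in summarize_btl_init(sample).items():
    print(host, components)
```

Fed the full log, this reports the same pattern for all four hosts: the mx BTL fails to initialize while sm and self succeed.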




*2.1 /opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca mtl mx -host "indus1,indus2,indus3,indus4" ./hello*

This command works fine.

*2.2 /opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca mtl mx -host "indus1,indus2,indus3,indus4" -mca pml cm ./hello*

This command works fine.
Also, *"/opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca pml cm -host "indus1,indus2,indus3,indus4" -mca mtl_base_debug 1000 ./hello"* works fine, but *"/opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca pml cm -host "indus1,indus2,indus3,indus4" -mca mtl_base_debug 1000 ./hello"* hangs indefinitely.


Also, *"/opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca mtl mx,sm,self -host "indus1,indus2,indus3,indus4" -mca mtl_base_debug 1000 ./hello"* works fine.

*2.3 /opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca mtl mx -host "indus1,indus2,indus3,indus4" -mca pml cm ./hello*

This command hangs the machines indefinitely.
Also, *"/opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca mtl mx -host "indus1,indus2,indus3,indus4" -mca pml cm -mca mtl_base_debug 1000 ./hello"* hangs the systems indefinitely.

*2.4 /opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca mtl mx,sm,self -host "indus1,indus2,indus3,indus4" -mca pml cm -mca mtl_base_debug 1000 ./hello*

This command hangs the machines indefinitely.

Please note that running more than four MPI processes hangs the machines. Any suggestions, please?
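
To pin down the process count at which the hang starts without waiting on each run by hand, a sweep with a timeout could look like the following minimal Python sketch. The helper names (`build_command`, `run_with_timeout`, `sweep`) and the 60-second limit are assumptions; the command line itself uses only the options from tests 2.2 and 2.3 above.

```python
import subprocess
import sys

MPIRUN = "/opt/SUNWhpc/HPC7.0/bin/mpirun"
HOSTS = "indus1,indus2,indus3,indus4"


def build_command(np, program="./hello"):
    """Build the MTL/CM command line from tests 2.2 and 2.3 for a given -np."""
    return [MPIRUN, "-np", str(np), "-mca", "mtl", "mx",
            "-host", HOSTS, "-mca", "pml", "cm", program]


def run_with_timeout(cmd, seconds):
    """Run cmd; return 'ok', 'failed' (non-zero exit or not startable) or 'hang'."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the local child on timeout; processes started
        # on remote nodes may still need to be cleaned up by hand.
        return "hang"
    except OSError:
        return "failed"
    return "ok" if result.returncode == 0 else "failed"


def sweep(np_values, seconds=60):
    """Map each process count to the outcome of its run."""
    return {np: run_with_timeout(build_command(np), seconds) for np in np_values}


# Harmless demonstration that does not need mpirun: run a no-op Python process.
print(run_with_timeout([sys.executable, "-c", "pass"], 30))
```

On the cluster, `sweep(range(2, 9))` would show the first `-np` value that times out; the timeout is only a heuristic for "hangs indefinitely".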

Thanks,

Best Regards,
Hammad Siddiqi

Tim Prins wrote:
I would recommend trying a few things:

1. Set some debugging flags and see if that helps. So, I would try something like:
/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,self -host "indus1,indus2" -mca btl_base_debug 1000 ./hello

This will output information as each btl is loaded, and whether or not the load succeeds.

2. Try running with the mx mtl instead of the btl:
/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca mtl mx -host "indus1,indus2" ./hello

Similarly, for debug output:
/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca mtl mx -host "indus1,indus2" -mca mtl_base_debug 1000 ./hello

Let me know if any of these work.

Thanks,

Tim

On Saturday 29 September 2007 01:53:06 am Hammad Siddiqi wrote:
Hi Terry,

Thanks for replying. The following command is working fine:

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca btl tcp,sm,self -machinefile machines ./hello

The contents of machines are:
indus1
indus2
indus3
indus4

I have tried np=2 over pairs of machines, but the problem is the same.
The errors are given below, each preceded by the command that produced them.

**Test 1**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus1,indus2" ./hello
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)

**Test 2**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus1,indus3" ./hello
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
**Test 3**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus1,indus4" ./hello
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)

**Test 4**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus2,indus4" ./hello
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)

**Test 5**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus2,indus3" ./hello
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)

**Test 6**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus3,indus4" ./hello
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)

**END OF TESTS**
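
The six runs above cover every two-host combination of the four nodes. As a minimal sketch (the function name `pairwise_commands` is made up for illustration), the same command lines can be generated in Python rather than typed by hand:

```python
from itertools import combinations

MPIRUN = "/opt/SUNWhpc/HPC7.0/bin/mpirun"
HOSTS = ["indus1", "indus2", "indus3", "indus4"]


def pairwise_commands(hosts, btl="mx,sm,self", program="./hello"):
    """Return one np=2 mpirun command line per unordered pair of hosts."""
    return [
        f'{MPIRUN} -np 2 -mca btl {btl} -host "{a},{b}" {program}'
        for a, b in combinations(hosts, 2)
    ]


for cmd in pairwise_commands(HOSTS):
    print(cmd)
```

Passing `btl="tcp,sm,self"` gives the equivalent TCP sanity-check runs for the same pairs.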

One thing to note: when I run this command with -mca pml cm added, it works fine :S

mpirun -np 4 -mca btl mx,sm,self -mca pml cm -machinefile machines  ./hello
Hello MPI! Process 4 of 1 on indus2
Hello MPI! Process 4 of 2 on indus3
Hello MPI! Process 4 of 3 on indus4
Hello MPI! Process 4 of 0 on indus1

To my knowledge, this command does not use shared memory and uses only Myrinet as the interconnect.
One more thing: I cannot start more than 4 processes in this case; the mpirun process hangs.

Any suggestions?

Once again, thanks for your help.

Regards,
Hammad

Terry Dontje wrote:
Hi Hammad,

It looks to me like none of the btl's could resolve a route between the
node that process rank 0 is on to the other nodes.
I would suggest trying np=2 over a couple pairs of machines to see if
that works and you can truly be sure that only the
first node is having this problem.

It also might be helpful as a sanity check to use the tcp btl instead of
mx and see if you get more traction with that.

--td

*From:* Hammad Siddiqi (/hammad.siddiqi_at_[hidden]/)
*Date:* 2007-09-28 07:38:01




Hello,

I am using Sun HPC Toolkit 7.0 to compile and run my C MPI programs.

I have tested the Myrinet installation using Myricom's own test programs. The Myricom software stack I am using is MX, version mx2g-1.1.7; mx_mapper is also used.
We have 4 nodes, each with 8 dual-core processors (Sun Fire V890), and the operating system is Solaris 10 (SunOS indus1 5.10 Generic_125100-10 sun4u sparc SUNW,Sun-Fire-V890).

The contents of the machine file are:
indus1
indus2
indus3
indus4

The output of *mx_info* on each node is given below:

======
*indus1*
======

MX Version: 1.1.7rc3cvs1_1_fixes
MX Build: @indus4:/opt/mx2g-1.1.7rc3 Thu May 31 11:36:59 PKT 2007
2 Myrinet boards installed.
The MX driver is configured to support up to 4 instances and 1024 nodes.
===================================================================
Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
        Status: Running, P0: Link up
        MAC Address: 00:60:dd:47:ad:7c
        Product code: M3F-PCIXF-2
        Part number: 09-03392
        Serial number: 297218
        Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
        Mapped hosts: 10


                                                                ROUTE COUNT
INDEX MAC ADDRESS HOST NAME P0
----- ----------- --------- ---
   0) 00:60:dd:47:ad:7c indus1:0 1,1
   2) 00:60:dd:47:ad:68 indus4:0 8,3
   3) 00:60:dd:47:b3:e8 indus4:1 7,3
   4) 00:60:dd:47:b3:ab indus2:0 7,3
   5) 00:60:dd:47:ad:66 indus3:0 8,3
   6) 00:60:dd:47:ad:76 indus3:1 8,3
   7) 00:60:dd:47:ad:77 jhelum1:0 8,3
   8) 00:60:dd:47:b3:5a ravi2:0 8,3
   9) 00:60:dd:47:ad:5f ravi2:1 1,1
  10) 00:60:dd:47:b3:bf ravi1:0 8,3
===================================================================

======
*indus2*
======

MX Version: 1.1.7rc3cvs1_1_fixes
MX Build: @indus2:/opt/mx2g-1.1.7rc3 Thu May 31 11:24:03 PKT 2007
2 Myrinet boards installed.
The MX driver is configured to support up to 4 instances and 1024 nodes.
===================================================================
Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
        Status: Running, P0: Link up
        MAC Address: 00:60:dd:47:b3:ab
        Product code: M3F-PCIXF-2
        Part number: 09-03392
        Serial number: 296636
        Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
        Mapped hosts: 10

                                                                ROUTE COUNT
INDEX MAC ADDRESS HOST NAME P0
----- ----------- --------- ---
   0) 00:60:dd:47:b3:ab indus2:0 1,1
   2) 00:60:dd:47:ad:68 indus4:0 1,1
   3) 00:60:dd:47:b3:e8 indus4:1 8,3
   4) 00:60:dd:47:ad:66 indus3:0 1,1
   5) 00:60:dd:47:ad:76 indus3:1 7,3
   6) 00:60:dd:47:ad:77 jhelum1:0 7,3
   8) 00:60:dd:47:ad:7c indus1:0 8,3
   9) 00:60:dd:47:b3:5a ravi2:0 8,3
  10) 00:60:dd:47:ad:5f ravi2:1 8,3
  11) 00:60:dd:47:b3:bf ravi1:0 7,3
===================================================================
Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
        Status: Running, P0: Link down
        MAC Address: 00:60:dd:47:b3:c3
        Product code: M3F-PCIXF-2
        Part number: 09-03392
        Serial number: 296612
        Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
        Mapped hosts: 10

======
*indus3*
======
MX Version: 1.1.7rc3cvs1_1_fixes
MX Build: @indus3:/opt/mx2g-1.1.7rc3 Thu May 31 11:29:03 PKT 2007
2 Myrinet boards installed.
The MX driver is configured to support up to 4 instances and 1024 nodes.
===================================================================
Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
        Status: Running, P0: Link up
        MAC Address: 00:60:dd:47:ad:66
        Product code: M3F-PCIXF-2
        Part number: 09-03392
        Serial number: 297240
        Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
        Mapped hosts: 10

                                                                ROUTE COUNT
INDEX MAC ADDRESS HOST NAME P0
----- ----------- --------- ---
   0) 00:60:dd:47:ad:66 indus3:0 1,1
   1) 00:60:dd:47:ad:76 indus3:1 8,3
   2) 00:60:dd:47:ad:68 indus4:0 1,1
   3) 00:60:dd:47:b3:e8 indus4:1 6,3
   4) 00:60:dd:47:ad:77 jhelum1:0 8,3
   5) 00:60:dd:47:b3:ab indus2:0 1,1
   7) 00:60:dd:47:ad:7c indus1:0 8,3
   8) 00:60:dd:47:b3:5a ravi2:0 8,3
   9) 00:60:dd:47:ad:5f ravi2:1 7,3
  10) 00:60:dd:47:b3:bf ravi1:0 8,3
===================================================================
Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
        Status: Running, P0: Link up
        MAC Address: 00:60:dd:47:ad:76
        Product code: M3F-PCIXF-2
        Part number: 09-03392
        Serial number: 297224
        Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
        Mapped hosts: 10

                                                                ROUTE COUNT
INDEX MAC ADDRESS HOST NAME P0
----- ----------- --------- ---
   0) 00:60:dd:47:ad:66 indus3:0 8,3
   1) 00:60:dd:47:ad:76 indus3:1 1,1
   2) 00:60:dd:47:ad:68 indus4:0 7,3
   3) 00:60:dd:47:b3:e8 indus4:1 1,1
   4) 00:60:dd:47:ad:77 jhelum1:0 1,1
   5) 00:60:dd:47:b3:ab indus2:0 7,3
   7) 00:60:dd:47:ad:7c indus1:0 8,3
   8) 00:60:dd:47:b3:5a ravi2:0 6,3
   9) 00:60:dd:47:ad:5f ravi2:1 8,3
  10) 00:60:dd:47:b3:bf ravi1:0 8,3

======
*indus4*
======

MX Version: 1.1.7rc3cvs1_1_fixes
MX Build: @indus4:/opt/mx2g-1.1.7rc3 Thu May 31 11:36:59 PKT 2007
2 Myrinet boards installed.
The MX driver is configured to support up to 4 instances and 1024 nodes.
===================================================================
Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
        Status: Running, P0: Link up
        MAC Address: 00:60:dd:47:ad:68
        Product code: M3F-PCIXF-2
        Part number: 09-03392
        Serial number: 297238
        Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
        Mapped hosts: 10

                                                                ROUTE COUNT
INDEX MAC ADDRESS HOST NAME P0
----- ----------- --------- ---
   0) 00:60:dd:47:ad:68 indus4:0 1,1
   1) 00:60:dd:47:b3:e8 indus4:1 7,3
   2) 00:60:dd:47:ad:77 jhelum1:0 7,3
   3) 00:60:dd:47:ad:66 indus3:0 1,1
   4) 00:60:dd:47:ad:76 indus3:1 7,3
   5) 00:60:dd:47:b3:ab indus2:0 1,1
   7) 00:60:dd:47:ad:7c indus1:0 7,3
   8) 00:60:dd:47:b3:5a ravi2:0 7,3
   9) 00:60:dd:47:ad:5f ravi2:1 8,3
  10) 00:60:dd:47:b3:bf ravi1:0 7,3
===================================================================
Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
        Status: Running, P0: Link up
        MAC Address: 00:60:dd:47:b3:e8
        Product code: M3F-PCIXF-2
        Part number: 09-03392
        Serial number: 296575
        Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
        Mapped hosts: 10

                                                                ROUTE COUNT
INDEX MAC ADDRESS HOST NAME P0
----- ----------- --------- ---
   0) 00:60:dd:47:ad:68 indus4:0 6,3
   1) 00:60:dd:47:b3:e8 indus4:1 1,1
   2) 00:60:dd:47:ad:77 jhelum1:0 1,1
   3) 00:60:dd:47:ad:66 indus3:0 8,3
   4) 00:60:dd:47:ad:76 indus3:1 1,1
   5) 00:60:dd:47:b3:ab indus2:0 8,3
   7) 00:60:dd:47:ad:7c indus1:0 7,3
   8) 00:60:dd:47:b3:5a ravi2:0 6,3
   9) 00:60:dd:47:ad:5f ravi2:1 8,3
  10) 00:60:dd:47:b3:bf ravi1:0 8,3
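
In the listings above, indus1 shows only Instance #0 although two boards are reported as installed, and Instance #1 on indus2 reports "Link down". Whether that is related to the failures is not established here, but such differences are easy to miss by eye. The following minimal Python sketch extracts the per-instance status from *mx_info* output; `parse_instances` and its regular expressions are assumptions based only on the output format shown above.

```python
import re

# Patterns written against the mx_info output format shown above (an assumption
# for other MX versions).
INSTANCE_RE = re.compile(r"^Instance #(\d+):")
STATUS_RE = re.compile(r"^Status:\s*(?P<state>[^,]+),\s*P0:\s*Link (?P<link>up|down)$")
MAC_RE = re.compile(r"^MAC Address:\s*(?P<mac>[0-9a-f:]{17})$")


def parse_instances(mx_info_text):
    """Return a list of dicts with instance number, state, link and MAC address."""
    instances = []
    current = None
    for raw in mx_info_text.splitlines():
        line = raw.strip()
        m = INSTANCE_RE.match(line)
        if m:
            current = {"instance": int(m.group(1)), "state": None,
                       "link": None, "mac": None}
            instances.append(current)
            continue
        if current is None:
            continue              # header lines before the first instance
        m = STATUS_RE.match(line)
        if m:
            current["state"] = m["state"].strip()
            current["link"] = m["link"]
            continue
        m = MAC_RE.match(line)
        if m:
            current["mac"] = m["mac"]
    return instances


# Lines copied from the indus2 listing above.
sample = """\
Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
        Status: Running, P0: Link up
        MAC Address: 00:60:dd:47:b3:ab
Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
        Status: Running, P0: Link down
        MAC Address: 00:60:dd:47:b3:c3
"""

for inst in parse_instances(sample):
    print(inst["instance"], inst["state"], "link", inst["link"], inst["mac"])
```

Running it over each node's output gives a compact list of boards and link states to compare across nodes.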

The output from *ompi_info* is:

                Open MPI: 1.2.1r14096-ct7b030r1838
   Open MPI SVN revision: 0
                Open RTE: 1.2.1r14096-ct7b030r1838
   Open RTE SVN revision: 0
                    OPAL: 1.2.1r14096-ct7b030r1838
       OPAL SVN revision: 0
                  Prefix: /opt/SUNWhpc/HPC7.0
 Configured architecture: sparc-sun-solaris2.10
           Configured by: root
           Configured on: Fri Mar 30 12:49:36 EDT 2007
          Configure host: burpen-on10-0
                Built by: root
                Built on: Fri Mar 30 13:10:46 EDT 2007
              Built host: burpen-on10-0
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: trivial
              C compiler: cc
     C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc
            C++ compiler: CC
   C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC
      Fortran77 compiler: f77
  Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77
      Fortran90 compiler: f95
  Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: yes
          Thread support: no
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: yes
           MCA backtrace: printstack (MCA v1.0, API v1.0, Component v1.2.1)
           MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.2.1)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
               MCA timer: solaris (MCA v1.0, API v1.0, Component v1.2.1)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
               MCA mpool: udapl (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
              MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2.1)
              MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA btl: mx (MCA v1.0, API v1.0.1, Component v1.2.1)
                 MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
                 MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)
                 MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
                 MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA mtl: mx (MCA v1.0, API v1.0, Component v1.2.1)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.1)
              MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
              MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.1)
              MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.1)
                  MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.1)
                  MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.1)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.1)
               MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.1)
                MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.1)
                MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA sds: env (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.1)

When I try to run a simple hello world program by issuing the following
command:

*mpirun -np 4 -mca btl mx,sm,self -machinefile machines ./hello*

the following error appears:

--------------------------------------------------------------------------

Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------

--------------------------------------------------------------------------

It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------

*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------

Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------

--------------------------------------------------------------------------

It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------

*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------

Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------

--------------------------------------------------------------------------

It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------

--------------------------------------------------------------------------

Process 0.1.2 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------

--------------------------------------------------------------------------

It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
12) instead of "Success" (0)
--------------------------------------------------------------------------

*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)

The output of *more /var/run/fms/fma.log* is:

Sat Sep 22 10:47:50 2007 NIC 0: M3F-PCIXF-2 s/n=297218 1 ports, speed=2G
Sat Sep 22 10:47:50 2007 mac = 00:60:dd:47:ad:7c
Sat Sep 22 10:47:50 2007 NIC 1: M3F-PCIXF-2 s/n=297248 1 ports, speed=2G
Sat Sep 22 10:47:50 2007 mac = 00:60:dd:47:ad:5e
Sat Sep 22 10:47:50 2007 fms-1.2.1 fma starting
Sat Sep 22 10:47:50 2007 Mapper was 00:00:00:00:00:00, l=0, is now 00:60:dd:47:ad:7c, l=1
Sat Sep 22 10:47:50 2007 Mapping fabric...
Sat Sep 22 10:47:54 2007 Mapper was 00:60:dd:47:ad:7c, l=1, is now 00:60:dd:47:b3:e8, l=1
Sat Sep 22 10:47:54 2007 Cancelling mapping
Sat Sep 22 10:47:59 2007 5 hosts, 8 nics, 6 xbars, 40 links
Sat Sep 22 10:47:59 2007 map version is 1987557551
Sat Sep 22 10:47:59 2007 Found NIC 0 at index 3!
Sat Sep 22 10:47:59 2007 Found NIC 1 at index 2!
Sat Sep 22 10:47:59 2007 map seems OK
Sat Sep 22 10:47:59 2007 Routing took 0 seconds
Mon Sep 24 14:26:46 2007 Requesting remap from indus4 (00:60:dd:47:b3:e8): scouted by 00:60:dd:47:b3:5a, lev=1, pkt_type=0
Mon Sep 24 14:26:51 2007 6 hosts, 10 nics, 6 xbars, 42 links
Mon Sep 24 14:26:51 2007 map version is 1987557552
Mon Sep 24 14:26:51 2007 Found NIC 0 at index 3!
Mon Sep 24 14:26:51 2007 Found NIC 1 at index 2!
Mon Sep 24 14:26:51 2007 map seems OK
Mon Sep 24 14:26:51 2007 Routing took 0 seconds
Mon Sep 24 14:35:17 2007 Requesting remap from indus4 (00:60:dd:47:b3:e8): scouted by 00:60:dd:47:b3:bf, lev=1, pkt_type=0
Mon Sep 24 14:35:19 2007 7 hosts, 11 nics, 6 xbars, 43 links
Mon Sep 24 14:35:19 2007 map version is 1987557553
Mon Sep 24 14:35:19 2007 Found NIC 0 at index 5!
Mon Sep 24 14:35:19 2007 Found NIC 1 at index 4!
Mon Sep 24 14:35:19 2007 map seems OK
Mon Sep 24 14:35:19 2007 Routing took 0 seconds
Tue Sep 25 21:47:52 2007 6 hosts, 9 nics, 6 xbars, 41 links
Tue Sep 25 21:47:52 2007 map version is 1987557554
Tue Sep 25 21:47:52 2007 Found NIC 0 at index 3!
Tue Sep 25 21:47:52 2007 Found NIC 1 at index 2!
Tue Sep 25 21:47:52 2007 map seems OK
Tue Sep 25 21:47:52 2007 Routing took 0 seconds
Tue Sep 25 21:52:02 2007 Requesting remap from indus4 (00:60:dd:47:b3:e8): empty port x0p15 is no longer empty
Tue Sep 25 21:52:07 2007 6 hosts, 10 nics, 6 xbars, 42 links
Tue Sep 25 21:52:07 2007 map version is 1987557555
Tue Sep 25 21:52:07 2007 Found NIC 0 at index 4!
Tue Sep 25 21:52:07 2007 Found NIC 1 at index 3!
Tue Sep 25 21:52:07 2007 map seems OK
Tue Sep 25 21:52:07 2007 Routing took 0 seconds
Tue Sep 25 21:52:23 2007 7 hosts, 11 nics, 6 xbars, 43 links
Tue Sep 25 21:52:23 2007 map version is 1987557556
Tue Sep 25 21:52:23 2007 Found NIC 0 at index 6!
Tue Sep 25 21:52:23 2007 Found NIC 1 at index 5!
Tue Sep 25 21:52:23 2007 map seems OK
Tue Sep 25 21:52:23 2007 Routing took 0 seconds
Wed Sep 26 05:07:01 2007 Requesting remap from indus4 (00:60:dd:47:b3:e8): verify failed x1p2, nic 0, port 0 route=-9 4 10 reply=-10 -4 9 , remote=ravi2 NIC 1, p0 mac=00:60:dd:47:ad:5f
Wed Sep 26 05:07:06 2007 6 hosts, 9 nics, 6 xbars, 41 links
Wed Sep 26 05:07:06 2007 map version is 1987557557
Wed Sep 26 05:07:06 2007 Found NIC 0 at index 3!
Wed Sep 26 05:07:06 2007 Found NIC 1 at index 2!
Wed Sep 26 05:07:06 2007 map seems OK
Wed Sep 26 05:07:06 2007 Routing took 0 seconds
Wed Sep 26 05:11:19 2007 7 hosts, 11 nics, 6 xbars, 43 links
Wed Sep 26 05:11:19 2007 map version is 1987557558
Wed Sep 26 05:11:19 2007 Found NIC 0 at index 3!
Wed Sep 26 05:11:19 2007 Found NIC 1 at index 2!
Wed Sep 26 05:11:19 2007 map seems OK
Wed Sep 26 05:11:19 2007 Routing took 0 seconds
Thu Sep 27 11:45:37 2007 6 hosts, 9 nics, 6 xbars, 41 links
Thu Sep 27 11:45:37 2007 map version is 1987557559
Thu Sep 27 11:45:37 2007 Found NIC 0 at index 6!
Thu Sep 27 11:45:37 2007 Found NIC 1 at index 5!
Thu Sep 27 11:45:37 2007 map seems OK
Thu Sep 27 11:45:37 2007 Routing took 0 seconds
Thu Sep 27 11:51:02 2007 7 hosts, 11 nics, 6 xbars, 43 links
Thu Sep 27 11:51:02 2007 map version is 1987557560
Thu Sep 27 11:51:02 2007 Found NIC 0 at index 6!
Thu Sep 27 11:51:02 2007 Found NIC 1 at index 5!
Thu Sep 27 11:51:02 2007 map seems OK
Thu Sep 27 11:51:02 2007 Routing took 0 seconds
Fri Sep 28 13:27:10 2007 Requesting remap from indus4 (00:60:dd:47:b3:e8): verify failed x5p0, nic 1, port 0 route=-8 15 6 reply=-6 -15 8 , remote=ravi1 NIC 0, p0 mac=00:60:dd:47:b3:bf
Fri Sep 28 13:27:24 2007 6 hosts, 8 nics, 6 xbars, 40 links
Fri Sep 28 13:27:24 2007 map version is 1987557561
Fri Sep 28 13:27:24 2007 Found NIC 0 at index 5!
Fri Sep 28 13:27:24 2007 Cannot find NIC 1 (00:60:dd:47:ad:5e) in map!
Fri Sep 28 13:27:24 2007 map seems OK
Fri Sep 28 13:27:24 2007 Routing took 0 seconds
Fri Sep 28 13:27:44 2007 7 hosts, 10 nics, 6 xbars, 42 links
Fri Sep 28 13:27:44 2007 map version is 1987557562
Fri Sep 28 13:27:44 2007 Found NIC 0 at index 7!
Fri Sep 28 13:27:44 2007 Cannot find NIC 1 (00:60:dd:47:ad:5e) in map!
Fri Sep 28 13:27:44 2007 map seems OK
Fri Sep 28 13:27:44 2007 Routing took 0 seconds
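
The last two remaps in the log above each report "Cannot find NIC 1
(00:60:dd:47:ad:5e) in map!". As a rough aid for spotting such lines in
a longer log, here is a minimal sketch in C. It assumes the log text has
already been read into a NUL-terminated string and that the message
wording matches the excerpt above; the function name
count_missing_nic_lines is hypothetical and is not part of any Myricom
or Open MPI tool.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: count how many times the mapper message
 * "Cannot find NIC" occurs in fma.log-style text.
 * Assumes log_text is a valid NUL-terminated string.
 */
int count_missing_nic_lines(const char *log_text)
{
    const char *needle = "Cannot find NIC";
    const char *p = log_text;
    int count = 0;

    /* Step through every occurrence of the message in the text. */
    while ((p = strstr(p, needle)) != NULL) {
        count++;
        p += strlen(needle);   /* continue searching after this match */
    }
    return count;
}
```

A nonzero result would only indicate that the mapper reported a NIC
missing from the fabric map at some point; it does not by itself explain
why the mx BTL fails to initialize.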

Do you have any suggestions or comments on why this error appears and
what the solution to this problem is? I have checked the community
mailing list for this problem and found a few related topics, but could
not find any solution. Any suggestions or comments will be highly
appreciated.

The code that I am trying to run is as follows:

#include <stdio.h>
#include <string.h>   /* for strcpy */
#include "mpi.h"

int main(int argc, char **argv)
{
  int rank, size, tag, rc, i;
  MPI_Status status;
  char message[20];

  rc = MPI_Init(&argc, &argv);
  rc = MPI_Comm_size(MPI_COMM_WORLD, &size);
  rc = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  tag = 100;

  if (rank == 0) {
    /* Rank 0 sends the greeting (12 characters plus the NUL) to every other rank. */
    strcpy(message, "Hello, world");
    for (i = 1; i < size; i++)
      rc = MPI_Send(message, 13, MPI_CHAR, i, tag, MPI_COMM_WORLD);
  }
  else
    /* Every other rank waits for the greeting from rank 0. */
    rc = MPI_Recv(message, 13, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);

  printf("node %d : %.13s\n", rank, message);
  rc = MPI_Finalize();
  return 0;
}

Thanks.
Looking forward.
Best regards,
Hammad Siddiqi
Center for High Performance Scientific Computing
NUST Institute of Information Technology,
National University of Sciences and Technology,
Rawalpindi, Pakistan.
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users