Thanks,
Regards,
Hammad

Hammad Siddiqi wrote:
Dear Tim,

Your and Tim Mattox's suggestions yielded the following results:

*1. /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus1,indus2" -mca btl_base_debug 1000 ./hello*

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca btl mx,sm,self -host "indus1,indus2,indus3,indus4" -mca btl_base_debug 1000 ./hello

[indus1:29331] select: initializing btl component mx
[indus1:29331] select: init returned failure
[indus1:29331] select: module mx unloaded
[indus1:29331] select: initializing btl component sm
[indus1:29331] select: init returned success
[indus1:29331] select: initializing btl component self
[indus1:29331] select: init returned success
[indus3:13520] select: initializing btl component mx
[indus3:13520] select: init returned failure
[indus3:13520] select: module mx unloaded
[indus3:13520] select: initializing btl component sm
[indus3:13520] select: init returned success
[indus3:13520] select: initializing btl component self
[indus3:13520] select: init returned success
[indus4:15486] select: initializing btl component mx
[indus4:15486] select: init returned failure
[indus4:15486] select: module mx unloaded
[indus4:15486] select: initializing btl component sm
[indus4:15486] select: init returned success
[indus4:15486] select: initializing btl component self
[indus4:15486] select: init returned success
[indus2:11351] select: initializing btl component mx
[indus2:11351] select: init returned failure
[indus2:11351] select: module mx unloaded
[indus2:11351] select: initializing btl component sm
[indus2:11351] select: init returned success
[indus2:11351] select: initializing btl component self
[indus2:11351] select: init returned success
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.

If you specified the use of a BTL component, you may have forgotten a
component (such as "self") in the list of usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.

If you specified the use of a BTL component, you may have forgotten a
component (such as "self") in the list of usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Process 0.1.2 is unable to reach 0.1.0 for MPI communication.

If you specified the use of a BTL component, you may have forgotten a
component (such as "self") in the list of usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT
--------------------------------------------------------------------------
Process 0.1.3 is unable to reach 0.1.0 for MPI communication.

If you specified the use of a BTL component, you may have forgotten a
component (such as "self") in the list of usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
; some of which are due to configuration or environment problems. This
failure appears to be an internal failure; here's some additional
information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)

*2.1 /opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca mtl mx -host "indus1,indus2,indus3,indus4" ./hello*

This command works fine.

*2.2 /opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca mtl mx -host "indus1,indus2,indus3,indus4" -mca pml cm ./hello*

This command works fine. Also
*"/opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca pml cm -host "indus1,indus2,indus3,indus4" -mca mtl_base_debug 1000 ./hello"*
works fine.
But
*"/opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca pml cm -host "indus1,indus2,indus3,indus4" -mca mtl_base_debug 1000 ./hello"*
hangs indefinitely. Also
*"/opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca mtl mx,sm,self -host "indus1,indus2,indus3,indus4" -mca mtl_base_debug 1000 ./hello"*
works fine.

*2.3 /opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca mtl mx -host "indus1,indus2,indus3,indus4" -mca pml cm ./hello*

This command hangs the machines indefinitely. Also
*"/opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca mtl mx -host "indus1,indus2,indus3,indus4" -mca pml cm -mca mtl_base_debug 1000 ./hello"*
hangs the systems indefinitely.

*2.4 /opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca mtl mx,sm,self -host "indus1,indus2,indus3,indus4" -mca pml cm -mca mtl_base_debug 1000 ./hello*

This command hangs the machines indefinitely.

Please note that running more than four MPI processes hangs the
machines. Any suggestions, please?

Thanks,
Best Regards,
Hammad Siddiqi

Tim Prins wrote:

I would recommend trying a few things:

1. Set some debugging flags and see if that helps. So, I would try
something like:

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,self -host "indus1,indus2" -mca btl_base_debug 1000 ./hello

This will output information as each btl is loaded, and whether or not
the load succeeds.

2. Try running with the mx mtl instead of the btl:

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca mtl mx -host "indus1,indus2" ./hello

Similarly, for debug output:

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca mtl mx -host "indus1,indus2" -mca mtl_base_debug 1000 ./hello

Let me know if any of these work.

Thanks,

Tim

On Saturday 29 September 2007 01:53:06 am Hammad Siddiqi wrote:

Hi Terry,

Thanks for replying. The following command works fine:

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca btl tcp,sm,self -machinefile machines ./hello

The contents of machines are:
indus1
indus2
indus3
indus4

I have tried using np=2 over pairs of machines, but the problem is the
same.
The errors that occur are given below with the command that I am trying.

**Test 1**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus1,indus2" ./hello

--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.

If you specified the use of a BTL component, you may have forgotten a
component (such as "self") in the list of usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.

If you specified the use of a BTL component, you may have forgotten a
component (such as "self") in the list of usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)

**Test 2**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus1,indus3" ./hello

Output identical to Test 1: both processes report "unable to reach" the
other peer and abort in MPI_Init with PML add procs failed --> Returned
"Unreachable" (-12) instead of "Success" (0).

**Test 3**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus1,indus4" ./hello

Output identical to Test 1.

**Test 4**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus2,indus4" ./hello

Output identical to Test 1.

**Test 5**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus2,indus3" ./hello

Output identical to Test 1.

**Test 6**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus3,indus4" ./hello

Output identical to Test 1.

**END OF TESTS**

There is one thing to note: when I run this command including
-mca pml cm, it works fine :S

mpirun -np 4 -mca btl mx,sm,self -mca pml cm -machinefile machines ./hello
Hello MPI! Process 4 of 1 on indus2
Hello MPI! Process 4 of 2 on indus3
Hello MPI! Process 4 of 3 on indus4
Hello MPI! Process 4 of 0 on indus1

To my knowledge this command is not using shared memory and is using
only Myrinet as the interconnect. One more thing: I cannot start more
than 4 processes in this case; the mpirun process hangs. Any suggestions?
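[Editor's note: the source of the ./hello binary used throughout this thread was never posted. For readers reproducing these tests, a minimal MPI hello-world consistent with the output quoted above might look like the sketch below; the file name and message format are assumptions. The quoted output ("Process 4 of 1" with np=4) suggests the original program passed size before rank to printf; the sketch prints rank first, which is the conventional order.]

```c
/* hello.c - a minimal MPI "hello world" sketch; compile with:
 *   mpicc hello.c -o hello
 * (or the Sun HPC ClusterTools wrapper, e.g. /opt/SUNWhpc/HPC7.0/bin/mpicc)
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, namelen;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                     /* errors here match the
                                                   "error in MPI_Init"
                                                   messages in this thread */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);       /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);       /* total number of ranks */
    MPI_Get_processor_name(name, &namelen);     /* host name, e.g. indus1 */

    printf("Hello MPI! Process %d of %d on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}
```

Running it under mpirun as in the thread would print one line per rank; the line order is not deterministic, since the ranks' stdout streams are merged as they arrive.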
Once again, thanks for your help.

Regards,
Hammad

Terry Dontje wrote:

Hi Hammad,

It looks to me like none of the btls could resolve a route between the
node that process rank 0 is on and the other nodes. I would suggest
trying np=2 over a couple of pairs of machines to see if that works and
you can truly be sure that only the first node is having this problem.
It also might be helpful, as a sanity check, to use the tcp btl instead
of mx and see if you get more traction with that.

--td

*From:* Hammad Siddiqi (/hammad.siddiqi_at_[hidden]/)
*Date:* 2007-09-28 07:38:01

Hello,

I am using Sun HPC Toolkit 7.0 to compile and run my C MPI programs. I
have tested the Myrinet installation using Myricom's own test programs.
The Myricom software stack I am using is MX, the version is mx2g-1.1.7,
and mx_mapper is also used. We have 4 nodes with 8 dual-core processors
each (Sun Fire V890), and the operating system is Solaris 10 (SunOS
indus1 5.10 Generic_125100-10 sun4u sparc SUNW,Sun-Fire-V890).

The contents of the machine file are:
indus1
indus2
indus3
indus4

The output of *mx_info* on each node is given below.

====== *indus1* ======

MX Version: 1.1.7rc3cvs1_1_fixes
MX Build: @indus4:/opt/mx2g-1.1.7rc3 Thu May 31 11:36:59 PKT 2007
2 Myrinet boards installed.
The MX driver is configured to support up to 4 instances and 1024 nodes.
===================================================================
Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
    Status: Running, P0: Link up
    MAC Address: 00:60:dd:47:ad:7c
    Product code: M3F-PCIXF-2
    Part number: 09-03392
    Serial number: 297218
    Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
    Mapped hosts: 10

                                               ROUTE COUNT
INDEX    MAC ADDRESS        HOST NAME          P0
-----    -----------        ---------          ---
   0) 00:60:dd:47:ad:7c     indus1:0           1,1
   2) 00:60:dd:47:ad:68     indus4:0           8,3
   3) 00:60:dd:47:b3:e8     indus4:1           7,3
   4) 00:60:dd:47:b3:ab     indus2:0           7,3
   5) 00:60:dd:47:ad:66     indus3:0           8,3
   6) 00:60:dd:47:ad:76     indus3:1           8,3
   7) 00:60:dd:47:ad:77     jhelum1:0          8,3
   8) 00:60:dd:47:b3:5a     ravi2:0            8,3
   9) 00:60:dd:47:ad:5f     ravi2:1            1,1
  10) 00:60:dd:47:b3:bf     ravi1:0            8,3
===================================================================

====== *indus2* ======

MX Version: 1.1.7rc3cvs1_1_fixes
MX Build: @indus2:/opt/mx2g-1.1.7rc3 Thu May 31 11:24:03 PKT 2007
2 Myrinet boards installed.
The MX driver is configured to support up to 4 instances and 1024 nodes.
===================================================================
Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
    Status: Running, P0: Link up
    MAC Address: 00:60:dd:47:b3:ab
    Product code: M3F-PCIXF-2
    Part number: 09-03392
    Serial number: 296636
    Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
    Mapped hosts: 10

                                               ROUTE COUNT
INDEX    MAC ADDRESS        HOST NAME          P0
-----    -----------        ---------          ---
   0) 00:60:dd:47:b3:ab     indus2:0           1,1
   2) 00:60:dd:47:ad:68     indus4:0           1,1
   3) 00:60:dd:47:b3:e8     indus4:1           8,3
   4) 00:60:dd:47:ad:66     indus3:0           1,1
   5) 00:60:dd:47:ad:76     indus3:1           7,3
   6) 00:60:dd:47:ad:77     jhelum1:0          7,3
   8) 00:60:dd:47:ad:7c     indus1:0           8,3
   9) 00:60:dd:47:b3:5a     ravi2:0            8,3
  10) 00:60:dd:47:ad:5f     ravi2:1            8,3
  11) 00:60:dd:47:b3:bf     ravi1:0            7,3
===================================================================
Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
    Status: Running, P0: Link down
    MAC Address: 00:60:dd:47:b3:c3
    Product code: M3F-PCIXF-2
    Part number: 09-03392
    Serial number: 296612
    Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
    Mapped hosts: 10

====== *indus3* ======

MX Version: 1.1.7rc3cvs1_1_fixes
MX Build: @indus3:/opt/mx2g-1.1.7rc3 Thu May 31 11:29:03 PKT 2007
2 Myrinet boards installed.
The MX driver is configured to support up to 4 instances and 1024 nodes.
===================================================================
Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
    Status: Running, P0: Link up
    MAC Address: 00:60:dd:47:ad:66
    Product code: M3F-PCIXF-2
    Part number: 09-03392
    Serial number: 297240
    Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
    Mapped hosts: 10

                                               ROUTE COUNT
INDEX    MAC ADDRESS        HOST NAME          P0
-----    -----------        ---------          ---
   0) 00:60:dd:47:ad:66     indus3:0           1,1
   1) 00:60:dd:47:ad:76     indus3:1           8,3
   2) 00:60:dd:47:ad:68     indus4:0           1,1
   3) 00:60:dd:47:b3:e8     indus4:1           6,3
   4) 00:60:dd:47:ad:77     jhelum1:0          8,3
   5) 00:60:dd:47:b3:ab     indus2:0           1,1
   7) 00:60:dd:47:ad:7c     indus1:0           8,3
   8) 00:60:dd:47:b3:5a     ravi2:0            8,3
   9) 00:60:dd:47:ad:5f     ravi2:1            7,3
  10) 00:60:dd:47:b3:bf     ravi1:0            8,3
===================================================================
Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
    Status: Running, P0: Link up
    MAC Address: 00:60:dd:47:ad:76
    Product code: M3F-PCIXF-2
    Part number: 09-03392
    Serial number: 297224
    Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
    Mapped hosts: 10

                                               ROUTE COUNT
INDEX    MAC ADDRESS        HOST NAME          P0
-----    -----------        ---------          ---
   0) 00:60:dd:47:ad:66     indus3:0           8,3
   1) 00:60:dd:47:ad:76     indus3:1           1,1
   2) 00:60:dd:47:ad:68     indus4:0           7,3
   3) 00:60:dd:47:b3:e8     indus4:1           1,1
   4) 00:60:dd:47:ad:77     jhelum1:0          1,1
   5) 00:60:dd:47:b3:ab     indus2:0           7,3
   7) 00:60:dd:47:ad:7c     indus1:0           8,3
   8) 00:60:dd:47:b3:5a     ravi2:0            6,3
   9) 00:60:dd:47:ad:5f     ravi2:1            8,3
  10) 00:60:dd:47:b3:bf     ravi1:0            8,3

====== *indus4* ======

MX Version: 1.1.7rc3cvs1_1_fixes
MX Build: @indus4:/opt/mx2g-1.1.7rc3 Thu May 31 11:36:59 PKT 2007
2 Myrinet boards installed.
The MX driver is configured to support up to 4 instances and 1024 nodes.
===================================================================
Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
    Status: Running, P0: Link up
    MAC Address: 00:60:dd:47:ad:68
    Product code: M3F-PCIXF-2
    Part number: 09-03392
    Serial number: 297238
    Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
    Mapped hosts: 10

                                               ROUTE COUNT
INDEX    MAC ADDRESS        HOST NAME          P0
-----    -----------        ---------          ---
   0) 00:60:dd:47:ad:68     indus4:0           1,1
   1) 00:60:dd:47:b3:e8     indus4:1           7,3
   2) 00:60:dd:47:ad:77     jhelum1:0          7,3
   3) 00:60:dd:47:ad:66     indus3:0           1,1
   4) 00:60:dd:47:ad:76     indus3:1           7,3
   5) 00:60:dd:47:b3:ab     indus2:0           1,1
   7) 00:60:dd:47:ad:7c     indus1:0           7,3
   8) 00:60:dd:47:b3:5a     ravi2:0            7,3
   9) 00:60:dd:47:ad:5f     ravi2:1            8,3
  10) 00:60:dd:47:b3:bf     ravi1:0            7,3
===================================================================
Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
    Status: Running, P0: Link up
    MAC Address: 00:60:dd:47:b3:e8
    Product code: M3F-PCIXF-2
    Part number: 09-03392
    Serial number: 296575
    Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
    Mapped hosts: 10

                                               ROUTE COUNT
INDEX    MAC ADDRESS        HOST NAME          P0
-----    -----------        ---------          ---
   0) 00:60:dd:47:ad:68     indus4:0           6,3
   1) 00:60:dd:47:b3:e8     indus4:1           1,1
   2) 00:60:dd:47:ad:77     jhelum1:0          1,1
   3) 00:60:dd:47:ad:66     indus3:0           8,3
   4) 00:60:dd:47:ad:76     indus3:1           1,1
   5) 00:60:dd:47:b3:ab     indus2:0           8,3
   7) 00:60:dd:47:ad:7c     indus1:0           7,3
   8) 00:60:dd:47:b3:5a     ravi2:0            6,3
   9) 00:60:dd:47:ad:5f     ravi2:1            8,3
  10) 00:60:dd:47:b3:bf     ravi1:0            8,3

The output from *ompi_info* is:

                Open MPI: 1.2.1r14096-ct7b030r1838
   Open MPI SVN revision: 0
                Open RTE: 1.2.1r14096-ct7b030r1838
   Open RTE SVN revision: 0
                    OPAL: 1.2.1r14096-ct7b030r1838
       OPAL SVN revision: 0
                  Prefix: /opt/SUNWhpc/HPC7.0
 Configured architecture: sparc-sun-solaris2.10
           Configured by: root
           Configured on: Fri Mar 30 12:49:36 EDT 2007
          Configure host: burpen-on10-0
                Built by: root
                Built on: Fri Mar 30 13:10:46 EDT 2007
              Built host: burpen-on10-0
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: trivial
              C compiler: cc
     C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc
            C++ compiler: CC
   C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC
      Fortran77 compiler: f77
  Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77
      Fortran90 compiler: f95
  Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: yes
          Thread support: no
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: yes
           MCA backtrace: printstack (MCA v1.0, API v1.0, Component v1.2.1)
           MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.2.1)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
               MCA timer: solaris (MCA v1.0, API v1.0, Component v1.2.1)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
               MCA mpool: udapl (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
              MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2.1)
              MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA btl: mx (MCA v1.0, API v1.0.1, Component v1.2.1)
                 MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
                 MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)
                 MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
                 MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA mtl: mx (MCA v1.0, API v1.0, Component v1.2.1)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.1)
              MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
              MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.1)
              MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.1)
                  MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.1)
                  MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.1)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.1)
               MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.1)
                MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.1)
                MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA sds: env (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.1)

When I try to run a simple hello world program by issuing the following
command:

*mpirun -np 4 -mca btl mx,sm,self -machinefile machines ./hello*

The following error appears:
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have forgotten a
component (such as "self") in the list of usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can fail
during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have forgotten a
component (such as "self") in the list of usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can fail
during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have forgotten a
component (such as "self") in the list of usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can fail
during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Process 0.1.2 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have forgotten a
component (such as "self") in the list of usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can fail
during MPI_INIT; some of which are due to configuration or environment
problems.
This failure appears to be an internal failure; here's some additional
information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)

The output from *more /var/run/fms/fma.log* is:

Sat Sep 22 10:47:50 2007 NIC 0: M3F-PCIXF-2 s/n=297218 1 ports, speed=2G
Sat Sep 22 10:47:50 2007 mac = 00:60:dd:47:ad:7c
Sat Sep 22 10:47:50 2007 NIC 1: M3F-PCIXF-2 s/n=297248 1 ports, speed=2G
Sat Sep 22 10:47:50 2007 mac = 00:60:dd:47:ad:5e
Sat Sep 22 10:47:50 2007 fms-1.2.1 fma starting
Sat Sep 22 10:47:50 2007 Mapper was 00:00:00:00:00:00, l=0, is now 00:60:dd:47:ad:7c, l=1
Sat Sep 22 10:47:50 2007 Mapping fabric...
Sat Sep 22 10:47:54 2007 Mapper was 00:60:dd:47:ad:7c, l=1, is now 00:60:dd:47:b3:e8, l=1
Sat Sep 22 10:47:54 2007 Cancelling mapping
Sat Sep 22 10:47:59 2007 5 hosts, 8 nics, 6 xbars, 40 links
Sat Sep 22 10:47:59 2007 map version is 1987557551
Sat Sep 22 10:47:59 2007 Found NIC 0 at index 3!
Sat Sep 22 10:47:59 2007 Found NIC 1 at index 2!
Sat Sep 22 10:47:59 2007 map seems OK
Sat Sep 22 10:47:59 2007 Routing took 0 seconds
Mon Sep 24 14:26:46 2007 Requesting remap from indus4 (00:60:dd:47:b3:e8): scouted by 00:60:dd:47:b3:5a, lev=1, pkt_type=0
Mon Sep 24 14:26:51 2007 6 hosts, 10 nics, 6 xbars, 42 links
Mon Sep 24 14:26:51 2007 map version is 1987557552
Mon Sep 24 14:26:51 2007 Found NIC 0 at index 3!
Mon Sep 24 14:26:51 2007 Found NIC 1 at index 2!
Mon Sep 24 14:26:51 2007 map seems OK
Mon Sep 24 14:26:51 2007 Routing took 0 seconds
Mon Sep 24 14:35:17 2007 Requesting remap from indus4 (00:60:dd:47:b3:e8): scouted by 00:60:dd:47:b3:bf, lev=1, pkt_type=0
Mon Sep 24 14:35:19 2007 7 hosts, 11 nics, 6 xbars, 43 links
Mon Sep 24 14:35:19 2007 map version is 1987557553
Mon Sep 24 14:35:19 2007 Found NIC 0 at index 5!
Mon Sep 24 14:35:19 2007 Found NIC 1 at index 4!
Mon Sep 24 14:35:19 2007 map seems OK
Mon Sep 24 14:35:19 2007 Routing took 0 seconds
Tue Sep 25 21:47:52 2007 6 hosts, 9 nics, 6 xbars, 41 links
Tue Sep 25 21:47:52 2007 map version is 1987557554
Tue Sep 25 21:47:52 2007 Found NIC 0 at index 3!
Tue Sep 25 21:47:52 2007 Found NIC 1 at index 2!
Tue Sep 25 21:47:52 2007 map seems OK
Tue Sep 25 21:47:52 2007 Routing took 0 seconds
Tue Sep 25 21:52:02 2007 Requesting remap from indus4 (00:60:dd:47:b3:e8): empty port x0p15 is no longer empty
Tue Sep 25 21:52:07 2007 6 hosts, 10 nics, 6 xbars, 42 links
Tue Sep 25 21:52:07 2007 map version is 1987557555
Tue Sep 25 21:52:07 2007 Found NIC 0 at index 4!
Tue Sep 25 21:52:07 2007 Found NIC 1 at index 3!
Tue Sep 25 21:52:07 2007 map seems OK
Tue Sep 25 21:52:07 2007 Routing took 0 seconds
Tue Sep 25 21:52:23 2007 7 hosts, 11 nics, 6 xbars, 43 links
Tue Sep 25 21:52:23 2007 map version is 1987557556
Tue Sep 25 21:52:23 2007 Found NIC 0 at index 6!
Tue Sep 25 21:52:23 2007 Found NIC 1 at index 5!
Tue Sep 25 21:52:23 2007 map seems OK
Tue Sep 25 21:52:23 2007 Routing took 0 seconds
Wed Sep 26 05:07:01 2007 Requesting remap from indus4 (00:60:dd:47:b3:e8): verify failed x1p2, nic 0, port 0 route=-9 4 10 reply=-10 -4 9 , remote=ravi2 NIC 1, p0 mac=00:60:dd:47:ad:5f
Wed Sep 26 05:07:06 2007 6 hosts, 9 nics, 6 xbars, 41 links
Wed Sep 26 05:07:06 2007 map version is 1987557557
Wed Sep 26 05:07:06 2007 Found NIC 0 at index 3!
Wed Sep 26 05:07:06 2007 Found NIC 1 at index 2!
Wed Sep 26 05:07:06 2007 map seems OK
Wed Sep 26 05:07:06 2007 Routing took 0 seconds
Wed Sep 26 05:11:19 2007 7 hosts, 11 nics, 6 xbars, 43 links
Wed Sep 26 05:11:19 2007 map version is 1987557558
Wed Sep 26 05:11:19 2007 Found NIC 0 at index 3!
Wed Sep 26 05:11:19 2007 Found NIC 1 at index 2!
Wed Sep 26 05:11:19 2007 map seems OK
Wed Sep 26 05:11:19 2007 Routing took 0 seconds
Thu Sep 27 11:45:37 2007 6 hosts, 9 nics, 6 xbars, 41 links
Thu Sep 27 11:45:37 2007 map version is 1987557559
Thu Sep 27 11:45:37 2007 Found NIC 0 at index 6!
Thu Sep 27 11:45:37 2007 Found NIC 1 at index 5!
Thu Sep 27 11:45:37 2007 map seems OK
Thu Sep 27 11:45:37 2007 Routing took 0 seconds
Thu Sep 27 11:51:02 2007 7 hosts, 11 nics, 6 xbars, 43 links
Thu Sep 27 11:51:02 2007 map version is 1987557560
Thu Sep 27 11:51:02 2007 Found NIC 0 at index 6!
Thu Sep 27 11:51:02 2007 Found NIC 1 at index 5!
Thu Sep 27 11:51:02 2007 map seems OK
Thu Sep 27 11:51:02 2007 Routing took 0 seconds
Fri Sep 28 13:27:10 2007 Requesting remap from indus4 (00:60:dd:47:b3:e8): verify failed x5p0, nic 1, port 0 route=-8 15 6 reply=-6 -15 8 , remote=ravi1 NIC 0, p0 mac=00:60:dd:47:b3:bf
Fri Sep 28 13:27:24 2007 6 hosts, 8 nics, 6 xbars, 40 links
Fri Sep 28 13:27:24 2007 map version is 1987557561
Fri Sep 28 13:27:24 2007 Found NIC 0 at index 5!
Fri Sep 28 13:27:24 2007 Cannot find NIC 1 (00:60:dd:47:ad:5e) in map!
Fri Sep 28 13:27:24 2007 map seems OK
Fri Sep 28 13:27:24 2007 Routing took 0 seconds
Fri Sep 28 13:27:44 2007 7 hosts, 10 nics, 6 xbars, 42 links
Fri Sep 28 13:27:44 2007 map version is 1987557562
Fri Sep 28 13:27:44 2007 Found NIC 0 at index 7!
Fri Sep 28 13:27:44 2007 Cannot find NIC 1 (00:60:dd:47:ad:5e) in map!
Fri Sep 28 13:27:44 2007 map seems OK
Fri Sep 28 13:27:44 2007 Routing took 0 seconds

Do you have any suggestions or comments on why this error appears, and what the solution to this problem might be?
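As a next step I plan to isolate whether the problem is in the MX fabric or in Open MPI itself. The sketch below shows the commands I intend to try; this is only an outline, not something I have verified yet. mx_info is the NIC-status tool shipped with the Myrinet MX distribution, and the exact paths and hostnames are assumptions based on our install.

```shell
# 1. On each node, confirm the MX driver actually sees both NICs
#    (if mx_info fails here, the mx BTL will also fail to initialize):
mx_info

# 2. Rule the interconnect out by forcing TCP over the same hosts;
#    if this run succeeds, the problem is confined to MX:
/opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca btl tcp,sm,self \
    -machinefile machines ./hello

# 3. Re-run over MX with verbose BTL selection output to capture
#    the reason the mx component's init fails:
/opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca btl mx,sm,self \
    -mca btl_base_verbose 30 -machinefile machines ./hello
```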
I have checked the community mailing list for this problem and found a few related topics, but could not find a solution. Any suggestions or comments will be highly appreciated.

The code that I am trying to run is as follows (I have added the missing #include <string.h>, which strcpy requires):

#include <stdio.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char **argv)
{
    int rank, size, tag, rc, i;
    MPI_Status status;
    char message[20];

    rc = MPI_Init(&argc, &argv);
    rc = MPI_Comm_size(MPI_COMM_WORLD, &size);
    rc = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    tag = 100;

    if (rank == 0) {
        strcpy(message, "Hello, world");
        for (i = 1; i < size; i++)
            rc = MPI_Send(message, 13, MPI_CHAR, i, tag, MPI_COMM_WORLD);
    } else {
        rc = MPI_Recv(message, 13, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
    }

    printf("node %d : %.13s\n", rank, message);
    rc = MPI_Finalize();
    return 0;
}

Thanks. Looking forward to your reply.

Best regards,

Hammad Siddiqi
Center for High Performance Scientific Computing
NUST Institute of Information Technology,
National University of Sciences and Technology,
Rawalpindi, Pakistan.

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users