The following is the ifconfig output for the Mac and the Linux box, respectively:

fuji:openmpi-1.3.3 pallabdatta$ ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
stf0: flags=0<> mtu 1280
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        inet6 fe80::21f:5bff:fe3d:eaac%en0 prefixlen 64 scopeid 0x4
        inet 10.11.14.203 netmask 0xfffff000 broadcast 10.11.15.255
        ether 00:1f:5b:3d:ea:ac
        media: autoselect (100baseTX <full-duplex>) status: active
        supported media: autoselect 10baseT/UTP <half-duplex> 10baseT/UTP <full-duplex> 10baseT/UTP <full-duplex,hw-loopback> 10baseT/UTP <full-duplex,flow-control> 100baseTX <half-duplex> 100baseTX <full-duplex> 100baseTX <full-duplex,hw-loopback> 100baseTX <full-duplex,flow-control> 1000baseT <full-duplex> 1000baseT <full-duplex,hw-loopback> 1000baseT <full-duplex,flow-control>
en1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        ether 00:1f:5b:3d:ea:ad
        media: autoselect status: inactive
        supported media: autoselect 10baseT/UTP <half-duplex> 10baseT/UTP <full-duplex> 10baseT/UTP <full-duplex,hw-loopback> 10baseT/UTP <full-duplex,flow-control> 100baseTX <half-duplex> 100baseTX <full-duplex> 100baseTX <full-duplex,hw-loopback> 100baseTX <full-duplex,flow-control> 1000baseT <full-duplex> 1000baseT <full-duplex,hw-loopback> 1000baseT <full-duplex,flow-control>
fw0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 4078
        lladdr 00:22:41:ff:fe:ed:7d:a8
        media: autoselect <full-duplex> status: inactive
        supported media: autoselect <full-duplex>

LINUX:
====
pallabdatta@apex-backpack:~/backpack/src$ ifconfig
lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:116 errors:0 dropped:0 overruns:0 frame:0
          TX packets:116 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:11788 (11.7 KB) TX bytes:11788 (11.7 KB)

wlan0     Link encap:Ethernet HWaddr 00:21:79:c2:54:c7
          inet addr:10.11.14.205 Bcast:10.11.14.255 Mask:255.255.240.0
          inet6 addr: fe80::221:79ff:fec2:54c7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:72531 errors:0 dropped:0 overruns:0 frame:0
          TX packets:28894 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5459312 (5.4 MB) TX bytes:7264193 (7.2 MB)

wmaster0  Link encap:UNSPEC HWaddr 00-21-79-C2-54-C7-34-63-00-00-00-00-00-00-00-00
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

The Mac is a Mac Pro with two 2.26GHz Quad-Core Intel Xeon processors, and the Linux box runs Ubuntu Server Edition 9.04. The Mac connects to the network through its ethernet interface, and the Linux box connects through a wireless adapter (IOGEAR).

Please help me fix this issue in any way you can. It really needs to work for our project.

thanks in advance,
regards,
pallab

> My other concern was the following but I am not sure it applies here.
> If you have multiple interfaces on the node, and they are on the same
> subnet, then you cannot actually select what IP address to go out of.
> You can only select the IP address you want to connect to. In these
> cases, I have seen a hang because we think we are selecting an IP
> address to go out of, but it actually goes out the other one.
> Perhaps you can send the User's list the output from "ifconfig" on each
> of the machines which would show all the interfaces. You need to get the
> right arguments for ifconfig depending on the OS you are running on.
>
> One thought is make sure the ethernet interface is marked down on both
> boxes if that is possible.
>
> Pallab Datta wrote:
>> Any suggestions on how to debug this further..??
>> do you think I need to enable any other option besides heterogeneous at
>> the configure prompt.?
>>
>>
>>> The --enable-heterogeneous should do the trick. And to answer the
>>> previous question, yes, put both of the interfaces in the include list.
>>>
>>> --mca btl_tcp_if_include en0,wlan0
>>>
>>> If that does not work, then I may have one other thought why it might
>>> not work although perhaps not a solution.
>>>
>>> Rolf
>>>
>>> Pallab Datta wrote:
>>>
>>>> Hi Rolf,
>>>>
>>>> Do I need to configure openmpi with some specific options apart from
>>>> --enable-heterogeneous..?
>>>> I am currently using
>>>> ./configure --prefix=/usr/local/ --enable-heterogeneous --disable-static --enable-shared --enable-debug
>>>>
>>>> on both ends...is the above correct..?! Please let me know.
>>>> thanks and regards,
>>>> pallab
>>>>
>>>>
>>>>
>>>>> Hi:
>>>>> I assume if you wait several minutes then your program will actually
>>>>> time out, yes? I guess I have two suggestions. First, can you run a
>>>>> non-MPI job using the wireless? Something like hostname? Secondly,
>>>>> you may want to specify the specific interfaces you want it to use on
>>>>> the two machines. You can do that via the "--mca btl_tcp_if_include"
>>>>> run-time parameter. Just list the ones that you expect it to use.
>>>>>
>>>>> Also, this is not right - "--mca OMPI_mca_mpi_preconnect_all 1" It
>>>>> should be --mca mpi_preconnect_mpi 1 if you want to do the connection
>>>>> during MPI_Init.
>>>>>
>>>>> Rolf
>>>>>
>>>>> Pallab Datta wrote:
>>>>>
>>>>>
>>>>>> The following is the error dump
>>>>>>
>>>>>> fuji:src pallabdatta$ /usr/local/bin/mpirun --mca btl_tcp_port_min_v4 36900 -mca btl_tcp_port_range_v4 32 --mca btl_base_verbose 30 --mca btl tcp,self --mca OMPI_mca_mpi_preconnect_all 1 -np 2 -hetero -H localhost,10.11.14.205 /tmp/hello
>>>>>> [fuji.local:01316] mca: base: components_open: Looking for btl components
>>>>>> [fuji.local:01316] mca: base: components_open: opening btl components
>>>>>> [fuji.local:01316] mca: base: components_open: found loaded component self
>>>>>> [fuji.local:01316] mca: base: components_open: component self has no register function
>>>>>> [fuji.local:01316] mca: base: components_open: component self open function successful
>>>>>> [fuji.local:01316] mca: base: components_open: found loaded component tcp
>>>>>> [fuji.local:01316] mca: base: components_open: component tcp has no register function
>>>>>> [fuji.local:01316] mca: base: components_open: component tcp open function successful
>>>>>> [fuji.local:01316] select: initializing btl component self
>>>>>> [fuji.local:01316] select: init of component self returned success
>>>>>> [fuji.local:01316] select: initializing btl component tcp
>>>>>> [fuji.local:01316] select: init of component tcp returned success
>>>>>> [apex-backpack:04753] mca: base: components_open: Looking for btl components
>>>>>> [apex-backpack:04753] mca: base: components_open: opening btl components
>>>>>> [apex-backpack:04753] mca: base: components_open: found loaded component self
>>>>>> [apex-backpack:04753] mca: base: components_open: component self has no register function
>>>>>> [apex-backpack:04753] mca: base: components_open: component self open function successful
>>>>>> [apex-backpack:04753] mca: base: components_open: found loaded component tcp
>>>>>> [apex-backpack:04753] mca: base: components_open: component tcp has no register function
>>>>>> [apex-backpack:04753] mca: base: components_open: component tcp open function successful
>>>>>> [apex-backpack:04753] select: initializing btl component self
>>>>>> [apex-backpack:04753] select: init of component self returned success
>>>>>> [apex-backpack:04753] select: initializing btl component tcp
>>>>>> [apex-backpack:04753] select: init of component tcp returned success
>>>>>> Process 0 on fuji.local out of 2
>>>>>> Process 1 on apex-backpack out of 2
>>>>>> [apex-backpack:04753] btl: tcp: attempting to connect() to address 10.11.14.203 on port 9360
>>>>>>
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> I am trying to run open-mpi 1.3.3. between a linux box running ubuntu
>>>>>>> server v.9.04 and a Macintosh. I have configured openmpi with the
>>>>>>> following options.:
>>>>>>> ./configure --prefix=/usr/local/ --enable-heterogeneous --disable-shared --enable-static
>>>>>>>
>>>>>>> When both the machines are connected to the network via ethernet
>>>>>>> cables openmpi works fine.
>>>>>>>
>>>>>>> But when I switch the linux box to a wireless adapter i can reach
>>>>>>> (ping) the macintosh but openmpi hangs on a hello world program.
>>>>>>>
>>>>>>> I ran :
>>>>>>>
>>>>>>> /usr/local/bin/mpirun --mca btl_tcp_port_min_v4 36900 -mca btl_tcp_port_range_v4 32 --mca btl_base_verbose 30 --mca OMPI_mca_mpi_preconnect_all 1 -np 2 -hetero -H localhost,10.11.14.205 /tmp/back
>>>>>>>
>>>>>>> it hangs on a send receive function between the two ends. All my
>>>>>>> firewalls are turned off at the macintosh end.
>>>>>>> PLEASE HELP ASAP
>>>>>>> regards,
>>>>>>> pallab
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>> --
>>>>>
>>>>> =========================
>>>>> rolf.vandeva...@sun.com
>>>>> 781-442-3043
>>>>> =========================
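
As a side check on the two ifconfig listings in this thread, the short Python sketch below (not part of the original exchange) uses the standard-library ipaddress module to test whether en0 on the Mac (10.11.14.203, netmask 0xfffff000) and wlan0 on the Linux box (10.11.14.205, Mask 255.255.240.0) fall in the same subnet, and to compute the broadcast address that mask implies. The helper name subnet_info is made up for illustration, and whether any broadcast mismatch is related to the hang is not established anywhere in this thread.

```python
import ipaddress


def subnet_info(addr, netmask):
    """Return (network, broadcast) for an IPv4 address and dotted netmask.

    subnet_info is a hypothetical helper for this note, not an Open MPI API.
    """
    iface = ipaddress.ip_interface(f"{addr}/{netmask}")
    return iface.network, iface.network.broadcast_address


# macOS ifconfig prints the netmask in hex; convert it to dotted form first.
mac_mask = str(ipaddress.ip_address(0xFFFFF000))

# Addresses and masks copied from the ifconfig output quoted in the thread.
mac_net, mac_bcast = subnet_info("10.11.14.203", mac_mask)
linux_net, linux_bcast = subnet_info("10.11.14.205", "255.255.240.0")

print("Mac en0     :", mac_net, "broadcast", mac_bcast)
print("Linux wlan0 :", linux_net, "broadcast", linux_bcast)
print("same subnet :", mac_net == linux_net)

# Broadcast address that the Linux box actually reported for wlan0.
reported_linux_bcast = ipaddress.ip_address("10.11.14.255")
print("wlan0 broadcast matches mask:", reported_linux_bcast == linux_bcast)
```

With these inputs both interfaces land in 10.11.0.0/20, whose broadcast address is 10.11.15.255. That agrees with the Mac's en0 line but not with the Bcast:10.11.14.255 shown for wlan0, so the wireless interface's broadcast setting may be worth a second look.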