Yes, I had tried that initially; it (apex-backpack) was trying to connect to the Mac (10.11.14.203) at port number 4, which is too low. That's why I set the port range higher.
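As a sanity check that the chosen port range is reachable at all over the wireless link, the connection can be exercised outside of MPI first. This is only a sketch: it assumes netcat (nc) is installed on both hosts and reuses the 36900 base port and the addresses from the commands quoted below (note that the listening syntax differs slightly between the BSD nc shipped with OS X and the traditional nc on Ubuntu).

  # On the Mac (10.11.14.203): listen on one port from the chosen range
  nc -l 36900

  # On apex-backpack: try to reach it over the wireless link
  nc 10.11.14.203 36900
  # Type a line of text; if it shows up on the Mac, plain TCP on that
  # port works in this direction. Repeat with the roles swapped to test
  # the reverse direction as well.

If plain TCP works both ways, the hang is more likely about which interface or port Open MPI selects than about basic connectivity.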
> Have you tried running without limiting the port range?
>
> On Sep 24, 2009, at 12:39 PM, Pallab Datta wrote:
>
>> Hi All,
>>
>> Yes, I can ping and ssh from apex-backpack to my Mac (fuji.local).
>> I fixed the wireless broadcast so that it is the same on both ends
>> (10.11.14.255), but the problem still persists.
>>
>> I have tried other wireless adapters as well, but no luck so far.
>> Please let me know what can be done.
>> regards, pallab
>>
>>> (putting this back on the list where others can reply as well, and if
>>> we solve it, the solution will be google-ized)
>>>
>>> According to your debug output:
>>>
>>>>> [apex-backpack:31956] btl: tcp: attempting to connect() to address
>>>>> 10.11.14.203 on port 9360
>>>
>>> It *is* trying to connect to the right IP address. Are you able to
>>> ping .203 from apex-backpack?
>>>
>>> I also notice that your ethernet configuration does not exactly match
>>> between Linux and OS X:
>>>
>>> en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>>>      inet 10.11.14.203 netmask 0xfffff000 broadcast 10.11.15.255
>>>
>>> wlan0  Link encap:Ethernet  HWaddr 00:21:79:c2:54:c7
>>>      inet addr:10.11.14.205  Bcast:10.11.14.255  Mask:255.255.240.0
>>>
>>> On Sep 22, 2009, at 9:26 PM, Pallab Datta wrote:
>>>
>>>> There is no firewall running between the machines. I tried using the
>>>> IP address instead of localhost, but it gave me the same output. MPI
>>>> is not even timing out; it keeps hanging forever. :(
>>>>
>>>> I have disabled the ethernet interface on the Linux box, keeping only
>>>> the wireless up. On the Mac I only have the ethernet turned on. My
>>>> Mac is an 8-core Mac Pro.
>>>>
>>>> Please help me debug this.
>>>> thanks in advance, regards,
>>>> pallab
>>>>
>>>>> (only replying to users list)
>>>>>
>>>>> Some suggestions:
>>>>>
>>>>> - MPI seems to start up, but the additional TCP connections required
>>>>>   for MPI communication seem to be failing / timing out / hitting
>>>>>   some other error.
>>>>> - Are you running firewalls between your machines? If so, can you
>>>>>   disable them?
>>>>> - I see that you're specifying "--mca btl_tcp_port_min_v4 36900",
>>>>>   but one of the debug lines reads:
>>>>>> [apex-backpack:31956] btl: tcp: attempting to connect() to address
>>>>>> 10.11.14.203 on port 9360
>>>>> - Try not using the name "localhost", but rather the IP address of
>>>>>   the local machine.
>>>>>
>>>>> On Sep 22, 2009, at 5:27 PM, Pallab Datta wrote:
>>>>>
>>>>>> The following are the ifconfig outputs for the Mac and the Linux
>>>>>> box, respectively:
>>>>>>
>>>>>> fuji:openmpi-1.3.3 pallabdatta$ ifconfig
>>>>>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
>>>>>>      inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
>>>>>>      inet 127.0.0.1 netmask 0xff000000
>>>>>>      inet6 ::1 prefixlen 128
>>>>>> gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
>>>>>> stf0: flags=0<> mtu 1280
>>>>>> en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>>>>>>      inet6 fe80::21f:5bff:fe3d:eaac%en0 prefixlen 64 scopeid 0x4
>>>>>>      inet 10.11.14.203 netmask 0xfffff000 broadcast 10.11.15.255
>>>>>>      ether 00:1f:5b:3d:ea:ac
>>>>>>      media: autoselect (100baseTX <full-duplex>) status: active
>>>>>>      supported media: autoselect 10baseT/UTP <half-duplex>
>>>>>>        10baseT/UTP <full-duplex> 10baseT/UTP <full-duplex,hw-loopback>
>>>>>>        10baseT/UTP <full-duplex,flow-control> 100baseTX <half-duplex>
>>>>>>        100baseTX <full-duplex> 100baseTX <full-duplex,hw-loopback>
>>>>>>        100baseTX <full-duplex,flow-control> 1000baseT <full-duplex>
>>>>>>        1000baseT <full-duplex,hw-loopback> 1000baseT <full-duplex,flow-control>
>>>>>> en1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>>>>>>      ether 00:1f:5b:3d:ea:ad
>>>>>>      media: autoselect status: inactive
>>>>>>      supported media: autoselect 10baseT/UTP <half-duplex>
>>>>>>        10baseT/UTP <full-duplex> 10baseT/UTP <full-duplex,hw-loopback>
>>>>>>        10baseT/UTP <full-duplex,flow-control> 100baseTX <half-duplex>
>>>>>>        100baseTX <full-duplex> 100baseTX <full-duplex,hw-loopback>
>>>>>>        100baseTX <full-duplex,flow-control> 1000baseT <full-duplex>
>>>>>>        1000baseT <full-duplex,hw-loopback> 1000baseT <full-duplex,flow-control>
>>>>>> fw0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 4078
>>>>>>      lladdr 00:22:41:ff:fe:ed:7d:a8
>>>>>>      media: autoselect <full-duplex> status: inactive
>>>>>>      supported media: autoselect <full-duplex>
>>>>>>
>>>>>> LINUX:
>>>>>> =====
>>>>>> pallabdatta@apex-backpack:~/backpack/src$ ifconfig
>>>>>> lo    Link encap:Local Loopback
>>>>>>       inet addr:127.0.0.1  Mask:255.0.0.0
>>>>>>       inet6 addr: ::1/128 Scope:Host
>>>>>>       UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>>>>>       RX packets:116 errors:0 dropped:0 overruns:0 frame:0
>>>>>>       TX packets:116 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>       collisions:0 txqueuelen:0
>>>>>>       RX bytes:11788 (11.7 KB)  TX bytes:11788 (11.7 KB)
>>>>>>
>>>>>> wlan0 Link encap:Ethernet  HWaddr 00:21:79:c2:54:c7
>>>>>>       inet addr:10.11.14.205  Bcast:10.11.14.255  Mask:255.255.240.0
>>>>>>       inet6 addr: fe80::221:79ff:fec2:54c7/64 Scope:Link
>>>>>>       UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>>>>       RX packets:72531 errors:0 dropped:0 overruns:0 frame:0
>>>>>>       TX packets:28894 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>       collisions:0 txqueuelen:1000
>>>>>>       RX bytes:5459312 (5.4 MB)  TX bytes:7264193 (7.2 MB)
>>>>>>
>>>>>> wmaster0 Link encap:UNSPEC
>>>>>>       HWaddr 00-21-79-C2-54-C7-34-63-00-00-00-00-00-00-00-00
>>>>>>       UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>>>>       RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>       TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>       collisions:0 txqueuelen:1000
>>>>>>       RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
>>>>>>
>>>>>> The Mac is a two-socket 2.26 GHz quad-core Intel Xeon Mac Pro, and
>>>>>> the Linux box runs Ubuntu Server Edition 9.04. The Mac connects to
>>>>>> the network via its ethernet interface, and the Linux box connects
>>>>>> via a wireless adapter (IOGEAR).
>>>>>>
>>>>>> Please help me fix this issue any way possible; it really needs to
>>>>>> work for our project.
>>>>>> thanks in advance,
>>>>>> regards,
>>>>>> pallab
>>>>>>
>>>>>>> My other concern was the following, but I am not sure it applies
>>>>>>> here. If you have multiple interfaces on the node, and they are on
>>>>>>> the same subnet, then you cannot actually select which IP address
>>>>>>> to go out of. You can only select the IP address you want to
>>>>>>> connect to. In these cases, I have seen a hang because we think we
>>>>>>> are selecting an IP address to go out of, but traffic actually
>>>>>>> goes out the other one.
>>>>>>>
>>>>>>> Perhaps you can send the users list the output from "ifconfig" on
>>>>>>> each of the machines, which would show all the interfaces. You
>>>>>>> need the right arguments for ifconfig depending on the OS you are
>>>>>>> running.
>>>>>>>
>>>>>>> One thought is to make sure the ethernet interface is marked down
>>>>>>> on both boxes, if that is possible.
>>>>>>>
>>>>>>> Pallab Datta wrote:
>>>>>>>> Any suggestions on how to debug this further?
>>>>>>>> Do you think I need to enable any other option besides
>>>>>>>> heterogeneous at the configure prompt?
>>>>>>>>
>>>>>>>>> The --enable-heterogeneous should do the trick. And to answer
>>>>>>>>> the previous question, yes, put both of the interfaces in the
>>>>>>>>> include list:
>>>>>>>>>
>>>>>>>>>   --mca btl_tcp_if_include en0,wlan0
>>>>>>>>>
>>>>>>>>> If that does not work, then I may have one other thought on why
>>>>>>>>> it might not work, although perhaps not a solution.
>>>>>>>>>
>>>>>>>>> Rolf
>>>>>>>>>
>>>>>>>>> Pallab Datta wrote:
>>>>>>>>>
>>>>>>>>>> Hi Rolf,
>>>>>>>>>>
>>>>>>>>>> Do I need to configure Open MPI with some specific options
>>>>>>>>>> apart from --enable-heterogeneous? I am currently using
>>>>>>>>>>
>>>>>>>>>>   ./configure --prefix=/usr/local/ --enable-heterogeneous
>>>>>>>>>>     --disable-static --enable-shared --enable-debug
>>>>>>>>>>
>>>>>>>>>> on both ends. Is the above correct? Please let me know.
>>>>>>>>>> thanks and regards,
>>>>>>>>>> pallab
>>>>>>>>>>
>>>>>>>>>>> Hi:
>>>>>>>>>>> I assume that if you wait several minutes, your program will
>>>>>>>>>>> actually time out, yes? I have two suggestions. First, can
>>>>>>>>>>> you run a non-MPI job over the wireless? Something like
>>>>>>>>>>> hostname? Secondly, you may want to specify the specific
>>>>>>>>>>> interfaces you want it to use on the two machines. You can do
>>>>>>>>>>> that via the "--mca btl_tcp_if_include" run-time parameter.
>>>>>>>>>>> Just list the ones that you expect it to use.
>>>>>>>>>>>
>>>>>>>>>>> Also, this is not right: "--mca OMPI_mca_mpi_preconnect_all 1".
>>>>>>>>>>> It should be --mca mpi_preconnect_mpi 1 if you want to do the
>>>>>>>>>>> connection during MPI_Init.
>>>>>>>>>>>
>>>>>>>>>>> Rolf
>>>>>>>>>>>
>>>>>>>>>>> Pallab Datta wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The following is the error dump:
>>>>>>>>>>>>
>>>>>>>>>>>> fuji:src pallabdatta$ /usr/local/bin/mpirun
>>>>>>>>>>>>   --mca btl_tcp_port_min_v4 36900 -mca btl_tcp_port_range_v4 32
>>>>>>>>>>>>   --mca btl_base_verbose 30 --mca btl tcp,self
>>>>>>>>>>>>   --mca OMPI_mca_mpi_preconnect_all 1
>>>>>>>>>>>>   -np 2 -hetero -H localhost,10.11.14.205 /tmp/hello
>>>>>>>>>>>> [fuji.local:01316] mca: base: components_open: Looking for btl components
>>>>>>>>>>>> [fuji.local:01316] mca: base: components_open: opening btl components
>>>>>>>>>>>> [fuji.local:01316] mca: base: components_open: found loaded component self
>>>>>>>>>>>> [fuji.local:01316] mca: base: components_open: component self has no register function
>>>>>>>>>>>> [fuji.local:01316] mca: base: components_open: component self open function successful
>>>>>>>>>>>> [fuji.local:01316] mca: base: components_open: found loaded component tcp
>>>>>>>>>>>> [fuji.local:01316] mca: base: components_open: component tcp has no register function
>>>>>>>>>>>> [fuji.local:01316] mca: base: components_open: component tcp open function successful
>>>>>>>>>>>> [fuji.local:01316] select: initializing btl component self
>>>>>>>>>>>> [fuji.local:01316] select: init of component self returned success
>>>>>>>>>>>> [fuji.local:01316] select: initializing btl component tcp
>>>>>>>>>>>> [fuji.local:01316] select: init of component tcp returned success
>>>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: Looking for btl components
>>>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: opening btl components
>>>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: found loaded component self
>>>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: component self has no register function
>>>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: component self open function successful
>>>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: found loaded component tcp
>>>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: component tcp has no register function
>>>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: component tcp open function successful
>>>>>>>>>>>> [apex-backpack:04753] select: initializing btl component self
>>>>>>>>>>>> [apex-backpack:04753] select: init of component self returned success
>>>>>>>>>>>> [apex-backpack:04753] select: initializing btl component tcp
>>>>>>>>>>>> [apex-backpack:04753] select: init of component tcp returned success
>>>>>>>>>>>> Process 0 on fuji.local out of 2
>>>>>>>>>>>> Process 1 on apex-backpack out of 2
>>>>>>>>>>>> [apex-backpack:04753] btl: tcp: attempting to connect() to address 10.11.14.203 on port 9360
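Pulling the suggestions in this thread together, a corrected command line would look roughly like the sketch below. It is untested on this particular setup: it substitutes the corrected mpi_preconnect_mpi name for OMPI_mca_mpi_preconnect_all, adds the btl_tcp_if_include list, and uses the two machines' IP addresses instead of localhost, all as suggested in the messages above.

  # Non-MPI smoke test over the wireless link first (Rolf's suggestion)
  /usr/local/bin/mpirun -np 2 -H 10.11.14.203,10.11.14.205 hostname

  # Then the MPI job with the combined options
  /usr/local/bin/mpirun --mca btl tcp,self \
      --mca btl_tcp_if_include en0,wlan0 \
      --mca btl_tcp_port_min_v4 36900 --mca btl_tcp_port_range_v4 32 \
      --mca mpi_preconnect_mpi 1 --mca btl_base_verbose 30 \
      -np 2 -hetero -H 10.11.14.203,10.11.14.205 /tmp/hello

  # ompi_info can confirm the parameter names registered by this build
  # (output format varies a bit between Open MPI versions):
  ompi_info --param btl tcp | grep port
  ompi_info --param all all | grep preconnect

If even the hostname run hangs, the problem is likely in process launch rather than in MPI's TCP transport.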
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am trying to run Open MPI 1.3.3 between a Linux box running
>>>>>>>>>>>>> Ubuntu Server 9.04 and a Macintosh. I have configured Open MPI
>>>>>>>>>>>>> with the following options:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   ./configure --prefix=/usr/local/ --enable-heterogeneous
>>>>>>>>>>>>>     --disable-shared --enable-static
>>>>>>>>>>>>>
>>>>>>>>>>>>> When both machines are connected to the network via ethernet
>>>>>>>>>>>>> cables, Open MPI works fine.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But when I switch the Linux box to a wireless adapter, I can
>>>>>>>>>>>>> reach (ping) the Macintosh, yet Open MPI hangs on a hello
>>>>>>>>>>>>> world program.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I ran:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   /usr/local/bin/mpirun --mca btl_tcp_port_min_v4 36900
>>>>>>>>>>>>>     -mca btl_tcp_port_range_v4 32 --mca btl_base_verbose 30
>>>>>>>>>>>>>     --mca OMPI_mca_mpi_preconnect_all 1 -np 2 -hetero
>>>>>>>>>>>>>     -H localhost,10.11.14.205 /tmp/back
>>>>>>>>>>>>>
>>>>>>>>>>>>> It hangs on a send/receive between the two ends. All my
>>>>>>>>>>>>> firewalls are turned off at the Macintosh end. Please help as
>>>>>>>>>>>>> soon as possible.
>>>>>>>>>>>>> regards,
>>>>>>>>>>>>> pallab
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> =========================
>>>>>>>>>>> rolf.vandeva...@sun.com
>>>>>>>>>>> 781-442-3043
>>>>>>>>>>> =========================
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> =========================
>>>>>>>>> rolf.vandeva...@sun.com
>>>>>>>>> 781-442-3043
>>>>>>>>> =========================
>>>>>>>
>>>>>>> --
>>>>>>> =========================
>>>>>>> rolf.vandeva...@sun.com
>>>>>>> 781-442-3043
>>>>>>> =========================
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> jsquy...@cisco.com
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>
> --
> Jeff Squyres
> jsquy...@cisco.com
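One more detail worth checking, given the ifconfig output quoted above: with the /20 netmask shown on both machines (0xfffff000 = 255.255.240.0), the broadcast address for a 10.11.14.x host is 10.11.15.255. That matches the Mac's en0 but not wlan0's Bcast:10.11.14.255. A sketch of making the Linux side consistent, assuming /20 really is the intended subnet (if the wireless network is actually a /24, adjust both the mask and the broadcast instead):

  # On apex-backpack: realign wlan0's broadcast with the /20 netmask
  sudo ifconfig wlan0 10.11.14.205 netmask 255.255.240.0 broadcast 10.11.15.255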