FWIW: /usr/include/infiniband/verbs.h is the normal location for verbs.h. Don't add --with-verbs=/usr/include/infiniband; it won't work.
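If verbs.h is missing altogether, the development headers are probably not installed. On RHEL 6 that header is normally provided by the libibverbs-devel package; a quick check might look like this (the yum line assumes the stock Red Hat repositories and root access):

    # Is the verbs header present where configure expects it?
    ls -l /usr/include/infiniband/verbs.h

    # If not, install the development package that provides it (RHEL 6)
    yum install libibverbs-devel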
Please send all the information listed here and we can have a look at your logs: http://www.open-mpi.org/community/help/

On Mar 2, 2014, at 7:16 AM, Ralph Castain <r...@open-mpi.org> wrote:

> It should have been looking in the same place - check to see where you installed the InfiniBand support. Is "verbs.h" under your /usr/include?
>
> In looking at the code, the 1.6 series searches for verbs.h in /usr/include/infiniband. The 1.7 series does too (though it doesn't look quite right to me), but it wouldn't hurt to add it yourself:
>
> --with-verbs=/usr/include/infiniband --with-verbs-libdir=/usr/lib64/infiniband
>
> or something like that
>
>
> On Mar 1, 2014, at 11:56 PM, Beichuan Yan <beichuan....@colorado.edu> wrote:
>
>> Ralph and Gus,
>>
>> 1. Thank you for your suggestion. I built Open MPI 1.6.5 with the following command:
>>
>> ./configure --prefix=/work4/projects/openmpi/openmpi-1.6.5-gcc-compilers-4.7.3 --with-tm=/opt/pbs/default --with-openib= --with-openib-libdir=/usr/lib64
>>
>> In my job script, I need to specify the IB subnet like this:
>>
>> TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>> mpirun $TCP -np 64 -hostfile $PBS_NODEFILE ./paraEllip3d input.txt
>>
>> Then my job gets initialized and runs correctly every time!
>>
>> 2. However, when I build Open MPI 1.7.4 with a different command (in order to test and compare the shared-memory performance of Open MPI):
>>
>> ./configure --prefix=/work4/projects/openmpi/openmpi-1.7.4-gcc-compilers-4.7.3 --with-tm=/opt/pbs/default --with-verbs= --with-verbs-libdir=/usr/lib64
>>
>> it fails with the following error:
>>
>> ============================================================================
>> == Modular Component Architecture (MCA) setup
>> ============================================================================
>> checking for subdir args... '--prefix=/work4/projects/openmpi/openmpi-1.7.4-gcc-compilers-4.7.3' '--with-tm=/opt/pbs/default' '--with-verbs=' '--with-verbs-libdir=/usr/lib64' 'CC=gcc' 'CXX=g++'
>> checking --with-verbs value... simple ok (unspecified)
>> checking --with-verbs-libdir value... sanity check ok (/usr/lib64)
>> configure: WARNING: Could not find verbs.h in the usual locations under
>> configure: error: Cannot continue
>>
>> Our system is Red Hat 6.4. Do we need to install more InfiniBand packages? Can you please advise?
>>
>> Thanks,
>> Beichuan Yan
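For reference, a 1.7.4 configure invocation in the same spirit as the working 1.6.5 one above might look roughly like this (the prefix, PBS path, and libdir are the ones quoted in this thread; it will still fail until verbs.h is actually present, i.e., until the libibverbs-devel headers mentioned above are installed):

    ./configure \
        --prefix=/work4/projects/openmpi/openmpi-1.7.4-gcc-compilers-4.7.3 \
        --with-tm=/opt/pbs/default \
        --with-verbs \
        --with-verbs-libdir=/usr/lib64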
>>
>> -----Original Message-----
>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus Correa
>> Sent: Friday, February 28, 2014 15:59
>> To: Open MPI Users
>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>
>> Hi Beichuan,
>>
>> To add to what Ralph said, the RHEL OpenMPI package probably wasn't built with PBS Pro support either. Besides, OMPI 1.5.4 (the RHEL version) is old.
>>
>> You will save yourself time and grief if you read the installation FAQs before you install from the source tarball:
>>
>> http://www.open-mpi.org/faq/?category=building
>>
>> However, as Ralph said, that is your best bet, and it is quite easy to get right.
>>
>> See this FAQ on how to build with PBS Pro support:
>>
>> http://www.open-mpi.org/faq/?category=building#build-rte-tm
>>
>> And this one on how to build with InfiniBand support:
>>
>> http://www.open-mpi.org/faq/?category=building#build-p2p
>>
>> Here is how to select the installation directory (--prefix):
>>
>> http://www.open-mpi.org/faq/?category=building#easy-build
>>
>> Here is how to select the compilers (gcc, g++, and gfortran are fine):
>>
>> http://www.open-mpi.org/faq/?category=building#build-compilers
>>
>> I hope this helps,
>> Gus Correa
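Taken together, those FAQ entries boil down to a build sequence along these lines (a sketch only: the tarball version, installation prefix, and PBS Pro path are the ones used elsewhere in this thread, and the compiler variables can be omitted if the default gcc/g++/gfortran are what you want):

    tar xzf openmpi-1.6.5.tar.gz
    cd openmpi-1.6.5
    ./configure \
        --prefix=/work4/projects/openmpi/openmpi-1.6.5-gcc-compilers-4.7.3 \
        --with-tm=/opt/pbs/default \
        --with-openib \
        CC=gcc CXX=g++ FC=gfortran
    make -j4 && make install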
>>
>> On 02/28/2014 12:36 PM, Ralph Castain wrote:
>>> Almost certainly, the Red Hat package wasn't built with matching InfiniBand support and so we aren't picking it up. I'd suggest downloading the latest 1.7.4 or 1.7.5 nightly tarball, or even the latest 1.6 tarball if you want the stable release, and building it yourself so you *know* it was built for your system.
>>>
>>>
>>> On Feb 28, 2014, at 9:20 AM, Beichuan Yan <beichuan....@colorado.edu <mailto:beichuan....@colorado.edu>> wrote:
>>>
>>>> Hi there,
>>>>
>>>> I am running jobs on clusters with an InfiniBand interconnect; they installed Open MPI v1.5.4 via the Red Hat 6 yum package. My problem is that although my jobs get queued and started by PBS Pro quickly, most of the time they don't actually run (occasionally they do) and give error info like this, even though plenty of CPU/IB resources are available:
>>>>
>>>> [r2i6n7][[25564,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.159.156 failed: Connection refused (111)
>>>>
>>>> And even when a job does get started and runs well, it prints this warning:
>>>>
>>>> --------------------------------------------------------------------------
>>>> WARNING: There was an error initializing an OpenFabrics device.
>>>>
>>>>   Local host:   r1i2n6
>>>>   Local device: mlx4_0
>>>> --------------------------------------------------------------------------
>>>>
>>>> 1. Here is the info from one of the compute nodes:
>>>>
>>>> -bash-4.1$ /sbin/ifconfig
>>>> eth0      Link encap:Ethernet  HWaddr 8C:89:A5:E3:D2:96
>>>>           inet addr:192.168.159.205  Bcast:192.168.159.255  Mask:255.255.255.0
>>>>           inet6 addr: fe80::8e89:a5ff:fee3:d296/64 Scope:Link
>>>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>>           RX packets:48879864 errors:0 dropped:0 overruns:17 frame:0
>>>>           TX packets:39286060 errors:0 dropped:0 overruns:0 carrier:0
>>>>           collisions:0 txqueuelen:1000
>>>>           RX bytes:54771093645 (51.0 GiB)  TX bytes:37512462596 (34.9 GiB)
>>>>           Memory:dfc00000-dfc20000
>>>>
>>>> Ifconfig uses the ioctl access method to get the full address information, which limits hardware addresses to 8 bytes.
>>>> Because Infiniband address has 20 bytes, only the first 8 bytes are displayed correctly.
>>>> Ifconfig is obsolete! For replacement check ip.
>>>>
>>>> ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:C0:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>>>>           inet addr:10.148.0.114  Bcast:10.148.255.255  Mask:255.255.0.0
>>>>           inet6 addr: fe80::202:c903:fb:3489/64 Scope:Link
>>>>           UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
>>>>           RX packets:43807414 errors:0 dropped:0 overruns:0 frame:0
>>>>           TX packets:10534050 errors:0 dropped:24 overruns:0 carrier:0
>>>>           collisions:0 txqueuelen:256
>>>>           RX bytes:47824448125 (44.5 GiB)  TX bytes:44764010514 (41.6 GiB)
>>>>
>>>> lo        Link encap:Local Loopback
>>>>           inet addr:127.0.0.1  Mask:255.0.0.0
>>>>           inet6 addr: ::1/128 Scope:Host
>>>>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>>>           RX packets:17292 errors:0 dropped:0 overruns:0 frame:0
>>>>           TX packets:17292 errors:0 dropped:0 overruns:0 carrier:0
>>>>           collisions:0 txqueuelen:0
>>>>           RX bytes:1492453 (1.4 MiB)  TX bytes:1492453 (1.4 MiB)
>>>>
>>>> -bash-4.1$ chkconfig --list iptables
>>>> iptables   0:off  1:off  2:on  3:on  4:on  5:on  6:off
>>>>
>>>> 2. I tried the various parameters below, but none of them guarantees that my jobs get initialized and run:
>>>>
>>>> #TCP="--mca btl ^tcp"
>>>> #TCP="--mca btl self,openib"
>>>> #TCP="--mca btl_tcp_if_exclude lo"
>>>> #TCP="--mca btl_tcp_if_include eth0"
>>>> #TCP="--mca btl_tcp_if_include eth0, ib0"
>>>> #TCP="--mca btl_tcp_if_exclude 192.168.0.0/24,127.0.0.1/8 --mca oob_tcp_if_exclude 192.168.0.0/24,127.0.0.1/8"
>>>> #TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>>>> mpirun $TCP -hostfile $PBS_NODEFILE -np 8 ./paraEllip3d input.txt
>>>>
>>>> 3. Then I turned to Intel MPI, which surprisingly starts and runs my job correctly each time (though it is a little slower than Open MPI, maybe 15% slower, it works every time).
>>>>
>>>> Can you please advise? Many thanks.
>>>>
>>>> Sincerely,
>>>> Beichuan Yan

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
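For reference, the combination reported to work earlier in the thread boils down to a job script along these lines (the IPoIB subnet, process count, hostfile, and executable are taken from the thread; the PBS directives are only an illustrative skeleton for a 64-rank run):

    #!/bin/bash
    #PBS -l select=8:ncpus=8:mpiprocs=8
    #PBS -l walltime=01:00:00

    cd $PBS_O_WORKDIR

    # Restrict the TCP BTL to the IPoIB subnet so connections are not
    # attempted over the 192.168.x.x Ethernet addresses.
    TCP="--mca btl_tcp_if_include 10.148.0.0/16"

    mpirun $TCP -np 64 -hostfile $PBS_NODEFILE ./paraEllip3d input.txt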