FWIW: /usr/include/infiniband/verbs.h is the normal location for verbs.h.
Don't add --with-verbs=/usr/include/infiniband; it won't work, because the value 
of --with-verbs is expected to be an installation prefix (one that contains 
include/infiniband/verbs.h and lib/), not the include directory itself.
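
If you want to double-check before re-running configure, something like this 
should tell you whether the header and devel package are there (assuming a 
stock RHEL 6 node, where libibverbs-devel is the package that normally ships 
verbs.h):

    ls -l /usr/include/infiniband/verbs.h
    rpm -q libibverbs libibverbs-devel
    # if the -devel package is missing (run as root):
    yum install libibverbs-devel

If the header is in that default location, a plain --with-verbs (no value) 
should be all you need; if your OFED stack lives under some other prefix, pass 
that prefix itself (e.g. --with-verbs=/opt/ofed), not its include directory.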

Please send all the information listed here and we can have a look at your logs:

    http://www.open-mpi.org/community/help/


On Mar 2, 2014, at 7:16 AM, Ralph Castain <r...@open-mpi.org> wrote:

> It should have been looking in the same place - check to see where you 
> installed the InfiniBand support. Is "verbs.h" under your /usr/include?
> 
> Looking at the code, the 1.6 series searched for verbs.h in 
> /usr/include/infiniband. The 1.7 series does as well (though that code doesn't 
> look quite right to me), but it wouldn't hurt to add the paths yourself:
> 
> --with-verbs=/usr/include/infiniband --with-verbs-libdir=/usr/lib64/infiniband
> 
> or something like that
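> 
> A quick sanity check to see where (or whether) the header actually got 
> installed (the paths below are just the usual suspects):
> 
> find /usr/include /usr/local/include -name verbs.h 2>/dev/null
> rpm -ql libibverbs-devel | grep verbs.h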
> 
> 
> On Mar 1, 2014, at 11:56 PM, Beichuan Yan <beichuan....@colorado.edu> wrote:
> 
>> Ralph and Gus,
>> 
>> 1. Thank you for your suggestion. I built Open MPI 1.6.5 with the following 
>> command:
>> 
>> ./configure --prefix=/work4/projects/openmpi/openmpi-1.6.5-gcc-compilers-4.7.3 \
>>     --with-tm=/opt/pbs/default --with-openib= --with-openib-libdir=/usr/lib64
>> 
>> In my job script, I need to specify the IB subnet like this:
>> TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>> mpirun $TCP -np 64 -hostfile $PBS_NODEFILE ./paraEllip3d input.txt
>> 
>> Then my job can get initialized and run correctly each time!
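>> 
>> In case the full context helps, the job script is essentially the following 
>> (the #PBS resource lines here are only placeholders, not our exact request):
>> 
>> #!/bin/bash
>> #PBS -N paraEllip3d
>> #PBS -l select=8:ncpus=8:mpiprocs=8
>> #PBS -l walltime=01:00:00
>> 
>> cd $PBS_O_WORKDIR
>> TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>> mpirun $TCP -np 64 -hostfile $PBS_NODEFILE ./paraEllip3d input.txt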
>> 
>> 2. However, when I build Open MPI 1.7.4 with the following command (in order 
>> to test/compare the shared-memory performance of Open MPI):
>> 
>> ./configure --prefix=/work4/projects/openmpi/openmpi-1.7.4-gcc-compilers-4.7.3 \
>>     --with-tm=/opt/pbs/default --with-verbs= --with-verbs-libdir=/usr/lib64
>> 
>> it fails as follows:
>> ============================================================================
>> == Modular Component Architecture (MCA) setup
>> ============================================================================
>> checking for subdir args...  
>> '--prefix=/work4/projects/openmpi/openmpi-1.7.4-gcc-compilers-4.7.3' 
>> '--with-tm=/opt/pbs/default' '--with-verbs=' 
>> '--with-verbs-libdir=/usr/lib64' 'CC=gcc' 'CXX=g++'
>> checking --with-verbs value... simple ok (unspecified)
>> checking --with-verbs-libdir value... sanity check ok (/usr/lib64)
>> configure: WARNING: Could not find verbs.h in the usual locations under
>> configure: error: Cannot continue
>> 
>> Our system is Red Hat 6.4. Do we need to install additional InfiniBand 
>> packages? Can you please advise?
>> 
>> Thanks,
>> Beichuan Yan
>> 
>> 
>> -----Original Message-----
>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus Correa
>> Sent: Friday, February 28, 2014 15:59
>> To: Open MPI Users
>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>> 
>> Hi Beichuan,
>> 
>> To add to what Ralph said,
>> the RHEL OpenMPI package probably wasn't built with PBS Pro support 
>> either.
>> Besides, OMPI 1.5.4 (the RHEL version) is old.
>> 
>> **
>> 
>> You will save yourself time and grief if you read the installation FAQs, 
>> before you install from the source tarball:
>> 
>> http://www.open-mpi.org/faq/?category=building
>> 
>> However, as Ralph said, that is your best bet, and it is quite easy to get 
>> right.
>> 
>> 
>> See this FAQ on how to build with PBS Pro support:
>> 
>> http://www.open-mpi.org/faq/?category=building#build-rte-tm
>> 
>> And this one on how to build with Infiniband support:
>> 
>> http://www.open-mpi.org/faq/?category=building#build-p2p
>> 
>> Here is how to select the installation directory (--prefix):
>> 
>> http://www.open-mpi.org/faq/?category=building#easy-build
>> 
>> Here is how to select the compilers (gcc, g++, and gfortran are fine):
>> 
>> http://www.open-mpi.org/faq/?category=building#build-compilers
>> 
>> I hope this helps,
>> Gus Correa
>> 
>> On 02/28/2014 12:36 PM, Ralph Castain wrote:
>>> Almost certainly, the Red Hat package wasn't built with matching 
>>> InfiniBand support, so we aren't picking it up. I'd suggest 
>>> downloading the latest 1.7.4 or 1.7.5 nightly tarball, or even the 
>>> latest 1.6 tarball if you want the stable release, and building it 
>>> yourself so you *know* it was built for your system.
>>> 
>>> 
>>> On Feb 28, 2014, at 9:20 AM, Beichuan Yan <beichuan....@colorado.edu 
>>> <mailto:beichuan....@colorado.edu>> wrote:
>>> 
>>>> Hi there,
>>>> I am running jobs on clusters with an InfiniBand interconnect. OpenMPI 
>>>> v1.5.4 is installed there via the Red Hat 6 yum package. My problem is 
>>>> that although my jobs get queued and started by PBS Pro quickly, most 
>>>> of the time they don't really run (occasionally they do) and give error 
>>>> info like this, even though plenty of CPU/IB resources are available:
>>>> 
>>>> [r2i6n7][[25564,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>>>> connect() to 192.168.159.156 failed: Connection refused (111)
>>>> 
>>>> And even when a job does get started and runs well, it prints this warning:
>>>> 
>>>> --------------------------------------------------------------------------
>>>> WARNING: There was an error initializing an OpenFabrics device.
>>>>   Local host:   r1i2n6
>>>>   Local device: mlx4_0
>>>> --------------------------------------------------------------------------
>>>> 
>>>> 1. Here is the info from one of the compute nodes:
>>>> 
>>>> -bash-4.1$ /sbin/ifconfig
>>>> eth0      Link encap:Ethernet  HWaddr 8C:89:A5:E3:D2:96
>>>>           inet addr:192.168.159.205  Bcast:192.168.159.255  Mask:255.255.255.0
>>>>           inet6 addr: fe80::8e89:a5ff:fee3:d296/64 Scope:Link
>>>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>>           RX packets:48879864 errors:0 dropped:0 overruns:17 frame:0
>>>>           TX packets:39286060 errors:0 dropped:0 overruns:0 carrier:0
>>>>           collisions:0 txqueuelen:1000
>>>>           RX bytes:54771093645 (51.0 GiB)  TX bytes:37512462596 (34.9 GiB)
>>>>           Memory:dfc00000-dfc20000
>>>> 
>>>> Ifconfig uses the ioctl access method to get the full address 
>>>> information, which limits hardware addresses to 8 bytes.
>>>> Because an InfiniBand address has 20 bytes, only the first 8 bytes are 
>>>> displayed correctly.
>>>> Ifconfig is obsolete! For replacement check ip.
>>>> 
>>>> ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:C0:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>>>>           inet addr:10.148.0.114  Bcast:10.148.255.255  Mask:255.255.0.0
>>>>           inet6 addr: fe80::202:c903:fb:3489/64 Scope:Link
>>>>           UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
>>>>           RX packets:43807414 errors:0 dropped:0 overruns:0 frame:0
>>>>           TX packets:10534050 errors:0 dropped:24 overruns:0 carrier:0
>>>>           collisions:0 txqueuelen:256
>>>>           RX bytes:47824448125 (44.5 GiB)  TX bytes:44764010514 (41.6 GiB)
>>>> 
>>>> lo        Link encap:Local Loopback
>>>>           inet addr:127.0.0.1  Mask:255.0.0.0
>>>>           inet6 addr: ::1/128 Scope:Host
>>>>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>>>           RX packets:17292 errors:0 dropped:0 overruns:0 frame:0
>>>>           TX packets:17292 errors:0 dropped:0 overruns:0 carrier:0
>>>>           collisions:0 txqueuelen:0
>>>>           RX bytes:1492453 (1.4 MiB)  TX bytes:1492453 (1.4 MiB)
>>>> 
>>>> -bash-4.1$ chkconfig --list iptables
>>>> iptables        0:off   1:off   2:on    3:on    4:on    5:on    6:off
>>>> 
>>>> 2. I tried the various parameters below, but none of them can guarantee 
>>>> that my jobs get initialized and run:
>>>> 
>>>> #TCP="--mca btl ^tcp"
>>>> #TCP="--mca btl self,openib"
>>>> #TCP="--mca btl_tcp_if_exclude lo"
>>>> #TCP="--mca btl_tcp_if_include eth0"
>>>> #TCP="--mca btl_tcp_if_include eth0, ib0"
>>>> #TCP="--mca btl_tcp_if_exclude 192.168.0.0/24,127.0.0.1/8 --mca oob_tcp_if_exclude 192.168.0.0/24,127.0.0.1/8"
>>>> #TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>>>> mpirun $TCP -hostfile $PBS_NODEFILE -np 8 ./paraEllip3d input.txt
>>>> 
>>>> 3. Then I turned to Intel MPI, which surprisingly starts and runs my job 
>>>> correctly each time (though it is a little slower than OpenMPI, maybe 
>>>> 15% slower, but it works each time).
>>>> 
>>>> Can you please advise? Many thanks.
>>>> Sincerely,
>>>> Beichuan Yan


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
