[OMPI users] Compiling OpenMPI 1.2.4 with Topspin Infiniband support "IPO link: can not find -lvapi"
ompi_mpi_finalize.lo runtime/ompi_mpi_params.lo runtime/ompi_mpi_preconnect.lo win/win.lo datatype/libdatatype.la debuggers/libdebuggers.la mpi/c/libmpi_c.la mpi/c/profile/libmpi_c_pmpi.la mpi/f77/libmpi_f77_base.la mca/allocator/libmca_allocator.la mca/allocator/bucket/libmca_allocator_bucket.la mca/allocator/basic/libmca_allocator_basic.la mca/bml/libmca_bml.la mca/bml/r2/libmca_bml_r2.la mca/btl/libmca_btl.la mca/btl/tcp/libmca_btl_tcp.la mca/btl/mvapi/libmca_btl_mvapi.la mca/btl/sm/libmca_btl_sm.la mca/btl/self/libmca_btl_self.la mca/coll/libmca_coll.la mca/coll/tuned/libmca_coll_tuned.la mca/coll/sm/libmca_coll_sm.la mca/coll/self/libmca_coll_self.la mca/coll/basic/libmca_coll_basic.la mca/common/sm/libmca_common_sm.la mca/io/libmca_io.la mca/io/romio/libmca_io_romio.la mca/mpool/libmca_mpool.la mca/mpool/sm/libmca_mpool_sm.la mca/mpool/rdma/libmca_mpool_rdma.la mca/mtl/libmca_mtl.la mca/osc/libmca_osc.la mca/osc/pt2pt/libmca_osc_pt2pt.la mca/pml/libmca_pml.la mca/pml/ob1/libmca_pml_ob1.la mca/pml/cm/libmca_pml_cm.la mca/rcache/libmca_rcache.la mca/rcache/vma/libmca_rcache_vma.la mca/topo/libmca_topo.la mca/topo/unity/libmca_topo_unity.la /home/makeuser/tmp/openmpi/openmpi-1.2.4_64/openmpi-1.2.4/orte/libopen-rte.la -lnsl -lutil

libtool: link: /share/apps/intel/cce/9.1.047/bin/icc -shared class/.libs/ompi_bitmap.o class/.libs/ompi_free_list.o class/.libs/ompi_pointer_array.o class/.libs/ompi_rb_tree.o class/.libs/ompi_seq_tracker.o attribute/.libs/attribute.o attribute/.libs/attribute_predefined.o communicator/.libs/comm_init.o communicator/.libs/comm.o communicator/.libs/comm_cid.o communicator/.libs/comm_dyn.o communicator/.libs/comm_publish.o errhandler/.libs/errhandler.o errhandler/.libs/errhandler_invoke.o errhandler/.libs/errhandler_predefined.o errhandler/.libs/errcode.o errhandler/.libs/errcode-internal.o file/.libs/file.o group/.libs/group.o group/.libs/group_init.o group/.libs/group_set_rank.o info/.libs/info.o op/.libs/op.o op/.libs/op_predefined.o proc/.libs/proc.o request/.libs/grequest.o request/.libs/request.o request/.libs/req_test.o request/.libs/req_wait.o runtime/.libs/ompi_mpi_abort.o runtime/.libs/ompi_mpi_init.o runtime/.libs/ompi_mpi_finalize.o runtime/.libs/ompi_mpi_params.o runtime/.libs/ompi_mpi_preconnect.o win/.libs/win.o -Wl,--whole-archive datatype/.libs/libdatatype.a debuggers/.libs/libdebuggers.a mpi/c/.libs/libmpi_c.a mpi/c/profile/.libs/libmpi_c_pmpi.a mpi/f77/.libs/libmpi_f77_base.a mca/allocator/.libs/libmca_allocator.a mca/allocator/bucket/.libs/libmca_allocator_bucket.a mca/allocator/basic/.libs/libmca_allocator_basic.a mca/bml/.libs/libmca_bml.a mca/bml/r2/.libs/libmca_bml_r2.a mca/btl/.libs/libmca_btl.a mca/btl/tcp/.libs/libmca_btl_tcp.a mca/btl/mvapi/.libs/libmca_btl_mvapi.a mca/btl/sm/.libs/libmca_btl_sm.a mca/btl/self/.libs/libmca_btl_self.a mca/coll/.libs/libmca_coll.a mca/coll/tuned/.libs/libmca_coll_tuned.a mca/coll/sm/.libs/libmca_coll_sm.a mca/coll/self/.libs/libmca_coll_self.a mca/coll/basic/.libs/libmca_coll_basic.a mca/common/sm/.libs/libmca_common_sm_noinst.a mca/io/.libs/libmca_io.a mca/io/romio/.libs/libmca_io_romio.a mca/mpool/.libs/libmca_mpool.a mca/mpool/sm/.libs/libmca_mpool_sm.a mca/mpool/rdma/.libs/libmca_mpool_rdma.a mca/mtl/.libs/libmca_mtl.a mca/osc/.libs/libmca_osc.a mca/osc/pt2pt/.libs/libmca_osc_pt2pt.a mca/pml/.libs/libmca_pml.a mca/pml/ob1/.libs/libmca_pml_ob1.a mca/pml/cm/.libs/libmca_pml_cm.a mca/rcache/.libs/libmca_rcache.a mca/rcache/vma/.libs/libmca_rcache_vma.a mca/topo/.libs/libmca_topo.a mca/topo/unity/.libs/libmca_topo_unity.a -Wl,--no-whole-archive -Wl,-rpath -Wl,/home/makeuser/tmp/openmpi/openmpi-1.2.4_64/openmpi-1.2.4/orte/.libs -Wl,-rpath -Wl,/home/makeuser/tmp/openmpi/openmpi-1.2.4_64/openmpi-1.2.4/opal/.libs -Wl,-rpath -Wl,/share/apps/openmpi/intel/openmpi-1.2.4-64/lib -L/home/makeuser/tmp/openmpi/openmpi-1.2.4_64/openmpi-1.2.4/opal/.libs -L/share/apps/intel/cce/9.1.047/lib -lvapi -lmosal -lrt /home/makeuser/tmp/openmpi/openmpi-1.2.4_64/openmpi-1.2.4/orte/.libs/libopen-rte.so /home/makeuser/tmp/openmpi/openmpi-1.2.4_64/openmpi-1.2.4/opal/.libs/libopen-pal.so -lnuma -ldl -lnsl -lutil -pthread -pthread -Wl,-soname -Wl,libmpi.so.0 -o .libs/libmpi.so.0.0.0

IPO link: can not find -lvapi
icc: error: problem during multi-file optimization compilation (code 1)
make[2]: *** [libmpi.la] Error 1
make[2]: Leaving directory `/home/makeuser/tmp/openmpi/openmpi-1.2.4_64/openmpi-1.2.4/ompi'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/makeuser/tmp/openmpi/openmpi-1.2.4_64/openmpi-1.2.4/ompi'
make: *** [all-recursive] Error 1

- Mike Hanby
Information Systems Specialist II
School of Engineering Dean's Office
University of Alabama at Birmingham
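[Editorial note: the relink command above passes -lvapi but no -L pointing at the Topspin library directory. A quick check, sketched here on the assumption that the Topspin VAPI libraries really are in /usr/local/topspin/lib64 (as the configure line later in this thread says); the ls check and the explicit LDFLAGS are only a suggested workaround, not a fix confirmed by the thread:]

    $ ls /usr/local/topspin/lib64/libvapi* /usr/local/topspin/lib64/libmosal*
    $ ./configure CC=icc CXX=icpc FC=ifort F77=ifort F90=ifort \
        --with-mvapi=/usr/local/topspin \
        --with-mvapi-libdir=/usr/local/topspin/lib64 \
        LDFLAGS="-L/usr/local/topspin/lib64" \
        --enable-static --prefix=/share/apps/openmpi/intel/openmpi-1.2.4-64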
Re: [OMPI users] Compiling OpenMPI 1.2.4 with Topspin Infiniband support "IPO link: can not find -lvapi"
Let me look into that. The system is a Rocks cluster, so with any luck they'll have an updated 'roll' that will make it easy to update.

Thanks for the link,
Mike

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
Sent: Monday, November 05, 2007 16:44
To: Open MPI Users
Subject: Re: [OMPI users] Compiling OpenMPI 1.2.4 with Topspin Infiniband support "IPO link: can not find -lvapi"

Is there any chance that you can upgrade to the OFED IB stack? Cisco is recommending OFED to all of its customers who are able to upgrade:

http://www.open-mpi.org/faq/?category=openfabrics#vapi-support

If you can't upgrade, we'll continue to diagnose (please see http://www.open-mpi.org/community/help/), but I thought I'd at least ask...

On Nov 5, 2007, at 5:39 PM, Mike Hanby wrote:

> Howdy,
>
> I'm attempting to compile OpenMPI using Intel compilers (9.1.047) with Topspin Infiniband support (on CentOS 4.4 64bit).
>
> Configuring:
>
> ./configure CC=icc CXX=icpc FC=ifort F77=ifort F90=ifort --with-mvapi=/usr/local/topspin --with-mvapi-libdir=/usr/local/topspin/lib64 --enable-static --prefix=/share/apps/openmpi/intel/openmpi-1.2.4-64
>
> make runs for 5 minutes or so and errors with:
>
> IPO link: can not find -lvapi
> icc: error: problem during multi-file optimization compilation (code 1)
> make[2]: *** [libmpi.la] Error 1
> make[2]: Leaving directory `/home/makeuser/tmp/openmpi/openmpi-1.2.4_64/openmpi-1.2.4/ompi'
> make[1]: *** [install-recursive] Error 1
> make[1]: Leaving directory `/home/makeuser/tmp/openmpi/openmpi-1.2.4_64/openmpi-1.2.4/ompi'
>
> I used the same configuration for OpenMPI 1.1.2 and it compiled and installed successfully.
>
> Any suggestions?
>
> The following are the last several lines in the make log file:
>
> libtool: compile: /share/apps/intel/cce/9.1.047/bin/icc -DHAVE_CONFIG_H -I. -I../opal/include -I../orte/include -I../ompi/include -I.. -I/share/apps/intel/cce/9.1.047/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -pthread -MT runtime/ompi_mpi_params.lo -MD -MP -MF runtime/.deps/ompi_mpi_params.Tpo -c runtime/ompi_mpi_params.c -o runtime/ompi_mpi_params.o >/dev/null 2>&1
> depbase=`echo runtime/ompi_mpi_preconnect.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
> /bin/sh ../libtool --tag=CC --mode=compile /share/apps/intel/cce/9.1.047/bin/icc -DHAVE_CONFIG_H -I. -I../opal/include -I../orte/include -I../ompi/include -I.. -I/share/apps/intel/cce/9.1.047/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -pthread -MT runtime/ompi_mpi_preconnect.lo -MD -MP -MF $depbase.Tpo -c -o runtime/ompi_mpi_preconnect.lo runtime/ompi_mpi_preconnect.c &&\
> mv -f $depbase.Tpo $depbase.Plo
> libtool: compile: /share/apps/intel/cce/9.1.047/bin/icc -DHAVE_CONFIG_H -I. -I../opal/include -I../orte/include -I../ompi/include -I.. -I/share/apps/intel/cce/9.1.047/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -pthread -MT runtime/ompi_mpi_preconnect.lo -MD -MP -MF runtime/.deps/ompi_mpi_preconnect.Tpo -c runtime/ompi_mpi_preconnect.c -fPIC -DPIC -o runtime/.libs/ompi_mpi_preconnect.o
> libtool: compile: /share/apps/intel/cce/9.1.047/bin/icc -DHAVE_CONFIG_H -I. -I../opal/include -I../orte/include -I../ompi/include -I.. -I/share/apps/intel/cce/9.1.047/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -pthread -MT runtime/ompi_mpi_preconnect.lo -MD -MP -MF runtime/.deps/ompi_mpi_preconnect.Tpo -c runtime/ompi_mpi_preconnect.c -o runtime/ompi_mpi_preconnect.o >/dev/null 2>&1
> depbase=`echo win/win.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
> /bin/sh ../libtool --tag=CC --mode=compile /share/apps/intel/cce/9.1.047/bin/icc -DHAVE_CONFIG_H -I. -I../opal/include -I../orte/include -I../ompi/include -I.. -I/share/apps/intel/cce/9.1.047/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -pthread -MT win/win.lo -MD -MP -MF $depbase.Tpo -c -o win/win.lo win/win.c &&\
> mv -f $depbase.Tpo $depbase.Plo
> libtool: compile: /share/apps/intel/cce/9.1.047/bin/icc -DHAVE_CONFIG_H -I. -I../opal/include -I../orte/include -I../ompi/include -I.. -I/share/apps/intel/cce/9.1.047/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -pthread -MT win/win.lo -MD -MP -MF win/.deps/win.Tpo -c win/win.c -fPIC -DPIC -o win/.libs/win.o
> libtool: compi
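[Editorial note: if the cluster is later moved to the OFED stack as Jeff suggests, the corresponding build would be configured against the openib support instead of mvapi. A rough sketch, assuming the OFED libraries are installed under the usual /usr prefix (--with-openib is the standard 1.2-series option, but the prefix shown is an assumption, not taken from this thread):]

    $ ./configure CC=icc CXX=icpc FC=ifort F77=ifort F90=ifort \
        --with-openib=/usr \
        --enable-static --prefix=/share/apps/openmpi/intel/openmpi-1.2.4-64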
[OMPI users] Any way to make "btl_tcp_if_exclude" option system wide?
Howdy,

My users are having to use this option with mpirun, otherwise the jobs will normally fail with a 111 communication error:

--mca btl_tcp_if_exclude lo,eth1

Is there a way for me to set that MCA option system wide, perhaps via an environment variable, so that they don't have to remember to use it?

Thanks, Mike

Mike Hanby
mha...@uab.edu
Information Systems Specialist II
IT HPCS / Research Computing
Re: [OMPI users] Any way to make "btl_tcp_if_exclude" option system wide?
Thanks for the link to the Sun HPC ClusterTools manual. I'll read through that.

I'll have to consider which approach is best. Our users are 'supposed' to load the environment module for OpenMPI to properly configure their environment. The module file would be an easy location to add the variable. That isn't always the case, however, as some users like to do it old school and specify all of the variables in their job script. :-)

We install OpenMPI using a custom built RPM, so I may need to add the option to the openmpi-mca-params.conf file when building the RPM. Decisions...

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Eugene Loh
Sent: Thursday, October 22, 2009 10:12 AM
To: Open MPI Users
Subject: Re: [OMPI users] Any way to make "btl_tcp_if_exclude" option system wide?

Mike Hanby wrote:

>Howdy,
>
>My users are having to use this option with mpirun, otherwise the jobs will normally fail with a 111 communication error:
>
>--mca btl_tcp_if_exclude lo,eth1
>
>Is there a way for me to set that MCA option system wide, perhaps via an environment variable so that they don't have to remember to use it?
>
Yes. Maybe you want to use a system-wide configuration file.

I don't know where this is "best" documented, but it is at least discussed in the Sun HPC ClusterTools User Guide. (ClusterTools is an Open MPI distribution.) E.g., http://dlc.sun.com/pdf/821-0225-10/821-0225-10.pdf . Look at Chapter 7. The section "Using MCA Parameters as Environment Variables" starts on page 69, but I'm not sure environment variables are really the way to go. I think you want the section "To Specify MCA Parameters Using a Text File", on page 71. The file would look like this:

% cat $OPAL_PREFIX/lib/openmpi-mca-params.conf
btl_tcp_if_exclude = lo,eth1

where $OPAL_PREFIX is where users will be getting OMPI. I'm not 100% sure on the name of that file, but need to run right now.

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
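[Editorial note: for anyone hitting the same question, the two mechanisms discussed above look roughly like this. The parameter name and value come from the thread; the etc/ location (rather than lib/) and the OMPI_MCA_ environment-variable prefix are the usual Open MPI conventions, but check them against your own installation, and my_app is just a placeholder:]

    # System-wide default, read by every user of this installation:
    $ cat $OPAL_PREFIX/etc/openmpi-mca-params.conf
    btl_tcp_if_exclude = lo,eth1

    # Or per user/session, e.g. set from an environment module or job script:
    $ export OMPI_MCA_btl_tcp_if_exclude=lo,eth1
    $ mpirun -np 4 ./my_app    # no --mca flag needed on the command line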
[OMPI users] Infiniband Question
Howdy,

When running a Gromacs job using OpenMPI 1.4.1 on Infiniband enabled nodes, I'm seeing the following process listing:

\_ -bash /opt/gridengine/default/spool/compute-0-3/job_scripts/97037
  \_ mpirun -np 4 mdrun_mpi -v -np 4 -s production-Npt-323K_4CPU -o production-Npt-323K_4CPU -c production-Npt-323K_4CPU -x production-Npt-323K_4CPU -g production-Npt-323K_4CPU.log
    \_ /opt/gridengine/bin/lx26-amd64/qrsh -inherit -nostdin -V compute-0-4.local orted -mca ess env -mca orte_ess_jobid 945881088 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 4 --hnp-uri "945881088.0;tcp://192.168.20.252:39440;tcp://192.168.21.252:39440"
    \_ /opt/gridengine/bin/lx26-amd64/qrsh -inherit -nostdin -V compute-0-2.local orted -mca ess env -mca orte_ess_jobid 945881088 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 4 --hnp-uri "945881088.0;tcp://192.168.20.252:39440;tcp://192.168.21.252:39440"
    \_ /opt/gridengine/bin/lx26-amd64/qrsh -inherit -nostdin -V compute-0-1.local orted -mca ess env -mca orte_ess_jobid 945881088 -mca orte_ess_vpid 3 -mca orte_ess_num_procs 4 --hnp-uri "945881088.0;tcp://192.168.20.252:39440;tcp://192.168.21.252:39440"
    \_ mdrun_mpi -v -np 4 -s production-Npt-323K_4CPU -o production-Npt-323K_4CPU -c production-Npt-323K_4CPU -x production-Npt-323K_4CPU -g production-Npt-323K_4CPU.log

Is it normal for these tcp addresses to be listed if the job is using Infiniband? The 192.168.20.x subnet is the eth0 GigE network, and the 192.168.21.x subnet is the ib0 IP-over-IB network.

Or is this job actually using TCP/IP over Infiniband / GigE? I'm running mpirun without any special fabric includes / excludes. ompi_info lists openib as a valid fabric:

$ ompi_info |grep openib
MCA btl: openib (MCA v2.0, API v2.0, Component v1.4.1)

Thanks for any insight,
Mike

=
Mike Hanby
mha...@uab.edu
Information Systems Specialist II
IT HPCS / Research Computing
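[Editorial note: the tcp:// addresses in the orted --hnp-uri arguments belong to Open MPI's out-of-band run-time layer (job startup and control), which always uses TCP; they do not by themselves say which transport the MPI traffic uses. Two checks commonly used to answer this kind of question are sketched below, assuming a stock Open MPI 1.4.1; the BTL names and the btl_base_verbose parameter are standard MCA knobs, and the application command is just copied from the job above:]

    # Restrict the job to Infiniband (plus shared memory and self);
    # it will abort instead of silently falling back to TCP:
    $ mpirun -np 4 --mca btl openib,sm,self mdrun_mpi -v -np 4 -s production-Npt-323K_4CPU ...

    # Or ask the BTL framework to report which transports each process selects:
    $ mpirun -np 4 --mca btl_base_verbose 30 mdrun_mpi ...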
[OMPI users] The --with-sge option
Howdy,

I'm compiling 1.2.8 on a system with SGE 6.1u4 and came across the "--with-sge" option in a Grid Engine posting. A couple of questions:

1. I don't see --with-sge mentioned in the "./configure --help" output, nor can I find much reference to it on the open-mpi site. Is this option really implemented? What does it do?

2. After compiling openmpi with the --with-sge switch, I ran the ompi_info binary and grep'd for sge in the output; there isn't any reference. Should there be, if the option was successfully passed to configure?

Thanks, Mike
Re: [OMPI users] The --with-sge option
I did find the following in ompi_info:

MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.7)
MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.7)

However, I see the same thing in an ompi_info built without the --with-sge switch. Also, since I'm building 1.2.8, shouldn't those versions after "Component" reflect 1.2.8? I set the PATH and LD_LIBRARY_PATH to point to the temp location of my new build and it still reports 1.2.7.

Mike

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Mike Hanby
Sent: Thursday, October 16, 2008 11:07 AM
To: us...@open-mpi.org
Subject: [OMPI users] The --with-sge option

Howdy,

I'm compiling 1.2.8 on a system with SGE 6.1u4 and came across the "--with-sge" option in a Grid Engine posting. A couple of questions:

1. I don't see --with-sge mentioned in the "./configure --help" output, nor can I find much reference to it on the open-mpi site. Is this option really implemented? What does it do?

2. After compiling openmpi with the --with-sge switch, I ran the ompi_info binary and grep'd for sge in the output; there isn't any reference. Should there be, if the option was successfully passed to configure?

Thanks, Mike
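[Editorial note: a simple way to rule out the obvious explanation (an older 1.2.7 install appearing earlier in the search path than the new build) is sketched below; the commands are generic and not taken from this thread:]

    $ which ompi_info mpirun
    $ ompi_info | grep "Open MPI:"    # should report 1.2.8 for the new build
    $ ompi_info | grep gridengine     # gridengine ras/pls components, if built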
Re: [OMPI users] OpenMPI portability problems: debug info isn't helpful
Some further clarification: I read a post over on the SGE mailing list that said the --with-sge option is part of OMPI 1.3, not 1.2.x.

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Aleksej Saushev
Sent: Thursday, October 16, 2008 12:39 PM
To: Open MPI Users
Subject: Re: [OMPI users] OpenMPI portability problems: debug info isn't helpful

Jeff Squyres writes:

> On Oct 11, 2008, at 10:20 AM, Aleksej Saushev wrote:
>
>> $ ompi_info | grep oob
>> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.7)
>
> Good!
>
>>> $ mpirun --mca rml_base_debug 100 -np 2 skosfile
>> [asau.local:09060] mca: base: components_open: Looking for rml components
>> [asau.local:09060] mca: base: components_open: distilling rml components
>> [asau.local:09060] mca: base: components_open: accepting all rml components
>> [asau.local:09060] mca: base: components_open: opening rml components
>> [asau.local:09060] mca: base: components_open: found loaded component oob
>> [asau.local:09060] mca: base: components_open: component oob open function successful
>> [asau.local:09060] orte_rml_base_select: initializing rml component oob
>> [asau.local:09060] orte_rml_base_select: init returned failure
>
> Ah ha -- this is progress. For some reason, your "oob" RML plugin is declining to run. I see that its query/initialization function is actually quite short:
>
> if(mca_oob_base_init() != ORTE_SUCCESS)
>     return NULL;
> *priority = 1;
> return &orte_rml_oob_module;
>
> So it must be failing the mca_oob_base_init() function -- this is what initializes the underlying "OOB" (out of band) communications subsystem.
>
> Of course, this doesn't fail often, so we don't have any run-time switches to enable the debugging output. :-( Edit orte/mca/oob/base/oob_base_open.c line 43 and change the value of mca_oob_base_output from -1 to 0. Let's see that output -- I'm particularly interested in the output from querying the tcp oob component. I suspect that it's declining to run as well.
>
> I wonder if this is going to end up being an opal_if() issue -- where we are traversing all the IP network interfaces from the kernel... I'll bet even money that it is.

[asau.local:04648] opal_ifinit: ioctl(SIOCGIFFLAGS) failed with errno=6
[asau.local:04648] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
--
It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

orte_rml_base_select failed
--> Returned value -13 instead of ORTE_SUCCESS
--
[asau.local:04648] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
[asau.local:04648] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
--
Open RTE was unable to initialize properly. The error occured while attempting to orte_init(). Returned value -13 instead of ORTE_SUCCESS.
--

Why don't you use strerror(3) to print the errno value explanation?

From :
#define ENXIO 6 /* Device not configured */

It seems that I have to debug network interface probing; how should I use *_output subroutines so that they do print?
I tried these changes but in vain:

--- opal/util/if.c.orig 2008-08-25 23:16:50.0 +0400
+++ opal/util/if.c 2008-10-15 23:55:07.0 +0400
@@ -242,6 +242,8 @@
         if(ifr->ifr_addr.sa_family != AF_INET)
             continue;
+        opal_output(0, "opal_ifinit: checking netif %s", ifr->ifr_name);
+        /* HERE IT FAILS!! */
         if(ioctl(sd, SIOCGIFFLAGS, ifr) < 0) {
             opal_output(0, "opal_ifinit: ioctl(SIOCGIFFLAGS) failed with errno=%d", errno);
             continue;

--- opal/util/if.c.orig 2008-08-25 23:16:50.0 +0400
+++ opal/util/if.c 2008-10-15 23:55:07.0 +0400
@@ -242,6 +242,8 @@
         if(ifr->ifr_addr.sa_family != AF_INET)
             continue;
+        fprintf(stderr, "opal_ifinit: checking netif %s\n", ifr->ifr_name);
+        /* HERE IT FAILS!! */
         if(ioctl(sd, SIOCGIFFLAGS, ifr) < 0) {
             opal_output(0, "opal_ifinit: ioctl(SIOCGIFFLAGS) failed with errno=%d", errno);
             continue;

--- opal/util/output.c.orig 2008-08-25 23:16:50.0 +0400
+++ opal/util/output.c 2008-10-16 19:58:49.
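[Editorial note: on the strerror(3) question raised above, a minimal sketch of what that diagnostic could look like in opal/util/if.c is shown below. It assumes errno is still valid at the point of the opal_output call and that opal_output takes printf-style arguments, as the existing calls in this thread suggest; this is an illustration, not a patch that was posted to the thread.]

    #include <errno.h>
    #include <string.h>   /* strerror() */

    /* inside opal_ifinit(), after the failing ioctl() */
    if (ioctl(sd, SIOCGIFFLAGS, ifr) < 0) {
        /* report both the symbolic explanation and the raw errno value */
        opal_output(0, "opal_ifinit: ioctl(SIOCGIFFLAGS) on %s failed: %s (errno=%d)",
                    ifr->ifr_name, strerror(errno), errno);
        continue;
    }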