Re: [OMPI users] [Wrf-users] WRF Problem running in Parallel on multiple nodes (cluster)
You still have to set the PATH and LD_LIBRARY_PATH on your remote nodes to include where you installed OMPI. Alternatively, use the absolute path name to mpirun in your cmd - we'll pick up the path and propagate it.

On May 3, 2011, at 9:14 PM, Ahsan Ali wrote:

> Dear Bart,
>
> I think OpenMPI doesn't need to be installed on all machines, because they are NFS-shared with the master node. I don't know how to check the output of "which orted"; it is running just on the master node. I have another application which runs similarly, but I am having this problem with WRF.
>
> On Tue, May 3, 2011 at 9:06 PM, Bart Brashers wrote:
>
>> It looks like OpenMPI is not installed on all your execution machines. You
>> need to install at least the libs on all machines, or in an NFS-shared
>> location. Check the output of "which orted" on the machine that works.
>>
>> Bart
>
> From: wrf-users-boun...@ucar.edu [mailto:wrf-users-boun...@ucar.edu] On Behalf Of Ahsan Ali
> Sent: Tuesday, May 03, 2011 1:04 AM
> To: us...@open-mpi.org
> Subject: [Wrf-users] WRF Problem running in Parallel on multiple nodes (cluster)
>
> Hello,
>
> I am able to run WRFV3.2.1 using mpirun on multiple cores of a single machine, but when I want to run it across multiple nodes in the cluster using a hostlist, I get an error. The compute nodes are mounted with the master node during boot using NFS. I get the following error. Please help.
>
> [root@pmd02 em_real]# mpirun -np 10 -hostfile /home/pmdtest/hostlist ./real.exe
> bash: orted: command not found
> bash: orted: command not found
> --
> A daemon (pid 22006) died unexpectedly with status 127 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --
> --
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --
> mpirun: clean termination accomplished
>
> --
> Syed Ahsan Ali Bokhari
> Electronic Engineer (EE)
>
> Research & Development Division
> Pakistan Meteorological Department, H-8/4, Islamabad.
> Phone # off +92518358714
> Cell # +923155145014
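As a concrete sketch of both suggestions (the prefix /opt/openmpi below is illustrative; substitute the actual install location):

  # Option 1: put the install tree on PATH/LD_LIBRARY_PATH in the startup
  # file that non-interactive shells read on every compute node (e.g. ~/.bashrc):
  export PATH=/opt/openmpi/bin:$PATH
  export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH

  # Option 2: launch with the absolute path to mpirun; OMPI then propagates
  # the prefix to the orted daemons on the remote nodes:
  /opt/openmpi/bin/mpirun -np 10 -hostfile /home/pmdtest/hostlist ./real.exe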
[OMPI users] Error occurred in MPI_Allreduce on communicator MPI_COMM_WORLD
Greetings !!!

I am observing the following error messages when executing the attached test program...

C:\test>mpirun mar_f.exe
 0 0 0
 size= 1 , rank= 0
 start
 --
 a= 2.002.002.00 2.002.00
 b= 3.003.003.00 3.003.003.00 3.003.003.00 3.003.003.00 3.003.003.00 3.003.003.00 3.003.003.00 3.003.003.00 3.00
 c= 0.000E+000 0.000E+000 0.000E+000 0.000E+000 0.000E+000
 sum= 0.000E+000 0.000E+000 0.000E+000 0.000E+000 0.000E+000
 sum= 30.030.030.0 30.030.0
[vbgyor:9920] *** An error occurred in MPI_Allreduce
[vbgyor:9920] *** on communicator MPI_COMM_WORLD
[vbgyor:9920] *** MPI_ERR_OP: invalid reduce operation
[vbgyor:9920] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[vbgyor:09736] [[27471,0],0]-[[27471,1],0] mca_oob_tcp_msg_recv: readv failed: Unknown error (10054)
--
mpirun has exited due to process rank 0 with PID 488 on node vbgyor exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in the job did. This can cause a job to hang indefinitely while it waits for all processes to call "init". By rule, if one process calls "init", then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize". By rule, all processes that call "init" MUST call "finalize" prior to exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
--

Environment:
OS: Windows 7 64-bit
Compilers: Visual Studio 2008 32-bit and Intel ifort 32-bit
OpenMPI: OpenMPI-1.5.2 pre-built libraries, and also locally built libraries

Thank you.
-Hiral

Attachment: mar_f.f (binary data)
Re: [OMPI users] Error occurred in MPI_Allreduce on communicator MPI_COMM_WORLD
On Wednesday, May 04, 2011 04:04:37 PM hi wrote:
> Greetings !!!
>
> I am observing the following error messages when executing the attached test program...
>
> C:\test>mpirun mar_f.exe
...
> [vbgyor:9920] *** An error occurred in MPI_Allreduce
> [vbgyor:9920] *** on communicator MPI_COMM_WORLD
> [vbgyor:9920] *** MPI_ERR_OP: invalid reduce operation
> [vbgyor:9920] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)

I'm not a Fortran programmer, but it seems to me that placing the MPI_Allreduce call in a subroutine like that broke the meaning of MPI_SUM and MPI_REAL in that scope. Adding:

    include 'mpif.h'

after

    SUBROUTINE PAR_BLAS2(m, n, a, b, c, comm)

helps.

/Peter
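For reference, a minimal sketch of the repaired subroutine; only the signature and the include line come from the thread, and the declarations and body here are illustrative. Without mpif.h, implicit typing turns MPI_SUM and MPI_REAL into undefined local variables, so a garbage handle reaches MPI_Allreduce and it fails with MPI_ERR_OP:

      SUBROUTINE PAR_BLAS2(m, n, a, b, c, comm)
      INCLUDE 'mpif.h'
C     mpif.h defines the MPI_REAL and MPI_SUM handles in this scope
      INTEGER m, n, comm, ierr
      REAL a(m), b(m,n), c(n), sum(n)
C     ... compute the local partial products into sum(1:n) ...
      CALL MPI_ALLREDUCE(sum, c, n, MPI_REAL, MPI_SUM, comm, ierr)
      RETURN
      END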
[OMPI users] configure: mpi-threads disabled by default
I've been asked about mixed-mode MPI/OpenMP programming with OpenMPI, so have been digging through the past list messages on MPI_THREAD_*, etc. Interesting stuff :)

Before I go ahead and add "--enable-mpi-threads" to our standard configure flags, is there any reason it's disabled by default, please? I'm a bit puzzled, as this default seems in conflict with the whole "Law of Least Astonishment" thing. Have I missed some disaster that's going to happen?

Thanks,

Mark
--
Mark Dixon                      Email: m.c.di...@leeds.ac.uk
HPC/Grid Systems Support        Tel (int): 35429
Information Systems Services    Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
--
Re: [OMPI users] configure: mpi-threads disabled by default
Depending on what version you use, the option has been renamed --enable-mpi-thread-multiple. Anyhow, there is widespread concern about whether the support is robust: it is known to be limited, and the performance is poor.

On 5/4/2011 9:14 AM, Mark Dixon wrote:
> I've been asked about mixed-mode MPI/OpenMP programming with OpenMPI, so have been digging through the past list messages on MPI_THREAD_*, etc. Interesting stuff :)
>
> Before I go ahead and add "--enable-mpi-threads" to our standard configure flags, is there any reason it's disabled by default, please? I'm a bit puzzled, as this default seems in conflict with the whole "Law of Least Astonishment" thing. Have I missed some disaster that's going to happen?
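For a tree where the flag has the new name, the configure-and-verify sequence looks roughly like this (the prefix is illustrative, and the exact ompi_info output format varies by version):

  ./configure --prefix=/opt/openmpi --enable-mpi-thread-multiple
  make all install
  /opt/openmpi/bin/ompi_info | grep -i thread   # check the thread support the build reports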
[OMPI users] cputype (7) does not match previous archive members cputype
Hello:

I am trying to install OpenMPI 1.4.3 on Mac OS X 10.6.7. I was able to install OpenMPI using the command

./configure --prefix=/opt/openmpi1.4.3GF CC=/sw/bin/gcc-fsf-4.5 CXX=/sw/bin/g++-fsf-4.5 F77=gfortran F90=gfortran

I then tried to install OpenMPI compiled for -m64 using the command

./configure --prefix=/opt/openmpi1.4.3GFm64 CC=/sw/bin/gcc-fsf-4.5 CFLAGS=-m64 CXX=/sw/bin/g++-fsf-4.5 CXXFLAGS=-m64 F77=gfortran FFLAGS=-m64 FC=gfortran FCFLAGS=-m64

This step worked fine, but when I did "make all install" I got the following message:

Making all in config
make[1]: Nothing to be done for `all'.
Making all in contrib
make[1]: Nothing to be done for `all'.
Making all in opal
Making all in include
make all-am
Making all in libltdl
make all-am
Making all in asm
depbase=`echo asm.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ../../libtool --tag=CC --mode=compile /sw/bin/gcc-fsf-4.5 -DHAVE_CONFIG_H -I. -I../../opal/include -I../../orte/include -I../../ompi/include -I../../opal/mca/paffinity/linux/plpa/src/libplpa -I../.. -D_REENTRANT -O3 -DNDEBUG -m64 -finline-functions -fno-strict-aliasing -fvisibility=hidden -MT asm.lo -MD -MP -MF $depbase.Tpo -c -o asm.lo asm.c &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile: /sw/bin/gcc-fsf-4.5 -DHAVE_CONFIG_H -I. -I../../opal/include -I../../orte/include -I../../ompi/include -I../../opal/mca/paffinity/linux/plpa/src/libplpa -I../.. -D_REENTRANT -O3 -DNDEBUG -m64 -finline-functions -fno-strict-aliasing -fvisibility=hidden -MT asm.lo -MD -MP -MF .deps/asm.Tpo -c asm.c -fno-common -DPIC -o .libs/asm.o
rm -f atomic-asm.S
ln -s "../../opal/asm/generated/atomic-local.s" atomic-asm.S
depbase=`echo atomic-asm.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ../../libtool --mode=compile /sw/bin/gcc-fsf-4.5 -DHAVE_CONFIG_H -I. -I../../opal/include -I../../orte/include -I../../ompi/include -I../../opal/mca/paffinity/linux/plpa/src/libplpa -I../.. -D_REENTRANT -O3 -DNDEBUG -m64 -finline-functions -fno-strict-aliasing -MT atomic-asm.lo -MD -MP -MF $depbase.Tpo -c -o atomic-asm.lo atomic-asm.S &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile: /sw/bin/gcc-fsf-4.5 -DHAVE_CONFIG_H -I. -I../../opal/include -I../../orte/include -I../../ompi/include -I../../opal/mca/paffinity/linux/plpa/src/libplpa -I../.. -D_REENTRANT -O3 -DNDEBUG -m64 -finline-functions -fno-strict-aliasing -MT atomic-asm.lo -MD -MP -MF .deps/atomic-asm.Tpo -c atomic-asm.S -fno-common -DPIC -o .libs/atomic-asm.o
/bin/sh ../../libtool --tag=CC --mode=link /sw/bin/gcc-fsf-4.5 -O3 -DNDEBUG -m64 -finline-functions -fno-strict-aliasing -fvisibility=hidden -export-dynamic -o libasm.la asm.lo atomic-asm.lo -lutil
libtool: link: rm -fr .libs/libasm.a .libs/libasm.la
libtool: link: ar cru .libs/libasm.a .libs/asm.o .libs/atomic-asm.o
/usr/bin/ranlib: file: .libs/libasm.a(asm.o) has no symbols
libtool: link: ranlib .libs/libasm.a
ranlib: file: .libs/libasm.a(asm.o) has no symbols
libtool: link: ( cd ".libs" && rm -f "libasm.la" && ln -s "../libasm.la" "libasm.la" )
Making all in etc
make[2]: Nothing to be done for `all'.
Making all in event
Making all in compat
Making all in sys
make[4]: Nothing to be done for `all'.
make[4]: Nothing to be done for `all-am'.
depbase=`echo event.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ../../libtool --tag=CC --mode=compile /sw/bin/gcc-fsf-4.5 -DHAVE_CONFIG_H -I. -I../../opal/include -I../../orte/include -I../../ompi/include -I../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../opal/event/compat -I../.. -D_REENTRANT -O3 -DNDEBUG -m64 -finline-functions -fno-strict-aliasing -fvisibility=hidden -MT event.lo -MD -MP -MF $depbase.Tpo -c -o event.lo event.c &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile: /sw/bin/gcc-fsf-4.5 -DHAVE_CONFIG_H -I. -I../../opal/include -I../../orte/include -I../../ompi/include -I../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../opal/event/compat -I../.. -D_REENTRANT -O3 -DNDEBUG -m64 -finline-functions -fno-strict-aliasing -fvisibility=hidden -MT event.lo -MD -MP -MF .deps/event.Tpo -c event.c -fno-common -DPIC -o .libs/event.o
depbase=`echo log.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ../../libtool --tag=CC --mode=compile /sw/bin/gcc-fsf-4.5 -DHAVE_CONFIG_H -I. -I../../opal/include -I../../orte/include -I../../ompi/include -I../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../opal/event/compat -I../.. -D_REENTRANT -O3 -DNDEBUG -m64 -finline-functions -fno-strict-aliasing -fvisibility=hidden -MT log.lo -MD -MP -MF $depbase.Tpo -c -o log.lo log.c &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile: /sw/bin/gcc-fsf-4.5 -DHAVE_CONFIG_H -I. -I../../opal/include -I../../orte/include -I../../ompi/include -I../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../opal/event/compat -I../.. -D_REENTRANT -O3 -
Re: [OMPI users] cputype (7) does not match previous archive members cputype
On May 4, 2011, at 12:39 PM, Paul Cizmas wrote:

> ./configure --prefix=/opt/openmpi1.4.3GFm64 CC=/sw/bin/gcc-fsf-4.5 CFLAGS=-m64 CXX=/sw/bin/g++-fsf-4.5 CXXFLAGS=-m64 F77=gfortran FFLAGS=-m64 FC=gfortran FCFLAGS=-m64

Oops -- sorry, you probably also need to include LDFLAGS=-m64, too (i.e., linker flags, vs. compiler flags).

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] cputype (7) does not match previous archive members cputype
I added LDFLAGS=-m64, such that the command is now

./configure --prefix=/opt/openmpi1.4.3GFm64 CC=/sw/bin/gcc-fsf-4.5 CFLAGS=-m64 CXX=/sw/bin/g++-fsf-4.5 CXXFLAGS=-m64 F77=gfortran FFLAGS=-m64 FC=gfortran FCFLAGS=-m64 LDFLAGS=-m64

but it did not work. It still dies when I do "make all install" with the errors:

+++
libtool: link: rm -fr .libs/libevent.a
libtool: link: ar cru .libs/libevent.a .libs/event.o .libs/log.o .libs/evutil.o .libs/signal.o .libs/select.o
/usr/bin/ranlib: archive member: .libs/libevent.a(signal.o) cputype (7) does not match previous archive members cputype (16777223) (all members must match)
/usr/bin/ranlib: archive member: .libs/libevent.a(select.o) cputype (7) does not match previous archive members cputype (16777223) (all members must match)
libtool: link: ranlib .libs/libevent.a
ranlib: archive member: .libs/libevent.a(signal.o) cputype (7) does not match previous archive members cputype (16777223) (all members must match)
ranlib: archive member: .libs/libevent.a(select.o) cputype (7) does not match previous archive members cputype (16777223) (all members must match)
make[3]: *** [libevent.la] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1
+++

Could it be that libevent.a is the problem? Does libevent.a use LDFLAGS?

Thank you,

Paul

From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Jeff Squyres [jsquy...@cisco.com]
Sent: Wednesday, May 04, 2011 11:43 AM
To: Open MPI Users
Subject: Re: [OMPI users] cputype (7) does not match previous archive members cputype

> On May 4, 2011, at 12:39 PM, Paul Cizmas wrote:
>
>> ./configure --prefix=/opt/openmpi1.4.3GFm64 CC=/sw/bin/gcc-fsf-4.5 CFLAGS=-m64 CXX=/sw/bin/g++-fsf-4.5 CXXFLAGS=-m64 F77=gfortran FFLAGS=-m64 FC=gfortran FCFLAGS=-m64
>
> Oops -- sorry, you probably also need to include LDFLAGS=-m64, too (i.e., linker flags, vs. compiler flags).
Re: [OMPI users] cputype (7) does not match previous archive members cputype
Did you make clean first? configure won't clean out the prior installation, so you may be picking up stale libs.

On May 4, 2011, at 11:27 AM, Cizmas, Paul wrote:

> I added LDFLAGS=-m64, such that the command is now
>
> ./configure --prefix=/opt/openmpi1.4.3GFm64 CC=/sw/bin/gcc-fsf-4.5 CFLAGS=-m64 CXX=/sw/bin/g++-fsf-4.5 CXXFLAGS=-m64 F77=gfortran FFLAGS=-m64 FC=gfortran FCFLAGS=-m64 LDFLAGS=-m64
>
> but it did not work. It still dies when I do "make all install" with the errors:
>
> +++
> libtool: link: rm -fr .libs/libevent.a
> libtool: link: ar cru .libs/libevent.a .libs/event.o .libs/log.o .libs/evutil.o .libs/signal.o .libs/select.o
> /usr/bin/ranlib: archive member: .libs/libevent.a(signal.o) cputype (7) does not match previous archive members cputype (16777223) (all members must match)
> /usr/bin/ranlib: archive member: .libs/libevent.a(select.o) cputype (7) does not match previous archive members cputype (16777223) (all members must match)
> libtool: link: ranlib .libs/libevent.a
> ranlib: archive member: .libs/libevent.a(signal.o) cputype (7) does not match previous archive members cputype (16777223) (all members must match)
> ranlib: archive member: .libs/libevent.a(select.o) cputype (7) does not match previous archive members cputype (16777223) (all members must match)
> make[3]: *** [libevent.la] Error 1
> make[2]: *** [all-recursive] Error 1
> make[1]: *** [all-recursive] Error 1
> make: *** [all-recursive] Error 1
> +++
>
> Could it be that libevent.a is the problem? Does libevent.a use LDFLAGS?
>
> Thank you,
>
> Paul
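In other words, a sketch of the recovery sequence, reusing the flags from the earlier post (starting from a freshly unpacked tarball works equally well):

  make clean        # discard objects built with the old (32-bit) flags
  ./configure --prefix=/opt/openmpi1.4.3GFm64 CC=/sw/bin/gcc-fsf-4.5 CFLAGS=-m64 \
      CXX=/sw/bin/g++-fsf-4.5 CXXFLAGS=-m64 F77=gfortran FFLAGS=-m64 \
      FC=gfortran FCFLAGS=-m64 LDFLAGS=-m64
  make all install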
Re: [OMPI users] cputype (7) does not match previous archive members cputype
Life is much better after "make clean" :)

Thank you,

Paul

From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
Sent: Wednesday, May 04, 2011 12:29 PM
To: Open MPI Users
Subject: Re: [OMPI users] cputype (7) does not match previous archive members cputype

> Did you make clean first? configure won't clean out the prior installation, so you may be picking up stale libs.