[OMPI users] Program does not finish after MPI_Finalize()
Dear All, (this follows a previous mail)

I don't understand the strange behavior of this small code: sometimes it ends, sometimes not. The output of MPI_Finalized is 1 (for each process if n>1), but the program doesn't exit and I am forced to use Ctrl-C.

I compiled it with "mpicc --std=c99" (gcc 4.5), on a Quad-Core AMD Opteron(tm) Processor 8356, and ran it with "mpiexec -n 1 a.out" or "mpiexec -n 2 a.out". "ps aux" shows that the program is in the Sl+ state. Sometimes I can also see a line like this:

p10015 6892 0.1 0.0 43376 1828 ? Ssl 14:50 0:00 orted --hnp --set-sid --report-uri 8 --singleton-died-pipe 9

Is this a bug? Am I doing something wrong? If you have any tips... Thank you.

-
#include "stdio.h"
#include "mpi.h"

int
main(int argc, char *argv[])
{
  int my_num, mpi_size ;
  int flag ;

  MPI_Init(&argc, &argv) ;

  MPI_Comm_rank(MPI_COMM_WORLD, &my_num);
  printf("%d calls MPI_Finalize()\n\n\n", my_num) ;

  MPI_Finalize() ;

  MPI_Finalized(&flag) ;
  printf("MPI finalized: %d\n", flag) ;
  return 0 ;
}
---

--
Yves Caniou
Associate Professor at Université Lyon 1,
Member of the team project INRIA GRAAL in the LIP ENS-Lyon,
Délégation CNRS in Japan French Laboratory of Informatics (JFLI),
* in Information Technology Center, The University of Tokyo,
  2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan
  tel: +81-3-5841-0540
* in National Institute of Informatics
  2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
  tel: +81-3-4212-2412
http://graal.ens-lyon.fr/~ycaniou/
Re: [OMPI users] Program does not finish after MPI_Finalize()
It looks to me like you are getting version confusion - your path and ld_library_path aren't pointing to the place where you installed 1.4.1 and you are either getting someone else's mpiexec or getting 1.2.x instead. Could also be that mpicc isn't the one from 1.4.1 either. Check to ensure that the mpiexec and mpicc you are using are from 1.4.1, and that your environment is pointing to the right place. On May 24, 2010, at 12:15 AM, Yves Caniou wrote: > Dear All, > (follows a previous mail) > > I don't understand the strange behavior of this small code: sometimes it > ends, sometimes not. > The output of MPI_Finalized is 1 (for each processes if n>1), but the code > doesn't end. I am forced to use Ctrl-C. > > I compiled it with the command line: > "mpicc --std=c99" / gcc is 4.5, on a Quad-Core AMD Opteron(tm) Processor > 8356 > "mpiexec -n 1 a.out" or "mpiexec -n 2 a.out" to run the code. > "ps aux" returns that the program is in Sl+ state. > > Sometimes, I can see also a line like this: > p100156892 0.1 0.0 43376 1828 ?Ssl 14:50 0:00 orted --hnp > --set-sid --report-uri 8 --singleton-died-pipe 9 > > Is this a bug? Do I do something wrong? > If you have any tips... > Thank you. > > - > #include "stdio.h" > #include "mpi.h" > > int > main(int argc, char *argv[]) > { > int my_num, mpi_size ; > int flag ; > > MPI_Init(&argc, &argv) ; > > MPI_Comm_rank(MPI_COMM_WORLD, &my_num); > printf("%d calls MPI_Finalize()\n\n\n", my_num) ; > > MPI_Finalize() ; > > MPI_Finalized(&flag) ; > printf("MPI finalized: %d\n", flag) ; > return 0 ; > } > --- > > -- > Yves Caniou > Associate Professor at Université Lyon 1, > Member of the team project INRIA GRAAL in the LIP ENS-Lyon, > Délégation CNRS in Japan French Laboratory of Informatics (JFLI), > * in Information Technology Center, The University of Tokyo, >2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan >tel: +81-3-5841-0540 > * in National Institute of Informatics >2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan >tel: +81-3-4212-2412 > http://graal.ens-lyon.fr/~ycaniou/ > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
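A quick way to run the checks suggested above and confirm which Open MPI installation is actually being picked up (a hedged sketch; exact flags and output vary slightly across Open MPI versions):

    which mpicc mpiexec            # both should point into the 1.4.x install tree
    mpicc --showme:version         # the wrapper reports the Open MPI version it belongs to
    mpiexec --version              # should report the same release
    ompi_info | grep "Open MPI:"   # version of the library the tools were built against
    echo $PATH
    echo $LD_LIBRARY_PATH          # the 1.4.x bin/ and lib/ directories should come first

If any of these disagree, the wrapper, the launcher, and the runtime library are coming from different installations, which is exactly the version confusion described above.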
Re: [OMPI users] Program does not finish after MPI_Finalize()
I rechecked, but didn't see anything wrong. Here is how I set my environment. Tkx. $>mpicc --v Using built-in specs. COLLECT_GCC=//home/p10015/gcc/bin/x86_64-unknown-linux-gnu-gcc-4.5.0 COLLECT_LTO_WRAPPER=/hsfs/home4/p10015/gcc/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: ../gcc-4.5.0/configure --prefix=/home/p10015/gcc --with-gmp=/home/p10015/gmp --with-mpfr=/home/p10015/mpfr --with-mpc=/home/p10015/mpc --enable-lto --with-ppl=/home/p10015/ppl --with-libelf=/home/p10015/libelf --with-cloog=/home/p10015/cloog-ppl --enable-languages=c,c++,lto --disable-libada --enable-stage1-languages=c,c++,lto Thread model: posix gcc version 4.5.0 (GCC) $>mpiexec mpiexec (OpenRTE) 1.4.2 [cut] $>echo $LD_LIBRARY_PATH /home/p10015/gcc/lib64/:/home/p10015/openmpi/lib/:/home/p10015/omniORB/lib/:/home/p10015/omniORB/lib64/:/home/p10015/lib/:/home/p10015/lib64/::/usr/lib/:/usr/lib/xen/:/lib/: $>echo $PATH .:/home/p10015/gcc/bin/:/home/p10015/openmpi/bin/:/home/p10015/omniORB/bin/:/home/p10015/git/bin/:/home/p10015/Bin/:/home/p10015/bin/:..:/usr/local/bin/:/opt/ofort90/bin:/opt/optc/bin:/opt/optscxx/bin:/opt/hitachi/nqs/bin:/opt/torque/bin:/opt/mpich-mx/bin:/usr/java/default/bin:/bin:/usr/bin:/sbin/:/usr/sbin/ $>echo $CPLUS_INCLUDE_PATH /home/p10015/gcc/include/c++/4.5.0/:/home/p10015/openmpi/include/:/home/p10015/omniORB/include/: $>echo $C_INCLUDE_PATH /home/p10015/gcc/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/include-fixed/:/home/p10015/gcc/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/include/:/home/p10015/openmpi/include/:/home/p10015/omniORB/include/: Le Monday 24 May 2010 08:35:17 Ralph Castain, vous avez écrit : > It looks to me like you are getting version confusion - your path and > ld_library_path aren't pointing to the place where you installed 1.4.1 and > you are either getting someone else's mpiexec or getting 1.2.x instead. > Could also be that mpicc isn't the one from 1.4.1 either. > > Check to ensure that the mpiexec and mpicc you are using are from 1.4.1, > and that your environment is pointing to the right place. > > On May 24, 2010, at 12:15 AM, Yves Caniou wrote: > > Dear All, > > (follows a previous mail) > > > > I don't understand the strange behavior of this small code: sometimes it > > ends, sometimes not. The output of MPI_Finalized is 1 (for each processes > > if n>1), but the code doesn't end. I am forced to use Ctrl-C. > > > > I compiled it with the command line: > > "mpicc --std=c99" / gcc is 4.5, on a Quad-Core AMD Opteron(tm) > > Processor 8356 "mpiexec -n 1 a.out" or "mpiexec -n 2 a.out" to run the > > code. > > "ps aux" returns that the program is in Sl+ state. > > > > Sometimes, I can see also a line like this: > > p100156892 0.1 0.0 43376 1828 ?Ssl 14:50 0:00 orted > > --hnp --set-sid --report-uri 8 --singleton-died-pipe 9 > > > > Is this a bug? Do I do something wrong? > > If you have any tips... > > Thank you. 
> > > > - > > #include "stdio.h" > > #include "mpi.h" > > > > int > > main(int argc, char *argv[]) > > { > > int my_num, mpi_size ; > > int flag ; > > > > MPI_Init(&argc, &argv) ; > > > > MPI_Comm_rank(MPI_COMM_WORLD, &my_num); > > printf("%d calls MPI_Finalize()\n\n\n", my_num) ; > > > > MPI_Finalize() ; > > > > MPI_Finalized(&flag) ; > > printf("MPI finalized: %d\n", flag) ; > > return 0 ; > > } > > --- > > > > -- > > Yves Caniou > > Associate Professor at Université Lyon 1, > > Member of the team project INRIA GRAAL in the LIP ENS-Lyon, > > Délégation CNRS in Japan French Laboratory of Informatics (JFLI), > > * in Information Technology Center, The University of Tokyo, > >2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan > >tel: +81-3-5841-0540 > > * in National Institute of Informatics > >2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan > >tel: +81-3-4212-2412 > > http://graal.ens-lyon.fr/~ycaniou/ > > > > ___ > > users mailing list > > us...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/users -- Yves Caniou Associate Professor at Université Lyon 1, Member of the team project INRIA GRAAL in the LIP ENS-Lyon, Délégation CNRS in Japan French Laboratory of Informatics (JFLI), * in Information Technology Center, The University of Tokyo, 2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan tel: +81-3-5841-0540 * in National Institute of Informatics 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan tel: +81-3-4212-2412 http://graal.ens-lyon.fr/~ycaniou/
Re: [OMPI users] OpenMPI Checkpoint/Restart is failed
Hi all, I had the same problem like Jitsumoto, i.e. OpenMPI 1.4.2 failed to restart and the patch which Fernando gave didn't work. I also tried 1.5 nightly snapshots but it seemed not working well. For some purpose, I don't want to use --enable-ft-thread in configure but the same error occurred even --enable-ft-thread is used. Here is my configure for OMPI 1.5a1r23135: >./configure \ >--with-ft=cr \ >--enable-mpi-threads \ >--with-blcr=/home/nguyen/opt/blcr --with-blcr-libdir=/home/nguyen/opt/blcr/lib \ >--prefix=/home/nguyen/opt/openmpi_1.5 --enable-mpirun-prefix-by-default \ and errors: >$ mpirun -am ft-enable-cr -machinefile ./host ./a.out >0 >0 >1 >1 >2 >2 >3 >3 >-- >mpirun has exited due to process rank 1 with PID 6582 on >node rc014 exiting improperly. There are two reasons this could occur: >1. this process did not call "init" before exiting, but others in >the job did. This can cause a job to hang indefinitely while it waits >for all processes to call "init". By rule, if one process calls "init", >then ALL processes must call "init" prior to termination. >2. this process called "init", but exited without calling "finalize". >By rule, all processes that call "init" MUST call "finalize" prior to >exiting or it will be considered an "abnormal termination" >This may have caused other processes in the application to be >terminated by signals sent by mpirun (as reported here). >--- And here is the checkpoint command: >$ ompi-checkpoint -s -v --term 10982 >[rc013.local:11001] [ 0.00 / 0.14] Requested - ... >[rc013.local:11001] [ 0.00 / 0.14] Pending - ... >[rc013.local:11001] [ 0.01 / 0.15] Running - ... >[rc013.local:11001] [ 7.79 / 7.94] Finished - >ompi_global_snapshot_10982.ckpt >Snapshot Ref.: 0 ompi_global_snapshot_10982.ckpt I also took a look inside the checkpoint files and found that the snapshot was taken: ~/tmp/ckpt/ompi_global_snapshot_10982.ckpt/0/opal_snapshot_1.ckpt/ompi_blcr_context.6582 But restarting failed as follows: >$ ompi-restart ompi_global_snapshot_10982.ckpt >-- >mpirun noticed that process rank 1 with PID 11346 on node rc013.local exited >on signal 11 (Segmentation fault). >-- Is there any idea about this? Thank you! Regards, Nguyen Toan On Mon, May 24, 2010 at 4:08 PM, Hideyuki Jitsumoto < jitum...@gsic.titech.ac.jp> wrote: > -- Forwarded message -- > From: Fernando Lemos > Date: Thu, Apr 15, 2010 at 2:18 AM > Subject: Re: [OMPI users] OpenMPI Checkpoint/Restart is failed > To: Open MPI Users > > > On Wed, Apr 14, 2010 at 5:25 AM, Hideyuki Jitsumoto > wrote: > > Fernando, > > > > Thank you for your reply. > > I tried to patch the file you mentioned, but the output did not change. > > I didn't test the patch, tbh. I'm using 1.5 nightly snapshots, and it > works great. > > >>Are you using a shared file system? You need to use a shared file > > system for checkpointing with 1.4.1: > > What is the shared file system ? do you mean NFS, Lustre and so on ? > > (I'm sorry about my ignorance...) > > Something like NFS, yea. > > > If I use only one node for application, do I need such a > shared-file-system ? > > No, for a single node, checkpointing with 1.4.1 should work (it works > for me, at least). If you're using a single node, then your problem is > probably not related to the bug report I posted. 
> > > Regards, > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > > -- > Sincerely Yours, > Hideyuki Jitsumoto (jitum...@gsic.titech.ac.jp) > Tokyo Institute of Technology > Global Scientific Information and Computing center (Matsuoka Lab.) >
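For reference, the checkpoint/restart cycle being attempted in this thread looks roughly like the following. This is only a sketch assembled from the commands quoted above; the BLCR and install paths, the PID, and the snapshot name are placeholders.

    # configure Open MPI with C/R support (BLCR paths are site-specific)
    ./configure --with-ft=cr --enable-mpi-threads \
        --with-blcr=/path/to/blcr --with-blcr-libdir=/path/to/blcr/lib \
        --prefix=/path/to/openmpi --enable-mpirun-prefix-by-default

    # run the job with the fault-tolerance framework enabled
    mpirun -am ft-enable-cr -machinefile ./host ./a.out

    # from another shell: checkpoint (and terminate) the job via mpirun's PID
    ompi-checkpoint -s -v --term <pid_of_mpirun>

    # later: restart from the global snapshot that was written
    ompi-restart ompi_global_snapshot_<pid>.ckpt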
Re: [OMPI users] Program does not finish after MPI_Finalize()
Just to make sure I understand -- you're running the hello world app you pasted in an earlier email with just 1 MPI process on the local machine, and you're seeing hangs. Is that right? (there was a reference in a prior email to 2 different architectures -- that's why I'm clarifying) On May 24, 2010, at 2:53 AM, Yves Caniou wrote: > I rechecked, but didn't see anything wrong. > Here is how I set my environment. Tkx. > > $>mpicc --v > Using built-in specs. > COLLECT_GCC=//home/p10015/gcc/bin/x86_64-unknown-linux-gnu-gcc-4.5.0 > COLLECT_LTO_WRAPPER=/hsfs/home4/p10015/gcc/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/lto-wrapper > Target: x86_64-unknown-linux-gnu > Configured > with: ../gcc-4.5.0/configure --prefix=/home/p10015/gcc > --with-gmp=/home/p10015/gmp --with-mpfr=/home/p10015/mpfr > --with-mpc=/home/p10015/mpc --enable-lto --with-ppl=/home/p10015/ppl > --with-libelf=/home/p10015/libelf --with-cloog=/home/p10015/cloog-ppl > --enable-languages=c,c++,lto --disable-libada > --enable-stage1-languages=c,c++,lto > Thread model: posix > gcc version 4.5.0 (GCC) > > $>mpiexec > mpiexec (OpenRTE) 1.4.2 > [cut] > > $>echo $LD_LIBRARY_PATH > /home/p10015/gcc/lib64/:/home/p10015/openmpi/lib/:/home/p10015/omniORB/lib/:/home/p10015/omniORB/lib64/:/home/p10015/lib/:/home/p10015/lib64/::/usr/lib/:/usr/lib/xen/:/lib/: > > $>echo $PATH > .:/home/p10015/gcc/bin/:/home/p10015/openmpi/bin/:/home/p10015/omniORB/bin/:/home/p10015/git/bin/:/home/p10015/Bin/:/home/p10015/bin/:..:/usr/local/bin/:/opt/ofort90/bin:/opt/optc/bin:/opt/optscxx/bin:/opt/hitachi/nqs/bin:/opt/torque/bin:/opt/mpich-mx/bin:/usr/java/default/bin:/bin:/usr/bin:/sbin/:/usr/sbin/ > > $>echo $CPLUS_INCLUDE_PATH > /home/p10015/gcc/include/c++/4.5.0/:/home/p10015/openmpi/include/:/home/p10015/omniORB/include/: > > $>echo $C_INCLUDE_PATH > /home/p10015/gcc/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/include-fixed/:/home/p10015/gcc/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/include/:/home/p10015/openmpi/include/:/home/p10015/omniORB/include/: > > > Le Monday 24 May 2010 08:35:17 Ralph Castain, vous avez écrit : > > It looks to me like you are getting version confusion - your path and > > ld_library_path aren't pointing to the place where you installed 1.4.1 and > > you are either getting someone else's mpiexec or getting 1.2.x instead. > > Could also be that mpicc isn't the one from 1.4.1 either. > > > > Check to ensure that the mpiexec and mpicc you are using are from 1.4.1, > > and that your environment is pointing to the right place. > > > > On May 24, 2010, at 12:15 AM, Yves Caniou wrote: > > > Dear All, > > > (follows a previous mail) > > > > > > I don't understand the strange behavior of this small code: sometimes it > > > ends, sometimes not. The output of MPI_Finalized is 1 (for each processes > > > if n>1), but the code doesn't end. I am forced to use Ctrl-C. > > > > > > I compiled it with the command line: > > > "mpicc --std=c99" / gcc is 4.5, on a Quad-Core AMD Opteron(tm) > > > Processor 8356 "mpiexec -n 1 a.out" or "mpiexec -n 2 a.out" to run the > > > code. > > > "ps aux" returns that the program is in Sl+ state. > > > > > > Sometimes, I can see also a line like this: > > > p100156892 0.1 0.0 43376 1828 ?Ssl 14:50 0:00 orted > > > --hnp --set-sid --report-uri 8 --singleton-died-pipe 9 > > > > > > Is this a bug? Do I do something wrong? > > > If you have any tips... > > > Thank you. 
> > > > > > - > > > #include "stdio.h" > > > #include "mpi.h" > > > > > > int > > > main(int argc, char *argv[]) > > > { > > > int my_num, mpi_size ; > > > int flag ; > > > > > > MPI_Init(&argc, &argv) ; > > > > > > MPI_Comm_rank(MPI_COMM_WORLD, &my_num); > > > printf("%d calls MPI_Finalize()\n\n\n", my_num) ; > > > > > > MPI_Finalize() ; > > > > > > MPI_Finalized(&flag) ; > > > printf("MPI finalized: %d\n", flag) ; > > > return 0 ; > > > } > > > --- > > > > > > -- > > > Yves Caniou > > > Associate Professor at Université Lyon 1, > > > Member of the team project INRIA GRAAL in the LIP ENS-Lyon, > > > Délégation CNRS in Japan French Laboratory of Informatics (JFLI), > > > * in Information Technology Center, The University of Tokyo, > > >2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan > > >tel: +81-3-5841-0540 > > > * in National Institute of Informatics > > >2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan > > >tel: +81-3-4212-2412 > > > http://graal.ens-lyon.fr/~ycaniou/ > > > > > > ___ > > > users mailing list > > > us...@open-mpi.org > > > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > > -- > Yves Caniou > Associate Professor at Université Lyon 1, > Member of the team project INRIA GRAAL in the LIP ENS-Lyon, > Délégation CNRS in Japan French Laboratory of Informatics (JFLI), > * in Information Technology Center, The University of Tokyo, > 2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan > tel: +81-3-5841-0540 > * in Nation
Re: [OMPI users] mpirun: symbol lookup error: mpirun: undefined symbol:orte_xml_fp
On May 23, 2010, at 11:57 AM, Dawid Laszuk wrote: > It's a bit awkward for me to ask, because I'm not only newbie in > parallel programming but also in Linux system, but i've but searching > for long enough to loose any hopes. No problem; we'll try to help. > My problem is, when I try to run compiled code with "mpirun" I get output: > mpirun: symbol lookup error: mpirun: undefined symbol: orte_xml_fp > > I can compile code with "mpicc" (i write in C) and it runs, but only > on one CPU ( I have Athlon X2 64bit, dual core ). There is no > difference when I write "mpirun", "mpiexec" or "orterun" (but that's > normal, isn't it?). It doesn't matter what I'm trying to run; I get > that output just by typing it into console. FWIW: all three of those are exactly equivalent in Open MPI (mpirun, mpiexec, orterun). So just to be clear -- if you mpirun a simple MPI test executable (e.g., the test applications in the Open MPI examples/ directory), you get that error message? E.g.: cd examples make mpirun -np 2 examples/hello_c ...you see the missing symbol error here... What happens if you just run hello_c without mpirun? ./hello_c What's the output from "ldd hello_c"? (this tells us which libraries it's linking to at run-time -- from your configure output, it should list /usr/local/lib/libmpi.so in there somewhere) > As I said, I know basics with linux and even adding some libs into sys > path is something which I still don't know ( and don't how much time > to read about). > > I've attached things which maybe helpful ( output from "./configure", > "make all", "make install" ). > I'm using Linux Mint 8 ( Kernel Linux 2.6.31-20-generic). Have two CPU > cores AMD Athlon Dual-Core QL-60. The most common case for this kind of error is mixing-n-matching multiple versions of Open MPI on the same machine. Does your Linux distro come with Open MPI installed already, for example? You might want to configure Open MPI with --enable-mpirun-prefix-by-default -- this does some magic, particularly when running across multiple machines, to try to ensure that the "right" Open MPI installation is picked. -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] Some Questions on Building OMPI on Linux Em64t
On May 19, 2010, at 2:19 PM, Michael E. Thomadakis wrote:
> I would like to build OMPI V1.4.2 and make it available to our users at the Supercomputing Center at TAMU. Our system is a 2-socket, 4-core Nehalem @2.8GHz, 24GiB DRAM / node, 324 nodes connected to 4xQDR Voltaire fabric, CentOS/RHEL 5.4.

Sorry for the delay in replying...

> 1) high-resolution timers: how do I specify the HRT linux timers in the --with-timer=TYPE line of ./configure ?

You shouldn't need to do anything; the "linux" timer component of Open MPI should get automatically selected. You should be able to see this in the stdout of Open MPI's "configure", and/or if you run ompi_info | grep timer -- there should only be one entry: linux.

> 2) I have installed blcr V0.8.2 but when I try to build OMPI and I point to the full installation it complains it cannot find it. Note that I build BLCR with GCC but I am building OMPI with Intel compilers (V11.1)

Can you be more specific here?

> 3) Does OMPI by default use SHM for intra-node message IPC but revert to IB for inter-node ?

Yes. You can force this, but it's usually unnecessary:

mpirun --mca btl sm,self,openib

sm: shared memory transport
self: process loopback transport (i.e., send to self; not send to others on the same host)
openib: OpenFabrics transport

> 4) How could I select the high-speed transport, say DAPL or OFED IB verbs ? Is there any preference as to the specific high-speed transport over QDR IB?

openib is the preferred Open MPI plugin (the name is somewhat outdated, but it's modern OpenFabrics verbs -- see http://www.open-mpi.org/faq/?category=openfabrics#why-openib-name).

> 5) When we launch MPI jobs via PBS/TORQUE do we have control on the task and thread placement on nodes/cores ?

Yes. Check out the man page for mpirun(1).

> 6) Can we suspend/restart cleanly OMPI jobs with the above scheduler ? Any caveats on suspension / resumption of OMPI jobs ?

I'll let Josh handle this -- he's the checkpoint/restart guy.

> 7) Do you have any performance data comparing OMPI vs say MVAPICHv2 and IntelMPI ? This is not a political issue since I am going to be providing all these MPI stacks to our users.

Heh; that's a loaded question no matter how you ask it. ;-)

The truth is that every MPI will claim to be the greatest (you should see the marketing charts that Intel MPI puts out at the Sonoma OpenFabrics workshop every year!). We're all on par with each other for all the major metrics. Some MPIs will choose to optimize certain metrics that others choose not to -- so you can always find a benchmark that shows "this MPI is great and the others suck!!" (which is what the marketing guys capitalize on). Each MPI has its benefits and drawbacks; we think Open MPI has great performance *and* a very large feature set that the other MPIs do not have. These are among the reasons we continue to develop and extend Open MPI.

That's a non-answer way of saying that we don't really want to get into a benchmarks war here on a google-able mailing list. :-) It is probably best to do a little benchmarking yourself with apps that you know, understand, and control, in your environment. See what works best for you. Be careful to run apples-to-apples comparisons; if you're running optimized variants of MPI x, be sure to also run optimized variants of MPI y and z, too. And so on.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
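As a concrete illustration of points 3 and 5 above, a typical launch line might look like the following. This is a sketch only; the placement options shown are 1.4-era names from the mpirun(1) man page, so verify them with mpirun --help on your installation.

    # force the shared-memory, self, and OpenFabrics transports (usually unnecessary)
    mpirun --mca btl sm,self,openib -np 64 ./my_app

    # example placement controls under PBS/Torque (the host list comes from the scheduler)
    mpirun -np 64 -npernode 8 --bycore --bind-to-core ./my_app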
Re: [OMPI users] Program does not finish after MPI_Finalize()
Indeed, it's right. I work on a bigger program, but executions hanged most of the time. So I cut and cut and cut to finally obtain this. And it still hangs 2 times on 3 at least, and I don't know why. Le Monday 24 May 2010 14:48:43 Jeff Squyres, vous avez écrit : > Just to make sure I understand -- you're running the hello world app you > pasted in an earlier email with just 1 MPI process on the local machine, > and you're seeing hangs. Is that right? > > (there was a reference in a prior email to 2 different architectures -- > that's why I'm clarifying) > > On May 24, 2010, at 2:53 AM, Yves Caniou wrote: > > I rechecked, but didn't see anything wrong. > > Here is how I set my environment. Tkx. > > > > $>mpicc --v > > Using built-in specs. > > COLLECT_GCC=//home/p10015/gcc/bin/x86_64-unknown-linux-gnu-gcc-4.5.0 > > COLLECT_LTO_WRAPPER=/hsfs/home4/p10015/gcc/bin/../libexec/gcc/x86_64-unkn > >own-linux-gnu/4.5.0/lto-wrapper Target: x86_64-unknown-linux-gnu > > Configured > > with: ../gcc-4.5.0/configure --prefix=/home/p10015/gcc > > --with-gmp=/home/p10015/gmp --with-mpfr=/home/p10015/mpfr > > --with-mpc=/home/p10015/mpc --enable-lto --with-ppl=/home/p10015/ppl > > --with-libelf=/home/p10015/libelf --with-cloog=/home/p10015/cloog-ppl > > --enable-languages=c,c++,lto --disable-libada > > --enable-stage1-languages=c,c++,lto Thread model: posix > > gcc version 4.5.0 (GCC) > > > > $>mpiexec > > mpiexec (OpenRTE) 1.4.2 > > [cut] > > > > $>echo $LD_LIBRARY_PATH > > /home/p10015/gcc/lib64/:/home/p10015/openmpi/lib/:/home/p10015/omniORB/li > >b/:/home/p10015/omniORB/lib64/:/home/p10015/lib/:/home/p10015/lib64/::/usr > >/lib/:/usr/lib/xen/:/lib/: > > > > $>echo $PATH > > .:/home/p10015/gcc/bin/:/home/p10015/openmpi/bin/:/home/p10015/omniORB/bi > >n/:/home/p10015/git/bin/:/home/p10015/Bin/:/home/p10015/bin/:..:/usr/local > >/bin/:/opt/ofort90/bin:/opt/optc/bin:/opt/optscxx/bin:/opt/hitachi/nqs/bin > >:/opt/torque/bin:/opt/mpich-mx/bin:/usr/java/default/bin:/bin:/usr/bin:/sb > >in/:/usr/sbin/ > > > > $>echo $CPLUS_INCLUDE_PATH > > /home/p10015/gcc/include/c++/4.5.0/:/home/p10015/openmpi/include/:/home/p > >10015/omniORB/include/: > > > > $>echo $C_INCLUDE_PATH > > /home/p10015/gcc/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/include-fixed/:/h > >ome/p10015/gcc/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/include/:/home/p1001 > >5/openmpi/include/:/home/p10015/omniORB/include/: > > > > Le Monday 24 May 2010 08:35:17 Ralph Castain, vous avez écrit : > > > It looks to me like you are getting version confusion - your path and > > > ld_library_path aren't pointing to the place where you installed 1.4.1 > > > and you are either getting someone else's mpiexec or getting 1.2.x > > > instead. Could also be that mpicc isn't the one from 1.4.1 either. > > > > > > Check to ensure that the mpiexec and mpicc you are using are from > > > 1.4.1, and that your environment is pointing to the right place. > > > > > > On May 24, 2010, at 12:15 AM, Yves Caniou wrote: > > > > Dear All, > > > > (follows a previous mail) > > > > > > > > I don't understand the strange behavior of this small code: sometimes > > > > it ends, sometimes not. The output of MPI_Finalized is 1 (for each > > > > processes if n>1), but the code doesn't end. I am forced to use > > > > Ctrl-C. > > > > > > > > I compiled it with the command line: > > > > "mpicc --std=c99" / gcc is 4.5, on a Quad-Core AMD Opteron(tm) > > > > Processor 8356 "mpiexec -n 1 a.out" or "mpiexec -n 2 a.out" to run > > > > the code. > > > > "ps aux" returns that the program is in Sl+ state. 
> > > > > > > > Sometimes, I can see also a line like this: > > > > p100156892 0.1 0.0 43376 1828 ?Ssl 14:50 0:00 > > > > orted --hnp --set-sid --report-uri 8 --singleton-died-pipe 9 > > > > > > > > Is this a bug? Do I do something wrong? > > > > If you have any tips... > > > > Thank you. > > > > > > > > - > > > > #include "stdio.h" > > > > #include "mpi.h" > > > > > > > > int > > > > main(int argc, char *argv[]) > > > > { > > > > int my_num, mpi_size ; > > > > int flag ; > > > > > > > > MPI_Init(&argc, &argv) ; > > > > > > > > MPI_Comm_rank(MPI_COMM_WORLD, &my_num); > > > > printf("%d calls MPI_Finalize()\n\n\n", my_num) ; > > > > > > > > MPI_Finalize() ; > > > > > > > > MPI_Finalized(&flag) ; > > > > printf("MPI finalized: %d\n", flag) ; > > > > return 0 ; > > > > } > > > > --- > > > > > > > > -- > > > > Yves Caniou > > > > Associate Professor at Université Lyon 1, > > > > Member of the team project INRIA GRAAL in the LIP ENS-Lyon, > > > > Délégation CNRS in Japan French Laboratory of Informatics (JFLI), > > > > * in Information Technology Center, The University of Tokyo, > > > >2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan > > > >tel: +81-3-5841-0540 > > > > * in National Institute of Informatics > > > >2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan > > > >tel: +81-3-4212-2412 > > > > http://graal.ens-lyon.fr/~ycaniou/ > > > > > > > > _
[OMPI users] Process doesn't exit on remote machine when using hostfile
When I specify the hosts separately on the command line, as follows, the process completes as expected:

mpirun -np 8 -host remotehost,localhost myapp

Output appears for the localhost and a text file is created on the remotehost. However, when I use a hostfile, the remote processes never complete. I can see the output from the local processes, and by remote login I can see that processes are being started on the remote machine, but they never complete. This is a simple reduce example using boost.mpi (v1.43). I'm using Windows 7 x64 Pro on both machines with Open MPI 1.4.2; the hostfile and the app are in the same location on both machines. Any idea why this is happening?

Raj
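For comparison, the equivalent hostfile-based launch would normally look something like this (myhosts is a hypothetical file name and the slot counts are illustrative):

    # myhosts -- one host per line, optionally with a slot count
    remotehost slots=4
    localhost  slots=4

    mpirun -np 8 -hostfile myhosts myapp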
Re: [OMPI users] Building 1.4.x on mac snow leopard with intel compilers
Yes, I'm sure I'm picking up the newly built version. I've run ompi_info to verify my path is correct.

I have a little more information now... I rebuilt openmpi 1.4.2 with the '--enable-debug' option to configure, and when I run a simple MPI program on 2 processors with an MPI_Reduce() call:

MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

I see:

[macsierra:89600] *** An error occurred in MPI_Reduce: the reduction operation MPI_SUM is not defined on the MPI_DOUBLE datatype
[macsierra:89600] *** on communicator MPI_COMM_WORLD
[macsierra:89600] *** MPI_ERR_OP: invalid reduce operation
[macsierra:89600] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)

Thanks,
Mike

> On May 23, 2010, at 12:43 PM, Doug Reeder wrote:
> Mike,
> Are you sure that you are getting the openmpi that you built and not the one supplied w/ OS X? I use modules to make sure that I am getting the openmpi version I build instead of the OS X supplied version.
> Doug Reeder
> On May 23, 2010, at 10:45 AM, Glass, Micheal W wrote:
> > I'm having problems building a working version of openmpi 1.4.1/2 on a new Apple Mac Pro (dual quad-core nehalem processors) running snow leopard (10.6.3) with the Intel 11.1 compilers. I've tried the Intel 11.1.084 and 11.1.088 versions of the compilers. Everything appears to build just fine and some mpi test programs run, but whenever I run a program with an MPI_Reduce() or MPI_Allreduce() I get a segfault (even with np=1). I'm building openmpi with:
> >
> > configure -without-xgrid -prefix= CC=icc CXX=icpc F77=ifort FC=ifort
> >
> > When I build openmpi 1.4.1/2 with the GNU 4.3 compilers (installed via macports) using:
> >
> > configure -without-xgrid -prefix= CC=gcc-mp-4.3 CXX=g++-mp-4.3 F77=gfortran-mp-4.3 FC=gfortran-mp-4.3
> >
> > all my mpi tests (6000+) run fine. Any help would be appreciated.
> >
> > Thanks,
> > Mike
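For anyone trying to reproduce this, a minimal self-contained MPI_Reduce test along the lines of the call quoted above might look like the following. This is a hypothetical reproducer, not the poster's actual program.

/* Minimal MPI_Reduce check: every rank contributes 1/size, root expects 1.0. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;
    double mypi, pi;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    mypi = 1.0 / (double)size;   /* each rank's share of the sum */
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %f (expected 1.0)\n", pi);

    MPI_Finalize();
    return 0;
}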
Re: [OMPI users] mpirun: symbol lookup error: mpirun: undefined symbol:orte_xml_fp
Thanks a lot :) I've got one step further, but there are another problems. I think I've fixed that one with "undefined orte_xml_fm". I've uninstalled by "make uninstall" and cleaned with "make clean" and then configured with "--enable-mpirun-prefix-by-default" (like you said it) and "make all", "make install". You might have been right that I already had Open MPI with distro, because when i uninstalled it still could find files named "openmpi" and "mpirun" in /usr/bin . (if it's not okey to keep this topic alive, since now problem is, as I think, different, please tell me so :) ) I get other error when I want to run something. > So just to be clear -- if you mpirun a simple MPI test executable (e.g., the > test applications in the Open MPI examples/ directory), you get that error > message? E.g.: > > cd examples > make > mpirun -np 2 examples/hello_c > ...you see the missing symbol error here... See attachment "1" > What happens if you just run hello_c without mpirun? > > ./hello_c See attachment "2" Any ideas what is wrong? Just in case it would be helpful: > What's the output from "ldd hello_c"? (this tells us which libraries it's > linking to at run-time -- from your configure output, it should list > /usr/local/lib/libmpi.so in there somewhere) kretyn@kretyn-laptop ~/Pobrane/openmpi-1.4.2/examples $ ldd hello_c linux-vdso.so.1 => (0x7bdbe000) libmpi.so.0 => /usr/lib/libmpi.so.0 (0x7f5c7ba1e000) libopen-rte.so.0 => /usr/lib/libopen-rte.so.0 (0x7f5c7b7d6000) libopen-pal.so.0 => /usr/lib/libopen-pal.so.0 (0x7f5c7b563000) libdl.so.2 => /lib/libdl.so.2 (0x7f5c7b35f000) libnsl.so.1 => /lib/libnsl.so.1 (0x7f5c7b145000) libutil.so.1 => /lib/libutil.so.1 (0x7f5c7af42000) libm.so.6 => /lib/libm.so.6 (0x7f5c7acbe000) libpthread.so.0 => /lib/libpthread.so.0 (0x7f5c7aaa2000) libc.so.6 => /lib/libc.so.6 (0x7f5c7a733000) /lib64/ld-linux-x86-64.so.2 (0x7f5c7bcc7000) 1 Description: Binary data 2 Description: Binary data
Re: [OMPI users] mpirun: symbol lookup error: mpirun: undefinedsymbol:orte_xml_fp
On May 24, 2010, at 12:06 PM, Dawid Laszuk wrote: > > What's the output from "ldd hello_c"? (this tells us which libraries it's > > linking to at run-time -- from your configure output, it should list > > /usr/local/lib/libmpi.so in there somewhere) > > kretyn@kretyn-laptop ~/Pobrane/openmpi-1.4.2/examples $ ldd hello_c > linux-vdso.so.1 => (0x7bdbe000) > libmpi.so.0 => /usr/lib/libmpi.so.0 (0x7f5c7ba1e000) > libopen-rte.so.0 => /usr/lib/libopen-rte.so.0 (0x7f5c7b7d6000) > libopen-pal.so.0 => /usr/lib/libopen-pal.so.0 (0x7f5c7b563000) This seems to be the problem -- it's pointing to the "wrong" libmpi (and friends). Ensure that you're using /usr/local/bin/mpicc to compile your apps. Then you might also want to prefix the LD_LIBRARY_PATH environment variable with /usr/local/lib to ensure that you pick up your local Open MPI installation libraries (instead of the ones in /usr/lib). For example: export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH /usr/local/bin/mpicc hello_c.c -o hello_c -g /usr/local/bin/mpirun -np 4 hello_c Try that. -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] Building 1.4.x on mac snow leopard with intel compilers
On May 24, 2010, at 10:45 AM, Glass, Micheal W wrote: > Yes, I’m sure I’m picking up the newly built version. I’ve run ompi_info to > verify my path is correct. > > I’ve have a little more information now... I rebuilt openmpi 1.4.2 with the > ‘--enable-debug’ option to configure and when I run a simple mpi program on 2 > processors with an MPI_Reduce() call: > > MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD); That's weird. I compiled on Snow Leopard (but with gcc) and it works fine for me. Open MPI definitely defines MPI_SUM on MPI_DOUBLE. I don't have the intel compiler to test with on Snow Leopard, unfortunately... It works fine for me with the intel suite 11.1.072 on Linux RHEL 5. I'm afraid I have no way to testing further -- is there any chance you can step through and see what is going on? -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
[OMPI users] Deadlock question
My MPI program consists of a number of processes that send 0 or more messages (using MPI_Isend) to 0 or more other processes. The processes check periodically if messages are available to be processed. It was running fine until I increased the message size, at which point I got deadlock problems. Googling showed that I was running into a classic deadlock problem (see for example http://www.cs.ucsb.edu/~hnielsen/cs140/mpi-deadlocks.html). The suggested workarounds, like changing the order of MPI_Send and MPI_Recv, do not work in my case, as it could be that one processor does not send any message at all to the other processes, so MPI_Recv would wait indefinitely. Any suggestions on how to avoid deadlock in this case?

Thanks,
Gijsbert
Re: [OMPI users] Deadlock question
Gijsbert Wiesenekker wrote: My MPI program consists of a number of processes that send 0 or more messages (using MPI_Isend) to 0 or more other processes. The processes check periodically if messages are available to be processed. It was running fine until I increased the message size, and I got deadlock problems. Googling learned I was running into a classic deadlock problem if (see for example http://www.cs.ucsb.edu/~hnielsen/cs140/mpi-deadlocks.html). The workarounds suggested like changing the order of MPI_Send and MPI_Recv do not work in my case, as it could be that one processor does not send any message at all to the other processes, so MPI_Recv would wait indefinitely. Any suggestions on how to avoid deadlock in this case? The problems you describe would seem to arise with blocking functions like MPI_Send and MPI_Recv. With the non-blocking variants MPI_Isend/MPI_Irecv, there shouldn't be this problem. There should be no requirement of ordering the functions in the way that web page describes... that workaround is suggested for the blocking calls. It feels to me that something is missing from your description. If you know the maximum size any message will be, you can post an MPI_Irecv with wild card tags and source ranks. You can post MPI_Isend calls for whatever messages you want to send. You can use MPI_Test to check if any message has been received; if so, process the received message and re-post the MPI_Irecv. You can use MPI_Test to check if any send messages have completed; if so, you can reuse those send buffers. You need some signal to indicate to processes that no further messages will be arriving.
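Below is a minimal sketch of the pattern this reply describes: one wildcard MPI_Irecv kept posted, polled periodically with MPI_Test, and re-posted after each message. MAX_MSG_SIZE, TAG_SHUTDOWN, and the single-shutdown convention are assumptions for illustration only; completing the outstanding MPI_Isend requests is left as a comment.

#include <stdlib.h>
#include <mpi.h>

#define MAX_MSG_SIZE 1048576   /* assumed upper bound on any incoming message, in bytes */
#define TAG_SHUTDOWN 999       /* assumed tag meaning "no more messages will arrive"    */

static void poll_messages(char *buf, MPI_Request *req, int *done)
{
    int flag;
    MPI_Status status;

    /* Non-blocking check: has the posted receive completed? */
    MPI_Test(req, &flag, &status);
    if (!flag)
        return;                 /* nothing arrived yet; go back to other work */

    if (status.MPI_TAG == TAG_SHUTDOWN) {
        *done = 1;              /* in a real code, count one shutdown per peer */
        return;                 /* do not repost */
    }

    /* ... process buf here (status.MPI_SOURCE / status.MPI_TAG identify the message) ... */

    /* Re-post the wildcard receive for the next message. */
    MPI_Irecv(buf, MAX_MSG_SIZE, MPI_BYTE, MPI_ANY_SOURCE, MPI_ANY_TAG,
              MPI_COMM_WORLD, req);
}

int main(int argc, char *argv[])
{
    char *buf;
    MPI_Request req;
    int done = 0;

    MPI_Init(&argc, &argv);
    buf = malloc(MAX_MSG_SIZE);

    /* Post the first receive before entering the work loop. */
    MPI_Irecv(buf, MAX_MSG_SIZE, MPI_BYTE, MPI_ANY_SOURCE, MPI_ANY_TAG,
              MPI_COMM_WORLD, &req);

    while (!done) {
        /* ... do local work, MPI_Isend outgoing messages, MPI_Test their requests ... */
        poll_messages(buf, &req, &done);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}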
Re: [OMPI users] mpirun: symbol lookup error: mpirun: undefinedsymbol:orte_xml_fp
That's it! It works. When I make export I don't have to even start from /usr/.../mpirun, plan "mpirun" do the work. Now I have to make that PATH to be like that all the time... hmm... Thanks a lot :) much appreciate it :) 2010/5/24 Jeff Squyres : > On May 24, 2010, at 12:06 PM, Dawid Laszuk wrote: > >> > What's the output from "ldd hello_c"? (this tells us which libraries it's >> > linking to at run-time -- from your configure output, it should list >> > /usr/local/lib/libmpi.so in there somewhere) >> >> kretyn@kretyn-laptop ~/Pobrane/openmpi-1.4.2/examples $ ldd hello_c >> linux-vdso.so.1 => (0x7bdbe000) >> libmpi.so.0 => /usr/lib/libmpi.so.0 (0x7f5c7ba1e000) >> libopen-rte.so.0 => /usr/lib/libopen-rte.so.0 (0x7f5c7b7d6000) >> libopen-pal.so.0 => /usr/lib/libopen-pal.so.0 (0x7f5c7b563000) > > This seems to be the problem -- it's pointing to the "wrong" libmpi (and > friends). > > Ensure that you're using /usr/local/bin/mpicc to compile your apps. Then you > might also want to prefix the LD_LIBRARY_PATH environment variable with > /usr/local/lib to ensure that you pick up your local Open MPI installation > libraries (instead of the ones in /usr/lib). > > For example: > > export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH > /usr/local/bin/mpicc hello_c.c -o hello_c -g > /usr/local/bin/mpirun -np 4 hello_c > > Try that. > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > -- Pozdrawiam, Dawid Laszuk
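To make those settings persist across sessions, the usual approach is to put the export from the previous reply (plus a PATH entry so that plain "mpirun" resolves to /usr/local/bin) into the shell's startup file. A sketch, assuming bash and the /usr/local prefix used above:

    # add to ~/.bashrc (or your shell's equivalent), then open a new shell
    export PATH=/usr/local/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH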
[OMPI users] Building from the SRPM version creates an rpm with striped libraries
I have a user who prefers building rpm's from the srpm. That's okay, but for debugging via TotalView it creates a version with the openmpi .so files stripped and we can't gain control of the processes when launched via mpirun -tv. I've verified this with my own build of a 1.4.1 rpm which I then installed and noticed the same behavior that the user reports. I was hoping to give them some advice as to how to avoid the stripping, as it appears that the actual build of those libraries is done with -g and everything looks fine. But I can't figure out in the build (from the log file I created) just where that stripping takes place, or how to get around it if need be. The best guess I have is that it may be happening at the very end when an rpm-tmp file is executed, but that file has disappeared so I don't really know what it does. I thought it might be apparent in the spec file, but it's certainly not apparent to me! Any help or advice would be appreciated. Cheers, Peter Thompson
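If it is indeed RPM's automatic post-install processing doing the stripping, one commonly used workaround is to disable those steps when rebuilding. This is a hedged sketch only: the exact macro names and whether they cover the spec's behavior vary by RPM version and distro, so it should be verified locally, and the .src.rpm file name below is a placeholder.

    rpmbuild --rebuild openmpi-1.4.1-1.src.rpm \
        --define 'debug_package %{nil}' \
        --define '__strip /bin/true'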
[OMPI users] fork / exec from an MPI process
Our project is fork / exec'ing in some cases to provide a service for some of the processes within our MPI job. Open MPI spews big warnings to the terminal about this. It explains how to disable the message, but I'd really like it to not pop up regardless. The child process does not perform any MPI calls, or even access the network. In many cases, it probably doesn't even use sockets. Is there any way I could disable this message? Perhaps some special Open MPI code I could insert: #ifdef OPENMPI disable_fork_exec_warning(); #endif ? Thanks, -tom
Re: [OMPI users] fork / exec from an MPI process
Well, there are three easy ways to do this:

1. put OMPI_MCA_mpi_warn_on_fork=0 in your environ (you can even do that within your code prior to calling MPI_Init)
2. put mpi_warn_on_fork=0 in your default MCA param file
3. add -mca mpi_warn_on_fork 0 to your mpirun cmd line

On May 24, 2010, at 6:33 PM, tom fogal wrote:
> Our project is fork / exec'ing in some cases to provide a service for some of the processes within our MPI job. Open MPI spews big warnings to the terminal about this. It explains how to disable the message, but I'd really like it to not pop up regardless.
>
> The child process does not perform any MPI calls, or even access the network. In many cases, it probably doesn't even use sockets.
>
> Is there any way I could disable this message? Perhaps some special Open MPI code I could insert:
>
> #ifdef OPENMPI
>    disable_fork_exec_warning();
> #endif
>
> ?
>
> Thanks,
>
> -tom
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
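Option 1 can also be applied from inside the application, as hinted in the reply above. A sketch (the MCA parameter name is the one given in the reply; error checking omitted):

#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    /* Must be set before MPI_Init() so the parameter is seen at startup. */
    setenv("OMPI_MCA_mpi_warn_on_fork", "0", 1);

    MPI_Init(&argc, &argv);
    /* ... fork()/exec() the helper service here ... */
    MPI_Finalize();
    return 0;
}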