Re: [O-MPI users] problem running Migrate with open-MPI
Dear Brian,

The original poster intended to run migrate-n in parallel mode, but the stdout fragment shows that the program was compiled for a non-MPI architecture (either single CPU or SMP/pthreads) [I talked with him off-list and it used pthreads]. A version compiled for parallel runs shows this in its first few lines, like this (see the marked line):

  =
  MIGRATION RATE AND POPULATION SIZE ESTIMATION
  using Markov Chain Monte Carlo simulation
  =
  compiled for a PARALLEL COMPUTER ARCHITECTURE   <@
  Version debug 2.1.3 [x]
  Program started at Wed Feb 8 12:29:35 2006

In my hands, migrate-n compiles and runs on Open MPI 1.0.1. There might be some use in running the program multiple times completely independently through Open MPI or LAM for simulation purposes, but that would not be a typical use of the program, which can distribute multiple genetic loci over multiple nodes with only the master handling input and output (when compiled using "configure; make mpis" or "configure; make mpi").

Peter

Peter Beerli, Computational Evolutionary Biology Group
School of Computational Science (SCS) and Biological Sciences Department
150-T Dirac Science Library
Florida State University
Tallahassee, Florida 32306-4120 USA
Webpage: http://www.csit.fsu.edu/~beerli
Phone: 850.645.1324  Fax: 850.644.0094

On Feb 8, 2006, at 11:24 AM, Brian Barrett wrote:

I think we fixed this over this last weekend. I believe the problem was our mis-handling of standard input in some cases. I believe I was able to get the application running (but I could be fooling myself there...). Could you download the latest nightly build from the URL below and see if it works for you? The fixes are scheduled to be part of Open MPI 1.0.2, which should be out real soon now.

http://www.open-mpi.org/nightly/trunk/

Thanks,
Brian

On Feb 3, 2006, at 10:23 AM, Andy Vierstraete wrote:

Hi,

I have installed Migrate 2.1.2, but it fails to run on Open MPI (it does run on LAM-MPI: see end of mail). My system is SUSE 10 on an Athlon X2.

hostfile:
  localhost slots=2 max_slots=2

I tried different commands:

1. does not start; error message:

avierstr@muscorum:~/thomas> mpiexec -np 2 migrate-mpi
mpiexec noticed that job rank 1 with PID 0 on node "localhost" exited on signal 11.
[muscorum:07212] ERROR: A daemon on node localhost failed to start as expected.
[muscorum:07212] ERROR: There may be more information available from
[muscorum:07212] ERROR: the remote shell (see above).
[muscorum:07212] The daemon received a signal 11.
1 additional process aborted (not shown)

2. starts a never-ending loop:

avierstr@muscorum:~/thomas> mpirun -np 2 --hostfile ./hostfile migrate-mpi migrate-mpi
  =
  MIGRATION RATE AND POPULATION SIZE ESTIMATION
  using Markov Chain Monte Carlo simulation
  =
  Version 2.1.2
  Program started at Fri Feb 3 15:58:57 2006

  Settings for this run:
    D  Data type currently set to: DNA sequence model
    I  Input/Output formats
    P  Parameters [start, migration model]
    S  Search strategy
    W  Write a parmfile
    Q  Quit the program
  Are the settings correct?
  (Type Y or the letter for one to change)

  Settings for this run:
    D  Data type currently set to: DNA sequence model
    I  Input/Output formats
    P  Parameters [start, migration model]
    S  Search strategy
    W  Write a parmfile
    Q  Quit the program
  Are the settings correct?
  (Type Y or the letter for one to change)

  Settings for this run:
    D  Data type currently set to: DNA sequence model
    I  Input/Output formats
    P  Parameters [start, migration model]
    S  Search strategy
    W  Write a parmfile
    Q  Quit the program
  Are the settings correct?
  (Type Y or the letter for one to change)

  Settings for this run:
    D  Data type currently set to: DNA sequence model
    I  Input/Output formats
    P  Parameters [start, migration model]
    S  Search strategy
    W  Write a parmfile
    Q  Quit the program
  Are the settings correct?
  (Type Y or the letter for one to change)

  Settings for this run:
    D  Data type currently set to: DNA sequence model
    I  Input/Output formats
    P  Parameters [start, migration model]
    S  Search strategy
    W  Write a parmfile
    Q  Quit the program
  Are the settings correct?
  (Type Y or the letter for one to change)

  igration model] S Search strategy W Write a parmfile Q Quit the program Are the settings
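The layout Peter describes above, with only the master doing input/output while the workers receive loci to analyze, is a standard MPI master-worker pattern. The sketch below shows that pattern in plain MPI C as an illustration only; the tags, message contents, and analyze_locus() are invented here and are not taken from migrate's source. It assumes at least two ranks (see the next message).

/* Illustrative single_master-many_worker sketch, not migrate's code.
 * Rank 0 hands out loci and prints results; ranks > 0 only compute.
 * Assumes it is started with at least two MPI processes. */
#include <stdio.h>
#include <mpi.h>

#define TAG_WORK 1
#define TAG_DONE 2

/* placeholder for the per-locus analysis a real worker would do */
static double analyze_locus(int locus) { return (double) locus; }

int main(int argc, char **argv)
{
    int rank, size, nloci = 10;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                     /* master: all input/output happens here */
        int next = 0, finished = 0;
        /* seed every worker with one locus, or -1 if there is none left */
        for (int w = 1; w < size; w++) {
            int locus = (next < nloci) ? next++ : -1;
            MPI_Send(&locus, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
        }
        while (finished < nloci) {
            double result;
            MPI_Status st;
            MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, TAG_DONE,
                     MPI_COMM_WORLD, &st);
            printf("locus result from rank %d: %g\n", st.MPI_SOURCE, result);
            finished++;
            int locus = (next < nloci) ? next++ : -1;   /* -1 tells the worker to stop */
            MPI_Send(&locus, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
        }
    } else {                             /* worker: no file or terminal I/O at all */
        for (;;) {
            int locus;
            MPI_Recv(&locus, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            if (locus < 0)
                break;
            double result = analyze_locus(locus);
            MPI_Send(&result, 1, MPI_DOUBLE, 0, TAG_DONE, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}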
Re: [OMPI users] problem running Migrate with open-MPI
Dear Andy,

You wrote that with Open MPI

  avierstr@muscorum:~> mpiexec --hostfile hostfile -np 1 migrate-n

it does not work, but that with LAM-MPI

  avierstr@muscorum:~/thomas> mpiexec -np 2 migrate-n

it does. With Open MPI you started migrate on only _one_ node; migrate needs at least _two_ nodes to run (as you used with LAM-MPI). migrate actually aborts when it runs on only one node and shows an error message, like this:

  zork>which mpirun
  /usr/local/openmpi/bin/mpirun
  zork>mpirun -machinefile ~/onehost -np 1 migrate-n
  migrate-n
  =
  MIGRATION RATE AND POPULATION SIZE ESTIMATION
  using Markov Chain Monte Carlo simulation
  =
  compiled for a PARALLEL COMPUTER ARCHITECTURE
  Version debug 2.1.3 [x]
  Program started at Mon Feb 13 09:03:45 2006
  Reading N ...
  Reading S ...
  In file main.c on line 697
  This program was compiled to use a parallel computer
  and you tried to run it on only a single node. This will not work
  because it uses a "single_master-many_worker" architecture
  and needs at least TWO nodes

Peter
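A guard of this kind is a few lines in MPI. The fragment below is a minimal sketch of such a startup check, not migrate's actual main.c; the wording of the message is borrowed from the output above only for illustration.

/* Minimal sketch of a "need at least two ranks" startup guard,
 * similar in spirit to the check migrate-n reports from main.c.
 * Illustrative only, not migrate's actual source. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0)
            fprintf(stderr,
                    "This program uses a single_master-many_worker architecture\n"
                    "and needs at least TWO nodes.\n");
        MPI_Finalize();          /* clean shutdown instead of MPI_Abort */
        return EXIT_FAILURE;
    }

    /* ... rank 0 becomes the master, ranks 1..size-1 the workers ... */

    MPI_Finalize();
    return EXIT_SUCCESS;
}

Running this with "mpirun -np 1 ./a.out" prints the message and exits; with "-np 2" or more it gets past the guard.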
[OMPI users] program stalls in __write_nocancel()
On some of my larger problems (50 or more nodes, 'long' runs of more than 5 hours), my program stalls and does not continue. My program is set up as a master-worker, and it seems that the master gets stuck in a write to stdout; see the gdb backtrace below (it took all day to get there on 50 nodes). The function handle_message is simply printing to stdout in this case. Of course the workers keep sending stuff to the master, but the master is stuck in a write that does not finish. Any idea where to look next? Smaller runs look fine, and valgrind did not find problems in my code (it complains a lot about Open MPI, though). I also attach the ompi_info output to show versions (OS is macos 10.5.5). Any idea what is going on? Any hint is welcome!

thanks
Peter

(gdb) bt
#0  0x0037528c0e50 in __write_nocancel () from /lib64/libc.so.6
#1  0x0037528694b3 in _IO_new_file_write () from /lib64/libc.so.6
#2  0x0037528693c6 in _IO_new_do_write () from /lib64/libc.so.6
#3  0x00375286a822 in _IO_new_file_xsputn () from /lib64/libc.so.6
#4  0x00375285f4f8 in fputs () from /lib64/libc.so.6
#5  0x0045e9de in handle_message (
      rawmessage=0x4bb8830 "M0:[ 12] Swapping between 4 temperatures. \n", ' ' , "Temperature | Accepted | Swaps between temperatures\n", ' ' , "1e+06 | 0.00 | |\n", ' ' , "3. | 0.08 |1 ||"...,
      sender=12, world=0x448d8b0) at migrate_mpi.c:3663
#6  0x0045362a in mpi_runloci_master (loci=1, who=0x4541fc0, world=0x448d8b0, options_readsum=0, menu=0) at migrate_mpi.c:228
#7  0x0044ed86 in run_sampler (options=0x448dc20, data=0x4465a10, universe=0x42b90c0, usize=4, outfilepos=0x7fff0ff98ee0, Gmax=0x7fff0ff98ee8) at main.c:885
#8  0x0044dff2 in main (argc=3, argv=0x7fff0ff99008) at main.c:422

petal:~>ompi_info
  Open MPI: 1.2.8
  Open MPI SVN revision: r19718
  Open RTE: 1.2.8
  Open RTE SVN revision: r19718
  OPAL: 1.2.8
  OPAL SVN revision: r19718
  Prefix: /home/beerli/openmpi
  Configured architecture: x86_64-unknown-linux-gnu
  Configured by: beerli
  Configured on: Mon Nov 3 15:00:02 EST 2008
  Configure host: petal
  Built by: beerli
  Built on: Mon Nov 3 15:08:02 EST 2008
  Built host: petal
  C bindings: yes
  C++ bindings: yes
  Fortran77 bindings: yes (all)
  Fortran90 bindings: yes
  Fortran90 bindings size: small
  C compiler: gcc
  C compiler absolute: /usr/bin/gcc
  C++ compiler: g++
  C++ compiler absolute: /usr/bin/g++
  Fortran77 compiler: gfortran
  Fortran77 compiler abs: /usr/bin/gfortran
  Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/bin/gfortran
  C profiling: yes
  C++ profiling: yes
  Fortran77 profiling: yes
  Fortran90 profiling: yes
  C++ exceptions: no
  Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
  MPI parameter check: runtime
  Memory profiling support: no
  Memory debugging support: no
  libltdl support: yes
  Heterogeneous support: yes
  mpirun default --prefix: no
  MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.8)
  MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.8)
  MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.8)
  MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.8)
  MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.8)
  MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.8)
  MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.8)
  MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
  MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
  MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.8)
  MCA coll: self (MCA v1.0, API v1.0, Component v1.2.8)
  MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.8)
  MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.8)
  MCA io: romio (MCA v1.0, API v1.0, Component v1.2.8)
  MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.8)
  MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.8)
  MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.8)
  MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.8)
  MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.8)
  MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.8)
  MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.8)
  MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.8)
  MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
  MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.8)
  MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.8)
  MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.
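One way a process ends up sitting in __write_nocancel forever: if stdout is a pipe (as it typically is under mpirun, which forwards each rank's output) and the other end stops being drained, write() blocks once the kernel pipe buffer is full. The tiny standalone program below reproduces that state on purpose; it only illustrates the mechanism visible in the backtrace above and is not a diagnosis of migrate or Open MPI.

/* Illustration only: a write() blocks once a pipe's kernel buffer is
 * full and nobody reads the other end.  This mimics the state in the
 * backtrace above (fputs -> __write_nocancel that never returns); it
 * is not migrate's code and makes no claim about where the real bug is. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd[2];
    if (pipe(fd) != 0) {
        perror("pipe");
        return 1;
    }
    /* The read end fd[0] is never read.  Keep writing until the
     * pipe buffer (typically 64 KiB on Linux) fills up. */
    char chunk[4096] = {0};
    long total = 0;
    for (;;) {
        ssize_t n = write(fd[1], chunk, sizeof chunk);  /* eventually blocks here */
        if (n < 0) {
            perror("write");
            return 1;
        }
        total += n;
        fprintf(stderr, "wrote %ld bytes so far\n", total);
    }
}

Attaching gdb while this hangs shows a stack blocked in the write system call, much like frame #0 in the backtrace above.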
Re: [OMPI users] Problem compiling OMPI with Intel C compiler on Mac OS X
Today I ran into the same problem as Warner Yuen (see thread below): Open MPI does not compile with icc and fails with an error where libtool asks for a --tag. The error is Mac OS X specific. It occurs when compiling the xgrid component in openmpi-1.1/orte/mca/pls/xgrid, where the Makefile fails; xgrid uses some Objective-C stuff that needs to be compiled with gcc [I guess]. I adjusted the Makefile.in from

  xgrid>grep -n "\-\-tag=OBJC" Makefile.in
  216:LTOBJCCOMPILE = $(LIBTOOL) --mode=compile $(OBJC) $(DEFS) \
  220:OBJCLINK = $(LIBTOOL) --mode=link $(OBJCLD) $(AM_OBJCFLAGS) \

to

  xgrid>grep -n "\-\-tag=OBJC" Makefile.in
  216:LTOBJCCOMPILE = $(LIBTOOL) --tag=OBJC --mode=compile $(OBJC) $(DEFS) \
  220:OBJCLINK = $(LIBTOOL) --tag=OBJC --mode=link $(OBJCLD) $(AM_OBJCFLAGS) \

The change elicits a warning that OBJC is not a known tag, but the build keeps going and compiles fine. I do not use the xgrid portion, so I do not know whether it is clobbered or not. Standard runs using orterun work fine.

Peter

Brian Barrett wrote in July:

On Jul 14, 2006, at 10:35 AM, Warner Yuen wrote:

> I'm having trouble compiling Open MPI with Mac OS X v10.4.6 with the Intel C compiler. Here are some details:
>
> 1) I upgraded to the latest versions of Xcode, including GCC 4.0.1 build 5341.
> 2) I installed the latest Intel update (9.1.027) as well.
> 3) Open MPI compiles fine using GCC and IFORT.
> 4) Open MPI fails with ICC and IFORT.
> 5) MPICH-2.1.0.3 compiles fine with ICC and IFORT (I just had to find out if my compiler worked... sorry!)
> 6) My Open MPI configuration was: ./configure --with-rsh=/usr/bin/ssh --prefix=/usr/local/ompi11icc
> 7) Should I have included my config.log?

It looks like there are some problems with GNU libtool's support for the Intel compiler on OS X. I can't tell if it's a problem with the Intel compiler or libtool. A quick fix is to build Open MPI with static libraries rather than shared libraries. You can do this by adding

  --disable-shared --enable-static

to the configure line for Open MPI (if you're building in the same directory where you've already run configure, you want to run make clean before building again). I unfortunately don't have access to an Intel Mac with the Intel compilers installed, so I can't verify this issue. I believe one of the other developers does have such a configuration, so I'll ask him when he's available (might be a week or two -- I believe he's on vacation). This issue seems to be unique to your exact configuration -- it doesn't happen with GCC on the Intel Mac nor on Linux with the Intel compilers.

Brian

--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/