Re: [O-MPI users] problem running Migrate with open-MPI

2006-02-08 Thread Peter Beerli

Dear Brian,

The original poster intended to run migrate-n in parallel mode, but the
stdout fragment shows that the program was compiled for a non-MPI
architecture (either single CPU or SMP/pthreads); I talked with him
off-list, and his binary used pthreads.
A version built for parallel runs announces this in its first few
lines, like this (note the "compiled for a PARALLEL COMPUTER
ARCHITECTURE" line):

  =============================================
  MIGRATION RATE AND POPULATION SIZE ESTIMATION
  using Markov Chain Monte Carlo simulation
  =============================================
  compiled for a PARALLEL COMPUTER ARCHITECTURE

  Version debug 2.1.3   [x]
  Program started at   Wed Feb  8 12:29:35 2006

As far as I can tell, migrate-n compiles and runs fine with Open MPI
1.0.1. There might be some use in launching the program several times
completely independently through Open MPI or LAM for simulation
purposes, but that is not its typical use: when compiled with
"configure; make mpis" or "configure; make mpi", it distributes the
individual genetic loci across the nodes and lets only the master
handle input and output.
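
For illustration only, here is a minimal sketch of that single-master/
many-worker layout in MPI C. It is not migrate-n's actual code: the
tags, NLOCI, and the squared-locus placeholder work are invented; the
point is simply that rank 0 alone touches stdin/stdout and farms the
loci out to the other ranks.

/* hypothetical sketch: rank 0 distributes "loci" and does all I/O,
   the other ranks only compute */
#include <mpi.h>
#include <stdio.h>

#define NLOCI    8     /* stand-in for the number of genetic loci */
#define TAG_WORK 1
#define TAG_DONE 2

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {                       /* master-worker needs >= 2 */
        if (rank == 0)
            fprintf(stderr, "need at least two MPI processes\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    if (rank == 0) {                      /* master: all input/output */
        int locus, result, w;
        MPI_Status st;
        for (locus = 0; locus < NLOCI; locus++) {
            int dest = 1 + locus % (size - 1);
            MPI_Send(&locus, 1, MPI_INT, dest, TAG_WORK, MPI_COMM_WORLD);
        }
        for (locus = 0; locus < NLOCI; locus++) {
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, TAG_DONE,
                     MPI_COMM_WORLD, &st);
            printf("locus result %d from rank %d\n", result, st.MPI_SOURCE);
        }
        for (w = 1; w < size; w++) {      /* tell the workers to stop */
            int stop = -1;
            MPI_Send(&stop, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
        }
    } else {                              /* worker: never touches stdio */
        int locus;
        MPI_Status st;
        for (;;) {
            MPI_Recv(&locus, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD, &st);
            if (locus < 0)
                break;
            int result = locus * locus;   /* placeholder for the MCMC run */
            MPI_Send(&result, 1, MPI_INT, 0, TAG_DONE, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}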



Peter

Peter Beerli,
Computational Evolutionary Biology Group
School of Computational Science (SCS)
and Biological Sciences Department
150-T Dirac Science Library
Florida State University
Tallahassee, Florida 32306-4120 USA
Webpage: http://www.csit.fsu.edu/~beerli
Phone: 850.645.1324
Fax: 850.644.0094





On Feb 8, 2006, at 11:24 AM, Brian Barrett wrote:


I think we fixed this over this past weekend.  I believe the problem
was our mishandling of standard input in some cases.  I believe I was
able to get the application running (but I could be fooling myself
there...).  Could you download the latest nightly build from the URL
below and see if it works for you?  The fixes are scheduled to be
part of Open MPI 1.0.2, which should be out real soon now.

 http://www.open-mpi.org/nightly/trunk/

Thanks,

Brian

On Feb 3, 2006, at 10:23 AM, Andy Vierstraete wrote:


Hi,

I have installed Migrate 2.1.2, but it fails to run on open-MPI (it
does run on LAM-MPI; see the end of this mail)

my system is Suse 10 on Athlon X2

hostfile : localhost slots=2 max_slots=2

I tried different commands :

1. does not start: error message:

avierstr@muscorum:~/thomas> mpiexec  -np 2 migrate-mpi
mpiexec noticed that job rank 1 with PID 0 on node "localhost"
exited on
signal 11.
[muscorum:07212] ERROR: A daemon on node localhost failed to start as
expected.
[muscorum:07212] ERROR: There may be more information available from
[muscorum:07212] ERROR: the remote shell (see above).
[muscorum:07212] The daemon received a signal 11.
1 additional process aborted (not shown)



2. starts a never-ending loop:


avierstr@muscorum:~/thomas> mpirun -np 2 --hostfile ./hostfile
migrate-mpi
migrate-mpi
  =============================================
  MIGRATION RATE AND POPULATION SIZE ESTIMATION
  using Markov Chain Monte Carlo simulation
  =============================================
  Version 2.1.2
  Program started at   Fri Feb  3 15:58:57 2006


  Settings for this run:
  D   Data type currently set to: DNA sequence model
  I   Input/Output formats
  P   Parameters  [start, migration model]
  S   Search strategy
  W   Write a parmfile
  Q   Quit the program


  Are the settings correct?
  (Type Y or the letter for one to change)
  Settings for this run:
  D   Data type currently set to: DNA sequence model
  I   Input/Output formats
  P   Parameters  [start, migration model]
  S   Search strategy
  W   Write a parmfile
  Q   Quit the program


  Are the settings correct?
  (Type Y or the letter for one to change)

  [... the same settings menu repeats endlessly until the job is killed ...]

Re: [OMPI users] problem running Migrate with open-MPI

2006-02-13 Thread Peter Beerli

Dear Andy,

you wrote that with Open MPI

avierstr@muscorum:~> mpiexec --hostfile hostfile -np 1  migrate-n

it does not work, but that with LAM-MPI

avierstr@muscorum:~/thomas> mpiexec -np 2  migrate-n

it does. With Open MPI you started migrate on only _one_ node;
migrate needs at least _two_ nodes to run (as you did under LAM-MPI).
Migrate actually aborts when running on only one node, and it should
print an error message like this:


zork>which mpirun
/usr/local/openmpi/bin/mpirun
zork>mpirun -machinefile ~/onehost -np 1 migrate-n
migrate-n
  =============================================
  MIGRATION RATE AND POPULATION SIZE ESTIMATION
  using Markov Chain Monte Carlo simulation
  =============================================
  compiled for a PARALLEL COMPUTER ARCHITECTURE
  Version debug 2.1.3   [x]
  Program started at   Mon Feb 13 09:03:45 2006



Reading N ...
Reading S ...

In file main.c on line 697
This program was compiled to use a parallel computer
and you tried to run it on only a single node.
This will not work because it uses a
"single_master-many_worker" architecture
and needs at least TWO nodes


Peter



[OMPI users] program stalls in __write_nocancel()

2008-11-05 Thread Peter Beerli

On some of my larger problems (50 or more nodes, 'long' runs of more
than 5 hours), my program stalls and does not continue. The program is
set up as a master-worker, and it seems that the master gets stuck in
a write to stdout; see the gdb backtrace below (it took all day on 50
nodes to get there). The function handle_message is simply printing to
stdout in this case. Of course the workers keep sending material to
the master, but the master is stuck in a write that never finishes.
Any idea where to look next? Smaller runs look fine, and valgrind did
not find problems in my code (it complains a lot about Open MPI
itself, though).

I also attach the ompi_info output to show versions (the OS is Mac OS
X 10.5.5). Any idea what is going on? Any hint is welcome!

thanks
Peter

(gdb) bt
#0  0x0037528c0e50 in __write_nocancel () from /lib64/libc.so.6
#1  0x0037528694b3 in _IO_new_file_write () from /lib64/libc.so.6
#2  0x0037528693c6 in _IO_new_do_write () from /lib64/libc.so.6
#3  0x00375286a822 in _IO_new_file_xsputn () from /lib64/libc.so.6
#4  0x00375285f4f8 in fputs () from /lib64/libc.so.6
#5  0x0045e9de in handle_message (
    rawmessage=0x4bb8830 "M0:[ 12] Swapping between 4 temperatures.\n", ' ' , "Temperature | Accepted |  Swaps between temperatures\n", ' ' , "1e+06  |   0.00   |   |\n", ' ' , "3.  |   0.08   |1 ||"..., sender=12, world=0x448d8b0)
    at migrate_mpi.c:3663
#6  0x0045362a in mpi_runloci_master (loci=1, who=0x4541fc0,
    world=0x448d8b0, options_readsum=0, menu=0) at migrate_mpi.c:228
#7  0x0044ed86 in run_sampler (options=0x448dc20, data=0x4465a10,
    universe=0x42b90c0, usize=4, outfilepos=0x7fff0ff98ee0,
    Gmax=0x7fff0ff98ee8) at main.c:885
#8  0x0044dff2 in main (argc=3, argv=0x7fff0ff99008) at main.c:422
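
To make the shape of that call chain concrete, here is a stripped-down,
hypothetical sketch of a master loop of the kind described. The names
(handle_message, the tag, the one-report-per-worker counting) are
illustrative and not migrate_mpi.c's real signatures; the point is that
the master sits in a blocking receive/write-to-stdout cycle, so if
stdout ever stops draining, the whole run stalls.

#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define MSG_TAG 42
#define MAXMSG  256

/* master-side handler: just push the worker's report to stdout */
static void handle_message(const char *rawmessage, int sender)
{
    /* a plain blocking write; if whatever is attached to stdout stops
       draining, this call can block indefinitely */
    printf("M%d: %s", sender, rawmessage);
    fflush(stdout);
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                 /* master: receive and print */
        char buf[MAXMSG];
        MPI_Status st;
        int i;
        for (i = 0; i < size - 1; i++) {   /* one report per worker here */
            MPI_Recv(buf, MAXMSG, MPI_CHAR, MPI_ANY_SOURCE, MSG_TAG,
                     MPI_COMM_WORLD, &st);
            handle_message(buf, st.MPI_SOURCE);
        }
    } else {                         /* worker: report progress to rank 0 */
        char buf[MAXMSG];
        snprintf(buf, sizeof(buf), "worker %d finished its chain\n", rank);
        MPI_Send(buf, (int)strlen(buf) + 1, MPI_CHAR, 0, MSG_TAG,
                 MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}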



petal:~>ompi_info
   Open MPI: 1.2.8
  Open MPI SVN revision: r19718
   Open RTE: 1.2.8
  Open RTE SVN revision: r19718
   OPAL: 1.2.8
  OPAL SVN revision: r19718
 Prefix: /home/beerli/openmpi
Configured architecture: x86_64-unknown-linux-gnu
  Configured by: beerli
  Configured on: Mon Nov  3 15:00:02 EST 2008
 Configure host: petal
   Built by: beerli
   Built on: Mon Nov  3 15:08:02 EST 2008
 Built host: petal
 C bindings: yes
   C++ bindings: yes
 Fortran77 bindings: yes (all)
 Fortran90 bindings: yes
Fortran90 bindings size: small
 C compiler: gcc
C compiler absolute: /usr/bin/gcc
   C++ compiler: g++
  C++ compiler absolute: /usr/bin/g++
 Fortran77 compiler: gfortran
 Fortran77 compiler abs: /usr/bin/gfortran
 Fortran90 compiler: gfortran
 Fortran90 compiler abs: /usr/bin/gfortran
C profiling: yes
  C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
 C++ exceptions: no
 Thread support: posix (mpi: no, progress: no)
 Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
  Heterogeneous support: yes
mpirun default --prefix: no
  MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.8)
  MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.8)
  MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.8)
  MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.8)
  MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.8)
MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.8)
MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.8)
  MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
  MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
   MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.8)
   MCA coll: self (MCA v1.0, API v1.0, Component v1.2.8)
   MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.8)
   MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.8)
 MCA io: romio (MCA v1.0, API v1.0, Component v1.2.8)
  MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.8)
  MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.8)
MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.8)
MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.8)
MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.8)
 MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.8)
MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.8)
MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.8)
MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
   MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.8)
MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.8)
 MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.

Re: [OMPI users] Problem compiling OMPI with Intel C compiler on Mac OS X

2006-08-17 Thread Peter Beerli

Today I ran into the same problem as Warner Yuen (see the thread below):

Open MPI does not compile with icc; the build fails with an error where
libtool asks for --tag. The error is Mac OS X specific.

It occurs while compiling the xgrid support in

openmpi-1.1/orte/mca/pls/xgrid

where the Makefile fails; xgrid uses some Objective-C code that (I
guess) needs to be compiled with gcc.


I adjusted the Makefile.in

from

xgrid>grep -n "\-\-tag=OBJC" Makefile.in
216:LTOBJCCOMPILE = $(LIBTOOL)  --mode=compile $(OBJC) $(DEFS) \
220:OBJCLINK = $(LIBTOOL)  --mode=link $(OBJCLD) $(AM_OBJCFLAGS) \
to

xgrid>grep -n "\-\-tag=OBJC" Makefile.in
216:LTOBJCCOMPILE = $(LIBTOOL) --tag=OBJC --mode=compile $(OBJC) $(DEFS) \
220:OBJCLINK = $(LIBTOOL) --tag=OBJC --mode=link $(OBJCLD) $(AM_OBJCFLAGS) \


The change elicits a warning that OBJC is not a known tag, but the
build keeps going and compiles fine. I do not use the xgrid portion,
so I do not know whether it is clobbered or not; standard runs using
orterun work fine.


Peter

Brian Barrett wrote in July:

On Jul 14, 2006, at 10:35 AM, Warner Yuen wrote:

> I'm having trouble compiling Open MPI with Mac OS X v10.4.6 with
> the Intel C compiler. Here are some details:
>
> 1) I upgraded to the latest versions of Xcode including GCC 4.0.1
> build 5341.
> 2) I installed the latest Intel update (9.1.027) as well.
> 3) Open MPI compiles fine with using GCC and IFORT.
> 4) Open MPI fails with ICC and IFORT
> 5) MPICH-2.1.0.3 compiles fine with ICC and IFORT (I just had to
> find out if my compiler worked...sorry!)
> 6) My Open MPI configuration was using: ./configure --with-rsh=/usr/bin/ssh --prefix=/usr/local/ompi11icc
> 7) Should I have included my config.log?

It looks like there are some problems with GNU libtool's support for
the Intel compiler on OS X. I can't tell if it's a problem with the
Intel compiler or libtool. A quick fix is to build Open MPI with
static libraries rather than shared libraries. You can do this by
adding:

   --disable-shared --enable-static

to the configure line for Open MPI (if you're building in the same
directory where you've already run configure, you want to run make
clean before building again).

I unfortunately don't have access to an Intel Mac machine with the
Intel compilers installed, so I can't verify this issue. I believe
one of the other developers does have such a configuration, so I'll
ask him when he's available (might be a week or two -- I believe he's
on vacation). This issue seems to be unique to your exact
configuration -- it doesn't happen with GCC on the Intel Mac nor on
Linux with the Intel compilers.



Brian



--
   Brian Barrett
   Open MPI developer
   http://www.open-mpi.org/