Re: [OMPI users] hostfiles

2010-02-06 Thread Eugene Loh




Jeff Squyres wrote:
> On Feb 4, 2010, at 7:55 PM, Ralph Castain wrote:
>> Take a look at orte/mca/rmaps/seq - you can select it with -mca rmaps seq
>> I believe it is documented

I don't know where.

> ...if it isn't, can it be added to the man page?  It might be a common
> mpirun / hostfile question...?

I just added it to the mpirun man page.  r22567
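
In case it helps anyone searching the archives, the behavior in brief (my
wording here, not the man page's): the seq mapper places one process on each
hostfile line, in the order the lines appear.  So with a hostfile such as

  node0
  node1
  node0

a command line along the lines of

  mpirun -np 3 -hostfile myhosts -mca rmaps seq ./a.out

should put ranks 0 and 2 on node0 and rank 1 on node1.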




Re: [OMPI users] Trapping fortran I/O errors leaving zombie mpi processes

2010-02-06 Thread Laurence Marks
I managed to find time to reproduce the issue, although its results are not
very reproducible, and I suspect it may not be easy to reproduce with a
simple code. I've never actually constructed an MPI code myself, so I am
cc'ing Michael Sternberg, who compiled the openmpi, in case there are flags
to add to the compilation.

I have 8 processes on a single dual quad-core node reading from the same
file using formatted fortran I/O, and I deliberately created an error in
the read. If this error is a format error, all the processes terminate. If
the error is because there is not enough data (EOF), I get somewhere from 1
to 7 zombies. They don't seem to be doing anything (top -u lmarks shows no
CPU activity), but I have no idea whether they have locks on the file or
anything else (I think they might, but have no idea how to tell).

On Fri, Jan 29, 2010 at 6:18 PM, Jeff Squyres  wrote:
> On Jan 29, 2010, at 9:13 AM, Laurence Marks wrote:
>
>> OK, but trivial codes don't always reproduce problems.
>
> Yes, but if the problem is a file reading beyond the end, that should be 
> fairly isolated behavior.
>
>> Is strace useful?
>
> Sure.  But let's check to see if the apps are actually dying or hanging first.
>
> --
> Jeff Squyres
> jsquy...@cisco.com



-- 
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Electron crystallography is the branch of science that uses electron
scattering and imaging to study the structure of matter.



Re: [OMPI users] libtool compile error

2010-02-06 Thread Caciano Machado
Hi,

You can solve this by installing libtool 2.2.6b and running autogen.sh.
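
Roughly, and assuming the GNU Autotools are installed and you are working
from the top of the extracted source tree, the steps would look like:

  cd openmpi-1.4.1
  ./autogen.sh
  ./configure <your usual options>
  make

autogen.sh regenerates aclocal.m4 and the embedded libtool scripts against
the newly installed libtool, which is what the "Version mismatch error"
below is asking for.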

Regards,
Caciano Machado

On Thu, Feb 4, 2010 at 8:25 PM, Peter C. Lichtner  wrote:
> I'm trying to compile openmpi-1.4.1 on MacOSX 10.5.8 using Absoft Fortran
> 90 11.0 and gcc --version i686-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple
> Inc. build 5493). I get the following error:
>
> make
> ...
>
> Making all in mca/io/romio
> Making all in romio
> Making all in include
> make[4]: Nothing to be done for `all'.
> Making all in adio
> Making all in common
> /bin/sh ../../libtool --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I.
> -I../../adio/include  -DOMPI_BUILDING=1
> -I/Users/lichtner/petsc/openmpi-1.4.1/ompi/mca/io/romio/romio/../../../../..
> -I/Users/lichtner/petsc/openmpi-1.4.1/ompi/mca/io/romio/romio/../../../../../opal/include
> -I../../../../../../../opal/include -I../../../../../../../ompi/include
> -I/Users/lichtner/petsc/openmpi-1.4.1/ompi/mca/io/romio/romio/include
> -I/Users/lichtner/petsc/openmpi-1.4.1/ompi/mca/io/romio/romio/adio/include
> -D_REENTRANT  -O3 -DNDEBUG -finline-functions -fno-strict-aliasing
> -DHAVE_ROMIOCONF_H -DHAVE_ROMIOCONF_H  -I../../include -MT ad_aggregate.lo
> -MD -MP -MF .deps/ad_aggregate.Tpo -c -o ad_aggregate.lo ad_aggregate.c
> ../../libtool: line 460: CDPATH: command not found
> /Users/lichtner/petsc/openmpi-1.4.1/ompi/mca/io/romio/romio/libtool: line
> 460: CDPATH: command not found
> /Users/lichtner/petsc/openmpi-1.4.1/ompi/mca/io/romio/romio/libtool: line
> 1138: func_opt_split: command not found
> libtool: Version mismatch error.  This is libtool 2.2.6b, but the
> libtool: definition of this LT_INIT comes from an older release.
> libtool: You should recreate aclocal.m4 with macros from libtool 2.2.6b
> libtool: and run autoconf again.
> make[5]: *** [ad_aggregate.lo] Error 63
> make[4]: *** [all-recursive] Error 1
> make[3]: *** [all-recursive] Error 1
> make[2]: *** [all-recursive] Error 1
> make[1]: *** [all-recursive] Error 1
> make: *** [all-recursive] Error 1
>
> Any help appreciated.
> ...Peter



Re: [OMPI users] [mpich-discuss] problem with MPI_Get_count() for very long (but legal length) messages.

2010-02-06 Thread Jed Brown
On Fri, 5 Feb 2010 14:28:40 -0600, Barry Smith  wrote:
> To cheer you up, when I run with openMPI it runs forever sucking down  
> 100% CPU trying to send the messages :-)

On my test box (x86 with 8GB memory), Open MPI (1.4.1) does complete
after several seconds, but still prints the wrong count.  (The value it
prints is consistent with a 32-bit overflow of the byte count:
433438806 elements * 8 bytes = 3467510448 bytes, which wraps to
-827456848, and -827456848 / 8 = -103432106, exactly the count shown below.)

MPICH2 does not actually send the message, as you can see by running the
attached code.

  # Open MPI 1.4.1, correct cols[0]
  [0] sending...
  [1] receiving...
  count -103432106, cols[0] 0

  # MPICH2 1.2.1, incorrect cols[0]
  [1] receiving...
  [0] sending...
  [1] count -103432106, cols[0] 1


How much memory does crush have (you need about 7GB to do this without
swapping)?  In particular, most of the time it took Open MPI to send the
message (with your source) was actually just spent faulting the
send/recv buffers.  The attached faults the buffers first, and the
subsequent send/recv takes less than 2 seconds.

Actually, it's clear that MPICH2 never touches either buffer because it
returns immediately regardless of whether they have been faulted first.

Jed

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc,char **argv)
{
  int        ierr,i,size,rank,count;
  int        cnt = 433438806;
  MPI_Status status;
  long long  *cols;

  MPI_Init(&argc,&argv);
  ierr = MPI_Comm_size(MPI_COMM_WORLD,&size);
  ierr = MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  if (size != 2) {
    fprintf(stderr,"[%d] usage: mpiexec -n 2 %s\n",rank,argv[0]);
    MPI_Abort(MPI_COMM_WORLD,1);
  }

  cols = malloc(cnt*sizeof(long long));
  /* fault the send/recv buffer up front so the transfer itself is what is
     timed, not the page faults; cols[0] also shows whether the message arrived */
  for (i=0; i<cnt; i++) cols[i] = rank;
  if (rank == 0) {
    printf("[%d] sending...\n",rank);
    ierr = MPI_Send(cols,cnt,MPI_LONG_LONG_INT,1,0,MPI_COMM_WORLD);
  } else {
    printf("[%d] receiving...\n",rank);
    ierr = MPI_Recv(cols,cnt,MPI_LONG_LONG_INT,0,0,MPI_COMM_WORLD,&status);
    ierr = MPI_Get_count(&status,MPI_LONG_LONG_INT,&count);
    printf("[%d] count %d, cols[0] %lld\n",rank,count,cols[0]);
  }
  free(cols);
  MPI_Finalize();
  return 0;
}

Re: [OMPI users] Trapping fortran I/O errors leaving zombie mpi processes

2010-02-06 Thread Laurence Marks
The following code reproduces the problem with mpif90/ifort
11.1/openmpi-1.4.1. With an empty test.input (touch test.input), a
varying (not reproducible) number of zombie processes is created.

include "mpif.h"
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, irank, ierr)
open (unit=10,file='test.input')
if(irank.lt.3)then
read(10,1,err=20)ii
else
read(10,1)ii
endif
20  write(6,*)irank,ii
1   format(i4)
call MPI_FINALIZE(ierr)
end

N.B., if I deliberately create a format error for the read, no zombies remain.
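
For what it's worth, the obvious workaround I can think of (an untested
sketch on my part) is to trap the failure with iostat= instead of err=,
since iostat catches the end-of-file case as well as format errors, and
then call MPI_ABORT so that no rank is left behind:

include "mpif.h"
integer :: ierr, irank, ii, ios
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, irank, ierr)
open (unit=10,file='test.input')
read (10,1,iostat=ios) ii
1 format(i4)
if (ios .ne. 0) then
   ! iostat is nonzero for both format errors and EOF;
   ! aborting the whole job keeps mpirun from leaving orphaned ranks
   write (6,*) irank, ' read failed, iostat=', ios
   call MPI_ABORT(MPI_COMM_WORLD, 1, ierr)
endif
write (6,*) irank, ii
call MPI_FINALIZE(ierr)
end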
