Re: [OMPI users] hostfiles
Jeff Squyres wrote:
> On Feb 4, 2010, at 7:55 PM, Ralph Castain wrote:
>> Take a look at orte/mca/rmaps/seq - you can select it with -mca rmaps seq.
>> I believe it is documented, but I don't know where.
>
> ...if it isn't, can it be added to the man page? It might be a common
> mpirun / hostfile question...?

I just added it to the mpirun man page. r22567
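For reference, a minimal sketch of how the sequential mapper is used (the
hostnames and executable name below are placeholders, not from the thread):
the seq mapper assigns one process per hostfile line, in the order listed.

  $ cat myhosts
  node0
  node0
  node1
  $ mpirun -mca rmaps seq -hostfile myhosts ./a.out

With this hostfile, ranks 0 and 1 should land on node0 and rank 2 on node1,
which is the point of seq: the mapping follows the file order rather than
filling by slots.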
Re: [OMPI users] Trapping fortran I/O errors leaving zombie mpi processes
I managed to find time to reproduce the issue, although it is not very
reproducible in its results, and I suspect it may not be easy to reproduce
with a simple code. I have also never actually constructed an MPI code
myself, so I am cc'ing Michael Sternberg, who compiled the Open MPI, in case
there are flags to add to the compilation.

I have 8 processes on a single dual quad-core machine reading from the same
file using formatted Fortran I/O. I deliberately created an error in the
read. If this error is a format error, all the processes terminate. If the
error is because there is not enough data (EOF), I get somewhere from 1 to 7
zombies. They don't seem to be doing anything (top -u lmarks shows no CPU
activity), but I have no idea if they have locks on the file or anything
else (I think they might, but have no idea how to tell).

On Fri, Jan 29, 2010 at 6:18 PM, Jeff Squyres wrote:
> On Jan 29, 2010, at 9:13 AM, Laurence Marks wrote:
>> OK, but trivial codes don't always reproduce problems.
>
> Yes, but if the problem is a file reading beyond the end, that should be
> fairly isolated behavior.
>
>> Is strace useful?
>
> Sure. But let's check to see if the apps are actually dying or hanging
> first.
>
> --
> Jeff Squyres
> jsquy...@cisco.com

--
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall, 2220 N Campus Drive
Northwestern University, Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Electron crystallography is the branch of science that uses electron
scattering and imaging to study the structure of matter.
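On the "how to tell" question about file locks: one way to inspect a stuck
process on a Linux node is via lsof and /proc (generic commands; the PID
12345 below is a placeholder for whatever top reports for the zombie):

  lsof -p 12345          # files the process still holds open
  ls -l /proc/12345/fd   # its file-descriptor table, same information
  cat /proc/locks        # POSIX record locks currently held, system-wide

If the zombie still appears in /proc/locks or holds the input file open in
lsof, that would confirm it never got through the Fortran runtime's cleanup.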
Re: [OMPI users] libtool compile error
Hi,

You can solve this by installing libtool 2.2.6b and then running autogen.sh.

Regards,
Caciano Machado

On Thu, Feb 4, 2010 at 8:25 PM, Peter C. Lichtner wrote:
> I'm trying to compile openmpi-1.4.1 on Mac OS X 10.5.8 using Absoft Fortran
> 90 11.0 and gcc --version i686-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple
> Inc. build 5493). I get the following error:
>
> make
> ...
> Making all in mca/io/romio
> Making all in romio
> Making all in include
> make[4]: Nothing to be done for `all'.
> Making all in adio
> Making all in common
> /bin/sh ../../libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I.
>   -I../../adio/include -DOMPI_BUILDING=1
>   -I/Users/lichtner/petsc/openmpi-1.4.1/ompi/mca/io/romio/romio/../../../../..
>   -I/Users/lichtner/petsc/openmpi-1.4.1/ompi/mca/io/romio/romio/../../../../../opal/include
>   -I../../../../../../../opal/include -I../../../../../../../ompi/include
>   -I/Users/lichtner/petsc/openmpi-1.4.1/ompi/mca/io/romio/romio/include
>   -I/Users/lichtner/petsc/openmpi-1.4.1/ompi/mca/io/romio/romio/adio/include
>   -D_REENTRANT -O3 -DNDEBUG -finline-functions -fno-strict-aliasing
>   -DHAVE_ROMIOCONF_H -DHAVE_ROMIOCONF_H -I../../include -MT ad_aggregate.lo
>   -MD -MP -MF .deps/ad_aggregate.Tpo -c -o ad_aggregate.lo ad_aggregate.c
> ../../libtool: line 460: CDPATH: command not found
> /Users/lichtner/petsc/openmpi-1.4.1/ompi/mca/io/romio/romio/libtool: line
> 460: CDPATH: command not found
> /Users/lichtner/petsc/openmpi-1.4.1/ompi/mca/io/romio/romio/libtool: line
> 1138: func_opt_split: command not found
> libtool: Version mismatch error. This is libtool 2.2.6b, but the
> libtool: definition of this LT_INIT comes from an older release.
> libtool: You should recreate aclocal.m4 with macros from libtool 2.2.6b
> libtool: and run autoconf again.
> make[5]: *** [ad_aggregate.lo] Error 63
> make[4]: *** [all-recursive] Error 1
> make[3]: *** [all-recursive] Error 1
> make[2]: *** [all-recursive] Error 1
> make[1]: *** [all-recursive] Error 1
> make: *** [all-recursive] Error 1
>
> Any help appreciated.
> ...Peter
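Spelled out, the suggested fix looks something like this (a sketch; it
assumes libtool 2.2.6b and matching autotools are first in PATH, and that
autogen.sh sits at the top of the extracted source tree):

  cd openmpi-1.4.1
  libtool --version   # should report 2.2.6b
  ./autogen.sh        # regenerates aclocal.m4 and configure
  ./configure [...]
  make

The "Version mismatch error" in the log means the shipped aclocal.m4 was
generated with an older libtool than the 2.2.6b found on the system, so
regenerating the build machinery removes the mismatch.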
Re: [OMPI users] [mpich-discuss] problem with MPI_Get_count() for very long (but legal length) messages.
On Fri, 5 Feb 2010 14:28:40 -0600, Barry Smith wrote:
> To cheer you up, when I run with openMPI it runs forever sucking down
> 100% CPU trying to send the messages :-)

On my test box (x86 with 8GB memory), Open MPI (1.4.1) does complete after
several seconds, but still prints the wrong count. MPICH2 does not actually
send the message, as you can see by running the attached code.

# Open MPI 1.4.1, correct cols[0]
[0] sending...
[1] receiving...
count -103432106, cols[0] 0

# MPICH2 1.2.1, incorrect cols[0]
[1] receiving...
[0] sending...
[1] count -103432106, cols[0] 1

How much memory does crush have? (You need about 7GB to do this without
swapping.) In particular, most of the time it took Open MPI to send the
message (with your source) was actually just spent faulting the send/recv
buffers. The attached code faults the buffers first, and the subsequent
send/recv takes less than 2 seconds. Actually, it's clear that MPICH2 never
touches either buffer, because it returns immediately regardless of whether
they have been faulted first.

Jed

/* Note: the attachment is truncated in the archive at the fill loop;
   everything from that loop onward is reconstructed to match the
   output and behavior described above. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  int ierr, i, size, rank;
  int cnt = 433438806;
  MPI_Status status;
  long long *cols;

  MPI_Init(&argc, &argv);
  ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
  ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (size != 2) {
    fprintf(stderr, "[%d] usage: mpiexec -n 2 %s\n", rank, argv[0]);
    MPI_Abort(MPI_COMM_WORLD, 1);
  }
  cols = malloc(cnt * sizeof(long long));
  /* fault the whole buffer up front so the send/recv timing is not
     dominated by page faults; sender holds 0s, receiver holds 1s */
  for (i = 0; i < cnt; i++) cols[i] = rank;
  if (rank == 0) {
    printf("[0] sending...\n");
    ierr = MPI_Send(cols, cnt, MPI_LONG_LONG_INT, 1, 0, MPI_COMM_WORLD);
  } else {
    int count;
    printf("[1] receiving...\n");
    ierr = MPI_Recv(cols, cnt, MPI_LONG_LONG_INT, 0, 0, MPI_COMM_WORLD,
                    &status);
    ierr = MPI_Get_count(&status, MPI_LONG_LONG_INT, &count);
    /* cols[0] == 0 proves the data actually arrived from rank 0 */
    printf("[%d] count %d, cols[0] %lld\n", rank, count, cols[0]);
  }
  free(cols);
  MPI_Finalize();
  return 0;
}
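For what it's worth, the printed count is exactly what a wrapped 32-bit byte
counter would produce (an inference from the numbers, not something either
implementation documents): 433438806 elements * 8 bytes = 3467510448 bytes;
as a signed 32-bit integer that wraps to 3467510448 - 2^32 = -827456848, and
-827456848 / 8 = -103432106, the count printed by both implementations.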
Re: [OMPI users] Trapping fortran I/O errors leaving zombie mpi processes
The following code reproduces the problem with mpif90 (ifort 11.1) and
openmpi-1.4.1. With an empty test.input (touch test.input), some
non-reproducible number of zombie processes is created.

      include "mpif.h"
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, irank, ierr)
      open (unit=10, file='test.input')
      if (irank .lt. 3) then
         read(10,1,err=20) ii
      else
         read(10,1) ii
      endif
 20   write(6,*) irank, ii
 1    format(i4)
      call MPI_FINALIZE(ierr)
      end

N.B., if I deliberately create a format error for the read, no zombies
remain.

--
Laurence Marks
Northwestern University
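Not a fix for the underlying hang, but a defensive sketch (untested; assumes
the same ifort/Open MPI stack as above): trapping the failure with iostat=
on every rank and calling MPI_Abort tears the whole job down instead of
leaving zombies. Note that err= does not fire on end-of-file, which is why
the reproducer's ranks die inside the runtime; iostat= catches both cases.

      include "mpif.h"
      integer ierr, irank, ii, ios
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, irank, ierr)
      open (unit=10, file='test.input')
c     iostat= catches format errors *and* end-of-file, unlike err=
      read(10, 1, iostat=ios) ii
      if (ios .ne. 0) then
         write(6,*) irank, ' read failed, iostat = ', ios
c        abort the whole job so no rank is left behind as a zombie
         call MPI_ABORT(MPI_COMM_WORLD, 1, ierr)
      endif
      write(6,*) irank, ii
 1    format(i4)
      call MPI_FINALIZE(ierr)
      end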