I have attached a small program that, when run on my machine, produces the error message below and then locks up.
[node0000:06319] [mpool_gm_module.c:100] error(8) registering gm memory
I get the error when I run with 32 processors, but not with 4 (even if I increase the loop count to 20000). This is on a cluster of dual dual-core Opterons with Myrinet switches (i.e. using the gm routines). Unfortunately, I don't have the configure options that were used to build Open MPI, but I don't think there was anything unusual. I've also attached the open_info output. Here is the compile line for the code:
g95 -o allreducetest allreducetest.F -I/usr/local/ompi/1.1-gcc/include -L/usr/local/ompi/1.1-gcc/lib -lmpi
Also note that I had to change the Fortran include files in Open MPI to force all of the integers to be 4 bytes (i.e. declaring them integer(4)). The default integer size used by g95 is 8 bytes, but the Open MPI Fortran interface was compiled with f77, which uses 4-byte integers.
Any suggestions on what to look for?
Thanks for the help,
Dave
      program parallel_sum_mmnts
      real(kind=8):: zmmnts(0:360,28,0:8)
c Use reduction routines to sum whole beam moments across all
c of the processors. It also shares z moment data at PE boundaries.
c  --- temporary for z moments
      real(kind=8),allocatable:: ztemp(:,:,:)
      integer(4):: nn,nslaves,my_index,ii
      include "mpif.h"
      integer(4):: mpierror

      call MPI_INIT(mpierror)
      call MPI_COMM_SIZE(MPI_COMM_WORLD,nslaves,mpierror)
      call MPI_COMM_RANK(MPI_COMM_WORLD,my_index,mpierror)

      do ii=1,20000
        print*,"PSM1 ",ii,my_index
        zmmnts0 = my_index
        zmmnts = my_index
        allocate(ztemp(0:360,28,0:8))
c  --- Do reduction on beam z moments.
        ztemp = zmmnts
        nn = (1+360)*28*(1+8)
        print*,"PSM1 ",my_index,nn
        call MPI_ALLREDUCE(ztemp,zmmnts,nn,
     &       MPI_DOUBLE_PRECISION,MPI_SUM,MPI_COMM_WORLD,mpierror)
        print*,"PSM2 ",my_index
        deallocate(ztemp)
      enddo

      stop
      end
oinfo.gz
Description: GNU Zip compressed data