Which communicator did you use to open the file? I am wondering whether it differs from the communicator used in MPI_Barrier, so that some processes do not enter the barrier at all...
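One way to check this could look like the sketch below. It compares the process group behind the file handle with the group of the barrier communicator. It assumes `fh` and `world` are the variables from the quoted code; the subroutine name `check_file_comm` is made up for illustration, and the call only makes sense on processes where `fh` is a valid, open file handle.

```fortran
! Sketch only: compare the group of the communicator used in MPI_File_open
! with the group of the communicator passed to MPI_Barrier.
! check_file_comm is a hypothetical helper, not part of the original code.
SUBROUTINE check_file_comm(fh, world)
  USE mpi
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: fh, world
  INTEGER :: fgroup, wgroup, res, rank, ierr

  CALL MPI_Comm_rank(world, rank, ierr)

  ! Group of the communicator the file was opened on
  CALL MPI_File_get_group(fh, fgroup, ierr)
  ! Group of the communicator used for the barrier
  CALL MPI_Comm_group(world, wgroup, ierr)

  ! res is MPI_IDENT when both groups have the same members in the same order,
  ! MPI_SIMILAR when only the order differs, MPI_UNEQUAL otherwise
  CALL MPI_Group_compare(fgroup, wgroup, res, ierr)

  IF (res /= MPI_IDENT) THEN
    WRITE(*,'("DBG: rank ",I0,": file group differs from barrier group, result = ",I0)') rank, res
  ENDIF

  CALL MPI_Group_free(fgroup, ierr)
  CALL MPI_Group_free(wgroup, ierr)
END SUBROUTINE check_file_comm
```

Calling this right after MPI_File_set_view on every rank should show whether the two collectives involve the same set of processes. If the result is MPI_UNEQUAL, the collective write and the barrier are waiting on different process sets.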
Thanks
Edgar

On 5/10/2012 12:22 PM, Ricardo Reis wrote:
>
> Hi all
>
> I'm trying to run my code in a cluster here with infiniband. It is in
> Fortran 95/2003 and uses MPI-IO for output. I'm using openmpi 1.5.5. It
> has been running fine, but for a particular configuration, using all
> of the cluster cores (128, divided in 4 boxes with 4 Octo-core Opterons
> each), it hangs while calling MPI-IO.
>
> So what I am asking is help in debugging this. This is the relevant
> part of the code
>
>     CALL MPI_File_set_view(fh, disp, etype, filetype, &
>          TRIM(datarep), MPI_INFO_NULL, ierr)
>
>     IF(DEBGON)THEN
>        IF(master)THEN
>           WRITE(logfl,'(/,"DBG: WriteMPI_IO going to write file.")')
>           FLUSH(logfl)
>        ENDIF
>        CALL MPI_Barrier(world, ierr)
>     ENDIF
>
>     CALL MPI_file_write_at_all(fh, offset, arr, dim, &
>          etype, status, ierr)
>
> And it hangs just after the flush, so apparently in the
> MPI_File_write_at_all call.
>
> Any ideas of what to do or where to look are welcomed.
>
> best,
>
> Ricardo Reis
>
> 'Non Serviam'
>
> PhD/MSc Mechanical Engineering | Lic. Aerospace Engineering
>
> Computational Fluid Dynamics, High Performance Computing, Turbulence
> http://www.lasef.ist.utl.pt
>
> Cultural Instigator @ Rádio Zero
> http://www.radiozero.pt
>
> http://www.flickr.com/photos/rreis/
>
> contacts: gtalk: kyriu...@gmail.com skype: kyriusan
>
> Institutional Address:
>
> Ricardo J.N. dos Reis
> IDMEC, Instituto Superior Técnico, Technical University of Lisbon
> Av. Rovisco Pais
> 1049-001 Lisboa
> Portugal
>
> - email sent with alpine 2.00 -