I *believe* this means that you didn't MPI_File_close a file before MPI_Finalize.

We're not giving a very helpful error message here (it's downright misleading, 
actually), but I'm pretty sure that this is the case.
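
Here's a trivial sketch of what I mean (obviously not your code, just the
open/close pairing that MPI expects). If the MPI_File_close below is skipped,
the handle is still open when MPI_Finalize runs, Open MPI tears it down itself
(that's the ompi_file_finalize frame in your stack), and that cleanup path is
what produces the confusing error above:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    int      rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Collective open on the communicator that will do the IO */
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes its own int at its own offset */
    MPI_File_write_at(fh, (MPI_Offset)(rank * sizeof(int)), &rank, 1,
                      MPI_INT, MPI_STATUS_IGNORE);

    /* The important part: close the handle *before* MPI_Finalize.
       Leaving it open is what triggers the cleanup-in-finalize error. */
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}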


On Mar 6, 2013, at 10:28 AM, "Smith, Brian E." <smit...@ornl.gov> wrote:

> Hi all,
> 
> I have some code that uses parallel netCDF. I've run successfully on Titan 
> (using the Cray MPICH derivative) and on my laptop (also running MPICH). 
> However, when I run on one of our clusters running OMPI, the code barfs in 
> MPI_Finalize() and doesn't write the complete/expected output files:
> 
> [:17472] *** An error occurred in MPI_File_set_errhandler
> [:17472] *** on a NULL communicator
> [:17472] *** Unknown error
> [:17472] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> --------------------------------------------------------------------------
> An MPI process is aborting at a time when it cannot guarantee that all
> of its peer processes in the job will be killed properly.  You should
> double check that everything has shut down cleanly.
> 
>  Reason:     After MPI_FINALIZE was invoked
>  Local host:
>  PID:        17472
> --------------------------------------------------------------------------
> 
> The stacks are:
> PMPI_Finalize (pfinalize.c:46)
>       ompi_mpi_finalize (ompi_mpi_finalize.c:272)
>               ompi_file_finalize (file.c:196)
>                       opal_obj_run_destructors (opal_object.h:448)
>                               file_destructor (file.c:273)
>                                       mca_io_romio_file_close (io_romio_file_open.c:59)
>                                               PMPI_File_set_errhandler (pfile_set_errhandler.c:47)
>                                                       ompi_mpi_errors_are_fatal_comm_handler (errhandler_predefined.c:52)
> 
> This is with OMPI 1.6.2. It is pnetCDF 1.3.1 on all 3 platforms.
> 
> The code appears to have the right participants opening/closing the right 
> files on the right communicators (a mixture of rank 0 of each subcomm opening 
> files across that subcomm, and some ranks opening files on MPI_COMM_SELF). It 
> looks to me like some IO is getting delayed until MPI_Finalize(), which 
> suggests I missed a wait() or close() pnetCDF call. 
> 
> I don't necessarily think this is a bug in OMPI; I just don't know where to 
> start looking in my code, since it works fine on the two different versions 
> of MPICH.
> 
> Thanks.
> 
> 
> 
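
One more thought: since your code goes through pnetCDF rather than calling
MPI-IO directly, the thing to audit is whether every ncmpi_create/ncmpi_open
has a matching ncmpi_close on the same communicator, and whether all
nonblocking requests are completed with ncmpi_wait_all before that close.
Roughly this pattern (a made-up example, not your code; the file and variable
names are invented):

#include <mpi.h>
#include <pnetcdf.h>

int main(int argc, char **argv)
{
    int        ncid, dimid, varid, req, st, rank, nprocs;
    MPI_Offset start, count;
    double     val;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Collective create/define on whichever communicator owns the file */
    ncmpi_create(MPI_COMM_WORLD, "out.nc", NC_CLOBBER, MPI_INFO_NULL, &ncid);
    ncmpi_def_dim(ncid, "x", (MPI_Offset)nprocs, &dimid);
    ncmpi_def_var(ncid, "v", NC_DOUBLE, 1, &dimid, &varid);
    ncmpi_enddef(ncid);

    /* Nonblocking write: each rank posts one request for its own element */
    start = rank;
    count = 1;
    val   = (double)rank;
    ncmpi_iput_vara_double(ncid, varid, &start, &count, &val, &req);

    /* Complete every pending request before closing... */
    ncmpi_wait_all(ncid, 1, &req, &st);

    /* ...and close before MPI_Finalize.  A missing wait or close here tends
       to show up only at finalize time, which matches what you're seeing. */
    ncmpi_close(ncid);

    MPI_Finalize();
    return 0;
}

If you open files on several communicators (the subcomm / MPI_COMM_SELF
mixture you describe), each ncid needs its own wait_all and close before
MPI_Finalize, and the close is collective over the communicator the file was
opened on.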


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

