I posted this question on StackOverflow and someone suggested I write to the 
OpenMPI community.

https://stackoverflow.com/questions/62223698/mpi-i-o-why-does-my-program-hang-or-misbehave-when-one-process-writes-using-mpi

Below is a small MPI program that makes simple use of MPI I/O.  Process 0 
writes one int to the file using MPI_File_write_shared; no other process writes 
anything.  It works correctly with an MPICH installation, but on two 
different machines using OpenMPI it either hangs in the middle of the call to 
MPI_File_write_shared or reports an error at the end.  I am not sure whether 
this is my misunderstanding of the MPI Standard, a bug, or a configuration 
problem with my OpenMPI installation.

Thanks in advance if anyone can look at it,
Steve


#include <stdio.h>
#include <mpi.h>
#include <assert.h>

int nprocs, rank;

int main(void) {
  MPI_File fh;
  int err;
  MPI_Status status;

  MPI_Init(NULL, NULL);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  // Collective open: every process opens the same file for writing.
  err = MPI_File_open(MPI_COMM_WORLD, "io_byte_shared.tmp",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);
  assert(err == MPI_SUCCESS);
  // View the file as a sequence of ints, starting at byte 0.
  err = MPI_File_set_view(fh, 0, MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
  assert(err == MPI_SUCCESS);
  printf("Proc %d: file has been opened.\n", rank); fflush(stdout);
  // Only proc 0 writes (a single int), using the shared file pointer...
  MPI_Barrier(MPI_COMM_WORLD);
  if (rank == 0) {
    int x = 9999;
    printf("Proc 0: About to write to file.\n"); fflush(stdout);
    err = MPI_File_write_shared(fh, &x, 1, MPI_INT, &status);
    printf("Proc 0: Finished writing.\n"); fflush(stdout);
    assert(err == MPI_SUCCESS);
  }
  MPI_Barrier(MPI_COMM_WORLD);
  printf("Proc %d: about to close file.\n", rank); fflush(stdout);
  // Collective close: every process closes the file.
  err = MPI_File_close(&fh);
  assert(err == MPI_SUCCESS);
  MPI_Finalize();
  return 0;
}

Example run:

$ mpicc io_byte_shared.c
$ mpiexec -n 4 ./a.out
Proc 0: file has been opened.
Proc 0: About to write to file.
Proc 0: Finished writing.
Proc 1: file has been opened.
Proc 2: file has been opened.
Proc 3: file has been opened.
Proc 0: about to close file.
Proc 1: about to close file.
Proc 2: about to close file.
Proc 3: about to close file.
[ilyich:12946] 3 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[ilyich:12946] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

