Thanks Peter, that is quite unexpected ...
Let's try another workaround: can you replace

    integer :: comm_group

with

    integer :: comm_group, comm_tmp

and

    call MPI_COMM_SPLIT(comm, irank*2/num_procs, irank, comm_group, ierr)

with

    call MPI_COMM_SPLIT(comm, irank*2/num_procs, irank, comm_tmp, ierr)
    if (irank < (num_procs/2)) then
       comm_group = comm_tmp
    else
       call MPI_Comm_dup(comm_tmp, comm_group, ierr)
    endif

(A stand-alone sketch of the modified test is appended after the quoted thread at the end of this post.)

If it works, I will make a fix tomorrow when I can access my workstation.
If not, can you please run

    mpirun --mca osc_base_verbose 100 ...

and post the output? I will then try to reproduce the issue and investigate it.

Cheers,

Gilles

On Tuesday, February 2, 2016, Peter Wind <peter.w...@met.no> wrote:

> Thanks Gilles,
>
> I get the following output (I guess it is not what you wanted?).
>
> Peter
>
> $ mpirun --mca osc pt2pt -np 4 a.out
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened. This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded). Note that
> Open MPI stopped checking at the first component that it did not find.
>
> Host:      stallo-2.local
> Framework: osc
> Component: pt2pt
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   ompi_osc_base_open() failed
>   --> Returned "Not found" (-13) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [stallo-2.local:38415] Local abort before MPI_INIT completed successfully;
> not able to aggregate error messages, and not able to guarantee that all
> other processes were killed!
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [stallo-2.local:38418] Local abort before MPI_INIT completed successfully;
> not able to aggregate error messages, and not able to guarantee that all
> other processes were killed!
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [stallo-2.local:38416] Local abort before MPI_INIT completed successfully;
> not able to aggregate error messages, and not able to guarantee that all
> other processes were killed!
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [stallo-2.local:38417] Local abort before MPI_INIT completed successfully;
> not able to aggregate error messages, and not able to guarantee that all
> other processes were killed!
> -------------------------------------------------------
> Primary job terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status,
> thus causing the job to be terminated. The first process to do so was:
>
>   Process name: [[52507,1],0]
>   Exit code:    1
> --------------------------------------------------------------------------
> [stallo-2.local:38410] 3 more processes have sent help message
> help-mca-base.txt / find-available:not-valid
> [stallo-2.local:38410] Set MCA parameter "orte_base_help_aggregate" to 0
> to see all help / error messages
> [stallo-2.local:38410] 2 more processes have sent help message
> help-mpi-runtime / mpi_init:startup:internal-failure
>
> ------------------------------
>
> Peter,
>
> at first glance, your test program looks correct.
>
> Can you please try to run
>     mpirun --mca osc pt2pt -np 4 ...
>
> I might have identified a bug with the sm osc component.
>
> Cheers,
>
> Gilles
>
> On Tuesday, February 2, 2016, Peter Wind <peter.w...@met.no> wrote:
>
>> Enclosed is a short (< 100 lines) Fortran code example that uses shared
>> memory.
>> It seems to me that it behaves wrongly if Open MPI is used.
>> Compiled with SGI/MPT, it gives the right result.
>>
>> To fail, the code must be run on a single node.
>> It creates two groups of 2 processes each. Within each group memory is
>> shared.
>> The error is that the two groups get the same memory allocated, but they
>> should not.
>>
>> Tested with Open MPI 1.8.4, 1.8.5, 1.10.2 and gfortran, Intel 13.0,
>> Intel 14.0; all fail.
>>
>> The call
>>     call MPI_Win_allocate_shared(win_size, disp_unit, MPI_INFO_NULL,
>>                                  comm_group, cp1, win, ierr)
>> should allocate memory only within the group. But when the other group
>> allocates memory, the pointers from the two groups point to the same
>> address in memory.
>>
>> Could you please confirm that this is the wrong behaviour?
>>
>> Best regards,
>> Peter Wind
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/02/28429.php
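
For reference, below is a minimal, self-contained sketch of the kind of test program discussed in this thread, with the comm_tmp/MPI_Comm_dup workaround from above applied. It is only a sketch under assumptions: Peter's actual attachment is not reproduced here, so the variable names, the array size n, the MPI_Win_shared_query/MPI_Win_fence usage and the reliance on the mpi module's TYPE(C_PTR) overloads of the window routines are illustrative rather than taken from the original program.

    program shared_group_test
      ! Minimal sketch (not Peter's original attachment): two groups of
      ! processes, each allocating its own shared-memory window, with the
      ! MPI_Comm_dup workaround applied to the second group.
      use mpi
      use, intrinsic :: iso_c_binding
      implicit none

      integer, parameter :: n = 10
      integer :: ierr, irank, num_procs, grank
      integer :: comm, comm_group, comm_tmp, win, disp_unit
      integer(kind=MPI_ADDRESS_KIND) :: win_size
      type(c_ptr) :: cp1
      real(8), pointer :: shm(:)

      call MPI_Init(ierr)
      comm = MPI_COMM_WORLD
      call MPI_Comm_rank(comm, irank, ierr)
      call MPI_Comm_size(comm, num_procs, ierr)

      ! Split into two halves; the second half duplicates its communicator
      ! (the workaround above) so the two groups use distinct communicators.
      call MPI_Comm_split(comm, irank*2/num_procs, irank, comm_tmp, ierr)
      if (irank < (num_procs/2)) then
         comm_group = comm_tmp
      else
         call MPI_Comm_dup(comm_tmp, comm_group, ierr)
      endif
      call MPI_Comm_rank(comm_group, grank, ierr)

      ! Each group allocates its own shared window; only group rank 0
      ! contributes memory, the others attach to it via MPI_Win_shared_query.
      disp_unit = 8
      win_size = 0
      if (grank == 0) win_size = n*disp_unit
      call MPI_Win_allocate_shared(win_size, disp_unit, MPI_INFO_NULL, &
                                   comm_group, cp1, win, ierr)
      call MPI_Win_shared_query(win, 0, win_size, disp_unit, cp1, ierr)
      call c_f_pointer(cp1, shm, [n])

      ! Group rank 0 writes a group-specific value; every rank then reads it.
      call MPI_Win_fence(0, win, ierr)
      if (grank == 0) shm(1) = real(irank, 8)
      call MPI_Win_fence(0, win, ierr)
      print '(a,i0,a,f4.1)', 'world rank ', irank, ': shm(1) = ', shm(1)

      call MPI_Win_free(win, ierr)
      call MPI_Finalize(ierr)
    end program shared_group_test

The final print is the check Peter describes: run on one node with -np 4, the lower half should report shm(1) = 0.0 and the upper half shm(1) = 2.0. If the two groups were (incorrectly) handed the same memory, all four ranks would report the same value.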