Thanks Peter,
This is just a workaround for a bug we have just identified; the fix will come soon.
Cheers,
Gilles
On Tuesday, February 2, 2016, Peter Wind <peter.w...@met.no> wrote:
That worked!
That is, with the change you proposed the code gives the right result.
That was efficient work, thank you Gilles :)
Best wishes,
Peter
------------------------------------------------------------------------
Thanks Peter,
That is quite unexpected ...
Let's try another workaround. Can you replace
integer :: comm_group
with
integer :: comm_group, comm_tmp
and
call MPI_COMM_SPLIT(comm, irank*2/num_procs, irank, comm_group, ierr)
with
call MPI_COMM_SPLIT(comm, irank*2/num_procs, irank, comm_tmp, ierr)
if (irank < (num_procs/2)) then
    comm_group = comm_tmp
else
    call MPI_Comm_dup(comm_tmp, comm_group, ierr)
endif
If it works, I will make a fix tomorrow when I can access my workstation.
If not, can you please run
mpirun --mca osc_base_verbose 100 ...
and post the output? I will then try to reproduce the issue and investigate it.
Cheers,
Gilles
On Tuesday, February 2, 2016, Peter Wind <peter.w...@met.no> wrote:
Thanks Gilles,
I get the following output (I guess it is not what you wanted?).
Peter
$ mpirun --mca osc pt2pt -np 4 a.out
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: stallo-2.local
Framework: osc
Component: pt2pt
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI developer):

ompi_osc_base_open() failed
--> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[stallo-2.local:38415] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[stallo-2.local:38418] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[stallo-2.local:38416] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[stallo-2.local:38417] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:

Process name: [[52507,1],0]
Exit code: 1
--------------------------------------------------------------------------
[stallo-2.local:38410] 3 more processes have sent help message help-mca-base.txt / find-available:not-valid
[stallo-2.local:38410] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[stallo-2.local:38410] 2 more processes have sent help message help-mpi-runtime / mpi_init:startup:internal-failure
------------------------------------------------------------------------
Peter,
At first glance, your test program looks correct.
Can you please try to run
mpirun --mca osc pt2pt -np 4 ...
I might have identified a bug with the sm osc component.
Cheers,
Gilles
On Tuesday, February 2, 2016, Peter Wind <peter.w...@met.no> wrote:
Enclosed is a short (< 100 lines) Fortran code example that uses shared memory.
It seems to me it behaves wrongly if Open MPI is used. Compiled with SGI MPT, it gives the right result.
To fail, the code must be run on a single node.
It creates two groups of 2 processes each. Within each group memory is shared.
The error is that the two groups get the same memory allocated, but they should not.
Tested with Open MPI 1.8.4, 1.8.5, 1.10.2 and gfortran, Intel 13.0, Intel 14.0: all fail.
The call
call MPI_Win_allocate_shared(win_size, disp_unit, MPI_INFO_NULL, comm_group, cp1, win, ierr)
should allocate memory only within the group. But when the other group allocates memory, the pointers from the two groups point to the same address in memory.
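For reference, here is a stripped-down sketch of that pattern (it is not the attached program; the names win_size, disp_unit, comm_group, cp1 and win follow the call above, while the sizes and the write-and-check at the end are only illustrative):

program shared_groups_sketch        ! illustrative sketch, not the attached example
  use mpi
  use, intrinsic :: iso_c_binding, only : c_ptr, c_f_pointer
  implicit none
  integer :: ierr, irank, num_procs, comm_group, win, disp_unit
  integer(kind=MPI_ADDRESS_KIND) :: win_size
  type(c_ptr) :: cp1
  integer, pointer :: buf(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, irank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, num_procs, ierr)

  ! two groups (lower and upper half of the ranks); memory should be
  ! shared only inside each group
  call MPI_Comm_split(MPI_COMM_WORLD, irank*2/num_procs, irank, comm_group, ierr)

  ! one integer per rank, allocated on the group communicator
  disp_unit = 4
  win_size  = 4
  call MPI_Win_allocate_shared(win_size, disp_unit, MPI_INFO_NULL, &
                               comm_group, cp1, win, ierr)
  call c_f_pointer(cp1, buf, [1])

  ! each rank tags its own slot; if the two groups were wrongly given the
  ! same segment, a rank from the other group overwrites the tag
  buf(1) = irank
  call MPI_Barrier(MPI_COMM_WORLD, ierr)
  if (buf(1) /= irank) print *, 'rank', irank, ': slot overwritten ->', buf(1)

  call MPI_Win_free(win, ierr)
  call MPI_Finalize(ierr)
end program shared_groups_sketch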
Could you please confirm that this is the wrong behaviour?
Best regards,
Peter Wind
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2016/02/28436.php