Re: [OMPI users] RMA in openmpi

2020-04-28 Thread Claire Cashmore via users
Hi Joseph

OK, that makes sense. Thank you for your help!

Thanks again

Claire

On 27/04/2020, 11:28, "Joseph Schuchart"  wrote:

Hi Claire,

You cannot use MPI_Get (or any other RMA communication routine) on a 
window for which no access epoch has been started. MPI_Win_fence starts 
an active target access epoch, MPI_Win_lock[_all] start a passive target 
access epoch. Window locks are synchronizing in the sense that they 
provide a means for mutual exclusion if an exclusive lock is involved (a 
process holding a shared window lock allows for other processes to 
acquire shared locks but prevents them from taking an exclusive lock, 
and vice versa).

One common strategy is to call MPI_Win_lock_all on all processes to let 
all processes acquire a shared lock, which they hold until the end of 
the application run. Communication is then done using a combination of 
MPI_Get/MPI_Put/accumulate functions and flushes. As said earlier, you 
likely will need to take care of synchronization among the processes if 
they also modify data in the window.

Cheers
Joseph

On 4/27/20 12:14 PM, Claire Cashmore wrote:
> Hi Joseph
> 
> Thank you for your reply. From what I had been reading I thought they
> were both called "synchronization calls", just that one was passive
> (lock) and one was active (fence), sorry if I've got confused!
> So I'm asking: do I need either MPI_Win_fence or MPI_Win_lock/unlock in
> order to use one-sided calls, or is it not possible to use one-sided
> communication without them? That is, just a stand-alone MPI_Get, without
> the other calls before and after? It seems not from what you are saying,
> but I just wanted to confirm.
> 
> Thanks again
> 
> Claire
> 
> On 27/04/2020, 07:50, "Joseph Schuchart via users"  wrote:
> 
>  Claire,
> 
>  > Is it possible to use the one-sided communication without combining
>  it with synchronization calls?
> 
>  What exactly do you mean by "synchronization calls"? MPI_Win_fence is
>  indeed synchronizing (basically flush+barrier) but MPI_Win_lock (and
>  the passive target synchronization interface at large) is not. It does
>  incur some overhead because the lock has to be taken somehow at some
>  point. However, it does not require a matching call at the target to
>  complete.
> 
>  You can lock a window using a (shared or exclusive) lock, initiate RMA
>  operations, flush them to wait for their completion, and initiate the
>  next set of RMA operations to flush later. None of these calls are
>  synchronizing. You will have to perform your own synchronization at
>  some point though to make sure processes read consistent data.
> 
>  HTH!
>  Joseph
> 
> 
>  On 4/24/20 5:34 PM, Claire Cashmore via users wrote:
>  > Hello
>  >
>  > I was wondering if someone could help me with a question.
>  >
>  > When using RMA is there a requirement to use some type of
>  > synchronization? When using one-sided communication such as MPI_Get,
>  > the code will only run when I combine it with MPI_Win_fence or
>  > MPI_Win_lock/unlock. I do not want to use MPI_Win_fence, as I'm using
>  > the one-sided communication to allow some communication when
>  > processes are not synchronised, so this defeats the point. I could
>  > use MPI_Win_lock/unlock; however, someone I've spoken to has said
>  > that I should be able to use RMA without any synchronization calls.
>  > If so, I would prefer to do this, to reduce any overheads that using
>  > MPI_Win_lock every time I use the one-sided communication may
>  > produce.
>  >
>  > Is it possible to use the one-sided communication without combining
>  > it with synchronization calls?
>  >
>  > (It doesn't seem to matter what version of openmpi I use.)
>  >
>  > Thank you
>  >
>  > Claire
>  >
> 



Re: [OMPI users] Handle Ctrl+C in subprocesses

2020-04-28 Thread George Reeke via users
On Mon, 2020-04-27 at 11:48 +0100, Jérémie Wenger via users wrote:
> Hi,
> 
> I recently installed open mpi (4.0.3) using the procedure described
> here, as I'm trying to use Horovod for multiple gpu acceleration.
> 
> I am looking for a way to handle a keyboard interrupt (save a deep
> learning model before shutting everything down). I posted a question
> here.
> 
I have used SIGUSR1 and written a signal handler in the rank 0
program to do whatever is needed to save data and shut down cleanly
(using standard MPI messages on an alternative communication
channel that is initialized for just this purpose; other ranks
test for messages on this channel at suitable points where they
can stop gracefully).
You then need to use kill to send the signal instead of Ctrl+C.
I also have a note in my code, never implemented, that when
running on a remote server some sort of socket protocol is
needed to initiate the shutdown instead of a signal.
George Reeke




Re: [OMPI users] Handle Ctrl+C in subprocesses

2020-04-28 Thread Jérémie Wenger via users
Dear George,

Many thanks for your swift response, much appreciated! This way of doing
it makes sense.

Best regards,
Jeremie


[OMPI users] All nodes which are allocated for this job are already filled.

2020-04-28 Thread carlos aguni via users
Hi all,

I'm trying to MPI_Spawn processes with no success.
I'm facing the following error:
=
All nodes which are allocated for this job are already filled.
==

I'm setting the hostname as follows:
MPI_Info_set(minfo, "host", hostname);

I'm already running with the `--oversubscribe` flag and I've already
tried these hostfiles:

controller max-slots=1
client max-slots=3
gateway max-slots=3
server1 max-slots=41
server2 max-slots=41
server3 max-slots=41
server4 max-slots=41

controller slots=1
client slots=3
gateway slots=3
server1 slots=41
server2 slots=41
server3 slots=41
server4 slots=41

Can anyone help me? Is there a way to force/bypass/disable it?
I'm running openmpi3/3.1.4 gnu8/8.3.0 from openhpc.

Regards,
Carlos.