Oh sure - just not shared memory

> On Jun 15, 2016, at 8:29 AM, Louis Williams <louis.willi...@gatech.edu> wrote:
> 
> Ralph, thanks for the quick reply. Is cross-job fast transport like 
> InfiniBand supported? 
> 
> Louis 
> 
> On Tue, Jun 14, 2016 at 3:53 PM Ralph Castain <r...@open-mpi.org> wrote:
> Nope - we don’t currently support cross-job shared memory operations. Nathan 
> has talked about doing so for vader, but not at this time.
> 
> 
> 
>> On Jun 14, 2016, at 12:38 PM, Louis Williams <louis.willi...@gatech.edu> wrote:
>> 
>> Hi,
>> 
>> I am attempting to use the sm and vader BTLs between a client and a server 
>> process, but it seems impossible to use fast transports (i.e. anything other 
>> than TCP) between two independent groups started with two separate mpirun 
>> invocations. Am I correct, or is there a way to communicate over shared 
>> memory between a client and server like this? This code seems to suggest it 
>> isn't possible: 
>> https://github.com/open-mpi/ompi/blob/master/ompi/dpm/dpm.c#L495
>> 
>> The server calls MPI::COMM_WORLD.Accept() and the client calls 
>> MPI::COMM_WORLD.Connect(). Each program is started with "mpirun --np 1 --mca 
>> btl self,sm,vader <executable>", where the executable is either the client 
>> or the server program. When no BTL is specified, both establish a TCP 
>> connection just fine. But when the sm and vader BTLs are specified, both 
>> client and server exit immediately after the Connect() call with the error 
>> message copied at the end. It seems as though intergroup communication can't 
>> use a fast transport and only works over TCP. 
>> 
>> Also, as expected, running the Accept() and Connect() calls within a single 
>> program with "mpirun -np 2 --mca btl self,sm,vader ..." uses shared memory 
>> as transport.
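>> 
>> For reference, this is roughly the pattern in play, as a minimal sketch 
>> against the C API (the service name "demo_service" and the 
>> MPI_Publish_name/MPI_Lookup_name exchange through the ompi-server are 
>> placeholders rather than my exact code): 
>> 
>> /* server.c (sketch) */
>> #include <mpi.h>
>> int main(int argc, char **argv) {
>>     char port[MPI_MAX_PORT_NAME];
>>     MPI_Comm inter;
>>     MPI_Init(&argc, &argv);
>>     MPI_Open_port(MPI_INFO_NULL, port);       /* obtain a port string */
>>     /* "demo_service" is a placeholder name; registered via the ompi-server */
>>     MPI_Publish_name("demo_service", MPI_INFO_NULL, port);
>>     /* block until the client connects; yields an intercommunicator */
>>     MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
>>     /* ... use the intercommunicator 'inter' ... */
>>     MPI_Comm_disconnect(&inter);
>>     MPI_Unpublish_name("demo_service", MPI_INFO_NULL, port);
>>     MPI_Close_port(port);
>>     MPI_Finalize();
>>     return 0;
>> }
>> 
>> /* client.c (sketch) */
>> #include <mpi.h>
>> int main(int argc, char **argv) {
>>     char port[MPI_MAX_PORT_NAME];
>>     MPI_Comm inter;
>>     MPI_Init(&argc, &argv);
>>     /* resolve the server's port string published under the placeholder name */
>>     MPI_Lookup_name("demo_service", MPI_INFO_NULL, port);
>>     MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
>>     /* ... use the intercommunicator 'inter' ... */
>>     MPI_Comm_disconnect(&inter);
>>     MPI_Finalize();
>>     return 0;
>> }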
>> 
>> $> mpirun --ompi-server "3414491136.0;tcp://10.4.131.16:49775" -np 1 --mca 
>> btl self,vader ./server
>> 
>> At least one pair of MPI processes are unable to reach each other for
>> MPI communications.  This means that no Open MPI device has indicated
>> that it can be used to communicate between these processes.  This is
>> an error; Open MPI requires that all MPI processes be able to reach
>> each other.  This error can sometimes be the result of forgetting to
>> specify the "self" BTL.
>> 
>>   Process 1 ([[50012,1],0]) is on host: MacBook-Pro-80
>>   Process 2 ([[50010,1],0]) is on host: MacBook-Pro-80
>>   BTLs attempted: self
>> 
>> Your MPI job is now going to abort; sorry.
>> --------------------------------------------------------------------------
>> [MacBook-Pro-80.local:57315] [[50012,1],0] ORTE_ERROR_LOG: Unreachable in 
>> file dpm_orte.c at line 523
>> [MacBook-Pro-80:57315] *** An error occurred in MPI_Comm_accept
>> [MacBook-Pro-80:57315] *** reported by process [7572553729,4294967296]
>> [MacBook-Pro-80:57315] *** on communicator MPI_COMM_WORLD
>> [MacBook-Pro-80:57315] *** MPI_ERR_INTERN: internal error
>> [MacBook-Pro-80:57315] *** MPI_ERRORS_ARE_FATAL (processes in this 
>> communicator will now abort,
>> [MacBook-Pro-80:57315] ***    and potentially your MPI job)
>> -------------------------------------------------------
>> Primary job  terminated normally, but 1 process returned
>> a non-zero exit code.. Per user-direction, the job has been aborted.
>> -------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun detected that one or more processes exited with non-zero status, thus 
>> causing
>> the job to be terminated. The first process to do so was:
>> 
>>   Process name: [[50012,1],0]
>>   Exit code:    17
>> -------------------------------------------------------------------------- 
>> 
>> Thanks,
>> Louis
> 