Oh sure - just not shared memory
> On Jun 15, 2016, at 8:29 AM, Louis Williams <louis.willi...@gatech.edu> wrote:
>
> Ralph, thanks for the quick reply. Is cross-job fast transport like
> InfiniBand supported?
>
> Louis
>
> On Tue, Jun 14, 2016 at 3:53 PM Ralph Castain <r...@open-mpi.org> wrote:
> Nope - we don’t currently support cross-job shared memory operations. Nathan
> has talked about doing so for vader, but not at this time.
>
>
>
>> On Jun 14, 2016, at 12:38 PM, Louis Williams <louis.willi...@gatech.edu> wrote:
>>
>> Hi,
>>
>> I am attempting to use the sm and vader BTLs between a client and a server
>> process, but it seems impossible to use fast transports (i.e., anything other
>> than TCP) between two independent groups started with two separate mpirun
>> invocations. Am I correct, or is there a way to communicate over shared
>> memory between a client and server like this? This code seems to suggest it
>> is not possible:
>> https://github.com/open-mpi/ompi/blob/master/ompi/dpm/dpm.c#L495
>>
>> The server calls MPI::COMM_WORLD.Accept() and the client calls
>> MPI::COMM_WORLD.Connect(). Each program is started with "mpirun --np 1 --mca
>> btl self,sm,vader <executable>", where the executable is either the client
>> or the server program. When no BTL is specified, both establish a TCP
>> connection just fine. But when the sm and vader BTLs are specified, both the
>> client and the server exit immediately after the Connect() call with the
>> error message copied at the end of this mail. It seems as though intergroup
>> communication can't use fast transports and falls back to TCP only.
>>
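>> For reference, here is a minimal sketch (my own, in C, using the standard MPI
>> dynamic-process API rather than the C++ bindings mentioned above; the service
>> name "ompi-sm-test" and the use of MPI_Publish_name/MPI_Lookup_name are
>> assumptions, as the actual programs may pass the port string some other way)
>> of the accept/connect pattern being described:
>>
>>   /* server.c: open a port, publish it, and accept one client connection */
>>   #include <mpi.h>
>>   int main(int argc, char **argv) {
>>       char port[MPI_MAX_PORT_NAME];
>>       MPI_Comm client;
>>       MPI_Init(&argc, &argv);
>>       MPI_Open_port(MPI_INFO_NULL, port);
>>       /* the published name is resolved through the --ompi-server daemon */
>>       MPI_Publish_name("ompi-sm-test", MPI_INFO_NULL, port);
>>       MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);
>>       /* ... exchange messages over the 'client' intercommunicator ... */
>>       MPI_Comm_disconnect(&client);
>>       MPI_Unpublish_name("ompi-sm-test", MPI_INFO_NULL, port);
>>       MPI_Close_port(port);
>>       MPI_Finalize();
>>       return 0;
>>   }
>>
>>   /* client.c: look up the published port and connect to the server */
>>   #include <mpi.h>
>>   int main(int argc, char **argv) {
>>       char port[MPI_MAX_PORT_NAME];
>>       MPI_Comm server;
>>       MPI_Init(&argc, &argv);
>>       MPI_Lookup_name("ompi-sm-test", MPI_INFO_NULL, port);
>>       MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
>>       /* ... exchange messages over the 'server' intercommunicator ... */
>>       MPI_Comm_disconnect(&server);
>>       MPI_Finalize();
>>       return 0;
>>   }
>>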
>> Also, as expected, running the Accept() and Connect() calls within a single
>> program with "mpirun -np 2 --mca btl self,sm,vader ..." uses shared memory
>> as transport.
>>
>> $> mpirun --ompi-server "3414491136.0;tcp://10.4.131.16:49775" -np 1 --mca btl self,vader ./server
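>>
>> (For completeness, the client side is launched the same way against the same
>> ompi-server URI; the command below is my reconstruction, not copied from the
>> original run:)
>>
>>   $> mpirun --ompi-server "3414491136.0;tcp://10.4.131.16:49775" -np 1 --mca btl self,vader ./client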
>>
>> At least one pair of MPI processes are unable to reach each other for
>> MPI communications. This means that no Open MPI device has indicated
>> that it can be used to communicate between these processes. This is
>> an error; Open MPI requires that all MPI processes be able to reach
>> each other. This error can sometimes be the result of forgetting to
>> specify the "self" BTL.
>>
>> Process 1 ([[50012,1],0]) is on host: MacBook-Pro-80
>> Process 2 ([[50010,1],0]) is on host: MacBook-Pro-80
>> BTLs attempted: self
>>
>> Your MPI job is now going to abort; sorry.
>> --------------------------------------------------------------------------
>> [MacBook-Pro-80.local:57315] [[50012,1],0] ORTE_ERROR_LOG: Unreachable in
>> file dpm_orte.c at line 523
>> [MacBook-Pro-80:57315] *** An error occurred in MPI_Comm_accept
>> [MacBook-Pro-80:57315] *** reported by process [7572553729,4294967296]
>> [MacBook-Pro-80:57315] *** on communicator MPI_COMM_WORLD
>> [MacBook-Pro-80:57315] *** MPI_ERR_INTERN: internal error
>> [MacBook-Pro-80:57315] *** MPI_ERRORS_ARE_FATAL (processes in this
>> communicator will now abort,
>> [MacBook-Pro-80:57315] *** and potentially your MPI job)
>> -------------------------------------------------------
>> Primary job terminated normally, but 1 process returned
>> a non-zero exit code.. Per user-direction, the job has been aborted.
>> -------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun detected that one or more processes exited with non-zero status, thus
>> causing
>> the job to be terminated. The first process to do so was:
>>
>> Process name: [[50012,1],0]
>> Exit code: 17
>> --------------------------------------------------------------------------
>>
>> Thanks,
>> Louis
>