What version of OMPI are you using?

> On Nov 3, 2017, at 7:48 AM, Florian Lindner <mailingli...@xgm.de> wrote:
>
> Hello,
>
> I'm working on a sample program to connect two MPI communicators launched
> with mpirun using Ports.
>
> Firstly, I use MPI_Open_port to obtain a name and write that to a file:
>
>   if (options.participant == A) { // A publishes the port
>     if (options.commType == single and rank == 0)
>       openPublishPort(options);
>
>     if (options.commType == many)
>       openPublishPort(options);
>   }
>   MPI_Barrier(MPI_COMM_WORLD);
>
> participant is a command line argument and defines the role of A as server. B
> is the client.
>
>   void openPublishPort(Options options)
>   {
>     using namespace boost::filesystem;
>     int rank;
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     char p[MPI_MAX_PORT_NAME];
>     MPI_Open_port(MPI_INFO_NULL, p);
>     std::string portName(p);
>
>     create_directory(options.publishDirectory);
>     std::string filename;
>     if (options.commType == many)
>       filename = "A-" + std::to_string(rank) + ".address";
>     if (options.commType == single)
>       filename = "intercomm.address";
>
>     auto path = options.publishDirectory / filename;
>     DEBUG << "Writing address " << portName << " to " << path;
>     std::ofstream ofs(path.string(), std::ofstream::out);
>     ofs << portName;
>   }
>
> This works fine as far as I see.
> Next, I try to connect:
>
>   MPI_Comm icomm;
>   std::string portName;
>   if (options.participant == A) { // receives connections
>     if (options.commType == single) {
>       if (rank == 0)
>         portName = readPort(options);
>       INFO << "Accepting connection on " << portName;
>       MPI_Comm_accept(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_WORLD,
>                       &icomm);
>       INFO << "Received connection";
>     }
>   }
>
>   if (options.participant == B) { // connects to the intercomms
>     if (options.commType == single) {
>       if (rank == 0)
>         portName = readPort(options);
>       INFO << "Trying to connect to " << portName;
>       MPI_Comm_connect(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_WORLD,
>                        &icomm);
>       INFO << "Connected";
>     }
>   }
>
>
> options.single says that I want to use a single communicator that contains
> all ranks on both participants, A and B.
> readPort reads the port name from the file that was written before.
>
> Now, when I first launch A and, in another terminal, B, nothing happens until
> a timeout occurs.
>
> % mpirun -n 1 ./mpiports --commType="single" --participant="A"
> [2017-11-03 15:29:55.469891] [debug] Writing address
> 3048013825.0:1069313090 to "./publish/intercomm.address"
> [2017-11-03 15:29:55.470169] [debug] Read address 3048013825.0:1069313090
> from "./publish/intercomm.address"
> [2017-11-03 15:29:55.470185] [info] Accepting connection on
> 3048013825.0:1069313090
> [asaru:16199] OPAL ERROR: Timeout in file base/pmix_base_fns.c at line 195
> [...]
>
> and on the other site:
>
> % mpirun -n 1 ./mpiports --commType="single" --participant="B"
> [2017-11-03 15:29:59.698921] [debug] Read address 3048013825.0:1069313090
> from "./publish/intercomm.address"
> [2017-11-03 15:29:59.698947] [info] Trying to connect to
> 3048013825.0:1069313090
> [asaru:16238] OPAL ERROR: Timeout in file base/pmix_base_fns.c at line 195
> [...]
>
> The complete code, including cmake build script can be downloaded at:
> https://www.dropbox.com/s/azo5ti4kjg12zjy/MPI_Ports.tar.gz?dl=0
>
> Why is the connection not working?
>
> Thanks a lot,
> Florian
>
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
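The quoted post describes readPort only in words ("reads the port name from the file that was written before"), so it may help to rule out the file exchange before looking at MPI itself. Below is a minimal sketch of such a write/read pair, not the poster's actual code. It assumes C++17 std::filesystem in place of boost::filesystem, and that the port name fits on a single line. publishPort, the explicit directory and filename parameters, and the ".tmp" suffix are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: store portName in dir/filename.
// The name is written to a temporary file and then renamed, so a reader
// polling the directory never observes a half-written address.
void publishPort(const fs::path& dir, const std::string& filename,
                 const std::string& portName)
{
  assert(!portName.empty()); // an empty port name would be useless to a client

  fs::create_directories(dir);
  const fs::path target = dir / filename;
  const fs::path tmp    = dir / (filename + ".tmp");
  {
    std::ofstream ofs(tmp, std::ofstream::out | std::ofstream::trunc);
    if (!ofs)
      throw std::runtime_error("cannot open " + tmp.string());
    ofs << portName;
  } // stream is flushed and closed here, before the rename
  fs::rename(tmp, target);
}

// Hypothetical counterpart to the readPort mentioned in the post:
// returns the first line of dir/filename (assumed to be the port name).
std::string readPort(const fs::path& dir, const std::string& filename)
{
  const fs::path source = dir / filename;
  std::ifstream ifs(source);
  if (!ifs)
    throw std::runtime_error("cannot open " + source.string());
  std::string portName;
  std::getline(ifs, portName);
  return portName;
}
```

With these helpers, A would call publishPort(options.publishDirectory, "intercomm.address", portName) after MPI_Open_port, and rank 0 on B would call readPort with the same directory and filename before MPI_Comm_connect. The logs in the quoted post already show both sides reading the same address, so this sketch only covers the file exchange and says nothing about the timeout itself.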