Rick,

v2.0.x uses a hard-coded 60-second timeout (vs. 600 seconds in master) in ompi/dpm/dpm.c; see OPAL_PMIX_EXCHANGE. Until that value is bumped, a client-side retry loop such as the sketch below can work around it.
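A minimal sketch of such a retry loop, assuming MPI_ERRORS_RETURN makes a failed MPI_Comm_connect return an error code rather than abort (which may not hold if the abort comes from the runtime layer rather than the MPI layer):

    #include <unistd.h>   // sleep()

    // Hypothetical retry loop, not from the original code; port_name is
    // read from the portfile exactly as in the client below.
    MPI_Comm server;
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
    const double deadline = MPI_Wtime() + 600.0;  // overall budget in seconds
    int rc = MPI_ERR_PORT;
    while (MPI_Wtime() < deadline)
    {
        rc = MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
        if (rc == MPI_SUCCESS)
            break;    // connected; stop retrying
        sleep(1);     // back off briefly, then try again
    }
    if (rc != MPI_SUCCESS)
    {
        std::cerr << "Could not connect to " << port_name << std::endl;
    }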
I will check your test and will likely have the value bumped to 600 seconds.

Cheers,

Gilles

On Tuesday, October 4, 2016, Marlborough, Rick <rmarlboro...@aaccorp.com> wrote:

> Gilles;
>
> The abort occurs somewhere between 30 and 60 seconds. Is there some
> configuration setting that could influence this?
>
> Rick
>
> *From:* users [mailto:users-boun...@lists.open-mpi.org] *On Behalf Of* Gilles Gouaillardet
> *Sent:* Tuesday, October 04, 2016 8:39 AM
> *To:* Open MPI Users
> *Subject:* Re: [OMPI users] problems with client server scenario using MPI_Comm_connect
>
> Rick,
>
> How long does it take before the test fails?
>
> There was a bug that caused a failure if no connection was received
> within 2 (3?) seconds, but I think it was fixed in v2.0.1.
>
> That being said, you might want to try a nightly snapshot of the
> v2.0.x branch.
>
> Cheers,
>
> Gilles
>
> On Tuesday, October 4, 2016, Marlborough, Rick <rmarlboro...@aaccorp.com> wrote:
>
> Gilles;
>
> Here is the client side code. The start command is "mpirun -n 1 client 10",
> where 10 is used to size a buffer.
>
>     int numtasks, rank, dest, source, rc, count, tag = 1;
>     int bufsize = 0;    // not declared in the posted snippet
>     int num_procs = 0;  // not declared in the posted snippet
>
>     MPI_Init(&argc, &argv);
>     if (argc > 1)
>     {
>         bufsize = atoi(argv[1]);
>     }
>     MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     MPI_Comm server;
>     char port_name[MPI_MAX_PORT_NAME + 1];
>
>     // Read the port name the server wrote to a shared file.
>     std::ifstream file("./portfile");
>     file.getline(port_name, MPI_MAX_PORT_NAME);
>     file.close();
>
>     // Lookup_name does not work.
>     // MPI_Lookup_name("test_service", MPI_INFO_NULL, port_name);
>     std::cout << "Established port name is " << port_name << std::endl;
>
>     MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
>     MPI_Comm_remote_size(server, &num_procs);
>     std::cout << "Number of running processes is " << num_procs << std::endl;
>     MPI_Finalize();
>     exit(0);
>
> Here is the server code. This is started on a different machine. The
> command line is "mpirun -n 1 sendrec 10", where 10 is used to size a buffer.
>
>     int numtasks, rank, dest, source, rc, count, tag = 1;
>     int bufsize = 0;    // not declared in the posted snippet
>     int mpi_error = 0;  // not declared in the posted snippet
>
>     MPI_Init(&argc, &argv);
>     if (argc > 1)
>     {
>         bufsize = atoi(argv[1]);
>     }
>     MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     MPI_Comm remote_clients;
>     MPI_Info pub_global;
>
>     std::cout << "This process rank is " << rank << std::endl;
>     std::cout << "Number of current processes is " << numtasks << std::endl;
>
>     // Open a port and publish it with global scope so clients can find it.
>     char port_name[MPI_MAX_PORT_NAME];
>     mpi_error = MPI_Open_port(MPI_INFO_NULL, port_name);
>     MPI_Info_create(&pub_global);
>     MPI_Info_set(pub_global, "ompi_global_scope", "true");
>     mpi_error = MPI_Publish_name("test_service", pub_global, port_name);
>     if (mpi_error)
>     {
>         ...
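>         // Hypothetical sketch, not part of the original post: the elided
>         // error handling could report the failure with MPI_Error_string:
>         //
>         //   char msg[MPI_MAX_ERROR_STRING];
>         //   int len = 0;
>         //   MPI_Error_string(mpi_error, msg, &len);
>         //   std::cerr << "MPI_Publish_name failed: " << msg << std::endl;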
>     }
>     std::cout << "Established port name is " << port_name << std::endl;
>
>     // Write the port name to a file the client can read.
>     std::ofstream file("./portfile", std::ofstream::trunc);
>     file << port_name;
>     file.close();
>
>     MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &remote_clients);
>
> The server error looks like this…
>
> The client error looks like this…
>
> Thanks
>
> Rick
>
> *From:* users [mailto:users-boun...@lists.open-mpi.org] *On Behalf Of* Gilles Gouaillardet
> *Sent:* Tuesday, October 04, 2016 7:13 AM
> *To:* Open MPI Users
> *Subject:* Re: [OMPI users] problems with client server scenario using MPI_Comm_connect
>
> Rick,
>
> I do not think ompi_server is required here.
> Can you please post a trimmed version of your client and server, and
> your two mpirun command lines?
> You also need to make sure all ranks pass the same root parameter when
> invoking MPI_Comm_accept and MPI_Comm_connect.
>
> Cheers,
>
> Gilles
>
> "Marlborough, Rick" <rmarlboro...@aaccorp.com> wrote:
>
> Folks;
>
> I have been trying to get a test case up and running using a client
> server scenario, with a server waiting on MPI_Comm_accept and the
> client trying to connect via MPI_Comm_connect. The port value is
> written to a file. The client opens the file and reads the port value.
> I run the server, followed by the client. They both appear to sit
> there for a while, but eventually they both time out and abort. They
> are running on separate machines. All other communication between
> these two machines appears to be OK. Is there some intermediate
> service that needs to be run? I am using Open MPI v2.0.1 on Red Hat
> Linux 6.5 (64-bit) on a 1 Gb network.
>
> Thanks
>
> Rick
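A note on the MPI_Lookup_name path that "does not work" in the client above: name publish/lookup across two separate mpirun instances generally needs the standalone ompi-server process acting as the intermediate directory, with both mpiruns pointed at its URI. A sketch of the usual wiring (option spellings assumed from the Open MPI man pages; verify with ompi-server --help):

    # on any machine both jobs can reach, start the directory service
    ompi-server --report-uri /shared/ompi-server.uri

    # server side
    mpirun --ompi-server file:/shared/ompi-server.uri -n 1 sendrec 10

    # client side
    mpirun --ompi-server file:/shared/ompi-server.uri -n 1 client 10

With that in place, MPI_Publish_name/MPI_Lookup_name can replace the shared portfile; the file-based port exchange above is a reasonable workaround when no such service is running.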