Rick,

v2.0.x uses a hard-coded 60-second timeout (vs. 600 seconds in master)
in ompi/dpm/dpm.c; see OPAL_PMIX_EXCHANGE.

I will check your test and will likely have the value bumped to 600 seconds.
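
In the meantime, to confirm that this is the timeout being hit, you can time
the connect call and print the returned error. A minimal sketch (not taken
from your code; it assumes the failure is reported back through the MPI error
handler rather than aborted directly by the runtime):

    #include <mpi.h>
    #include <fstream>
    #include <iostream>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        // Ask for error codes instead of the default abort-on-error behaviour.
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        char port_name[MPI_MAX_PORT_NAME + 1] = {0};
        std::ifstream file("./portfile");   // same port file the client reads
        file.getline(port_name, MPI_MAX_PORT_NAME);
        file.close();

        MPI_Comm server;
        double t0 = MPI_Wtime();
        int rc = MPI_Comm_connect(port_name, MPI_INFO_NULL, 0,
                                  MPI_COMM_WORLD, &server);
        double elapsed = MPI_Wtime() - t0;

        if (rc != MPI_SUCCESS) {
            char msg[MPI_MAX_ERROR_STRING];
            int len = 0;
            MPI_Error_string(rc, msg, &len);
            std::cerr << "MPI_Comm_connect failed after " << elapsed
                      << " s: " << msg << std::endl;
        } else {
            std::cout << "Connected after " << elapsed << " s" << std::endl;
            MPI_Comm_disconnect(&server);
        }

        MPI_Finalize();
        return 0;
    }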

Cheers,

Gilles

On Tuesday, October 4, 2016, Marlborough, Rick <rmarlboro...@aaccorp.com>
wrote:

> Gilles;
>
>                 The abort occurs somewhere between 30 and 60 seconds. Is
> there some configuration setting that could influence this?
>
>
>
> Rick
>
>
>
> *From:* users [mailto:users-boun...@lists.open-mpi.org] *On Behalf Of *Gilles Gouaillardet
> *Sent:* Tuesday, October 04, 2016 8:39 AM
> *To:* Open MPI Users
> *Subject:* Re: [OMPI users] problems with client server scenario using
> MPI_Comm_connect
>
>
>
> Rick,
>
>
>
> How long does it take before the test fails?
>
> There was a bug that caused a failure if no connection was received after
> 2 (or 3?) seconds, but I think it was fixed in v2.0.1.
>
> That being said, you might want to try a nightly snapshot of the v2.0.x
> branch
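>
> To double-check which library a rebuilt binary actually picks up at run
> time, a small sketch (standard MPI-3 call, nothing Open MPI specific):
>
>     #include <mpi.h>
>     #include <iostream>
>
>     int main(int argc, char **argv)
>     {
>         MPI_Init(&argc, &argv);
>         char version[MPI_MAX_LIBRARY_VERSION_STRING];
>         int len = 0;
>         // Prints the version string of the MPI library the binary is linked against.
>         MPI_Get_library_version(version, &len);
>         std::cout << version << std::endl;
>         MPI_Finalize();
>         return 0;
>     }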
>
>
>
> Cheers,
>
>
>
> Gilles
>
>
> On Tuesday, October 4, 2016, Marlborough, Rick <rmarlboro...@aaccorp.com> wrote:
>
> Gilles;
>
>                 Here is the client-side code. The start command is “mpirun
> -n 1 client 10”, where 10 is used to size a buffer.
>
>
>
>     int numtasks, rank, dest, source, rc, count, tag=1;
>
>     MPI_Init(&argc,&argv);
>     if(argc > 1)
>     {
>         bufsize = atoi(argv[1]);
>     }
>     MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     MPI_Comm server;
>     if(1)
>     {
>         char port_name[MPI_MAX_PORT_NAME + 1];
>
>         std::ifstream file("./portfile");
>         file.getline(port_name, MPI_MAX_PORT_NAME);
>         file.close();
>
>         //Lookup_name does not work.
>         //MPI_Lookup_name("test_service", MPI_INFO_NULL, port_name);
>
>         std::cout << "Established port name is " << port_name << std::endl;
>
>         MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
>         MPI_Comm_remote_size(server, &num_procs);
>         std::cout << "Number of running processes is " << num_procs << std::endl;
>
>         MPI_Finalize();
>         exit(0);
>     }
>
>
>
>
>
> Here is the server code. This is started on a different machine. The
> command line is “mpirun -n 1 sendrec 10”, where 10 is used to size a buffer.
>
>
>
>     int numtasks, rank, dest, source, rc, count, tag=1;
>
>     MPI_Init(&argc,&argv);
>     if(argc > 1)
>     {
>         bufsize = atoi(argv[1]);
>     }
>     MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     MPI_Comm remote_clients;
>     MPI_Info pub_global;
>
>     std::cout << "This process rank is " << rank << std::endl;
>     std::cout << "Number of current processes is " << numtasks << std::endl;
>
>     char port_name[MPI_MAX_PORT_NAME];
>     mpi_error = MPI_Open_port(MPI_INFO_NULL, port_name);
>
>     MPI_Info_create(&pub_global);
>     MPI_Info_set(pub_global, "ompi_global_scope", "true");
>     mpi_error = MPI_Publish_name("test_service", pub_global, port_name);
>     if(mpi_error)
>     {
>         ...
>     }
>
>     std::cout << "Established port name is " << port_name << std::endl;
>
>     std::ofstream file("./portfile", std::ofstream::trunc);
>     file << port_name;
>     file.close();
>
>     MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &remote_clients);
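>
> For completeness, the usual teardown after the accept would look roughly
> like this (not part of the snippet above; names assumed from it):
>
>     MPI_Comm_disconnect(&remote_clients);
>     MPI_Unpublish_name("test_service", pub_global, port_name);
>     MPI_Close_port(port_name);
>     MPI_Info_free(&pub_global);
>     MPI_Finalize();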
>
>
>
>
>
>
>
> The server error looks like this…
>
>
>
>
>
>
>
> The client error looks like so…
>
>
>
>
>
>
>
> Thanks
>
> Rick
>
> *From:* users [mailto:users-boun...@lists.open-mpi.org] *On Behalf Of *Gilles
> Gouaillardet
> *Sent:* Tuesday, October 04, 2016 7:13 AM
> *To:* Open MPI Users
> *Subject:* Re: [OMPI users] problems with client server scenario using
> MPI_Comm_connect
>
>
>
> Rick,
>
> I do not think ompi_server is required here.
> Can you please post a trimmed version of your client and server, and your
> two mpirun command lines?
> You also need to make sure all ranks pass the same root parameter when
> invoking MPI_Comm_accept and MPI_Comm_connect.
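>
> For illustration, a sketch with assumed variable names: both calls are
> collective, so every rank of the supplied communicator makes the call and
> they all pass the same root, e.g. 0:
>
>     // server side: every rank of MPI_COMM_WORLD calls this, all with root 0
>     MPI_Comm clients;
>     MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &clients);
>
>     // client side: likewise every rank calls this, all with the same root 0
>     MPI_Comm server;
>     MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
>
> An external ompi-server would only come into play if the port were exchanged
> via MPI_Publish_name/MPI_Lookup_name between separately launched jobs (e.g.
> "ompi-server --report-uri <file>" plus "mpirun --ompi-server file:<file> ...";
> option names as in the Open MPI man pages, worth double-checking for 2.0.x).
> If the port is passed through a file, that should not be needed.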
>
> Cheers,
>
> Gilles
>
> "Marlborough, Rick" <rmarlboro...@aaccorp.com> wrote:
>
> Folks;
>
>                 I have been trying to get a test case up and running using
> a client/server scenario, with a server waiting on MPI_Comm_accept and the
> client trying to connect via MPI_Comm_connect. The port value is written to
> a file. The client opens the file and reads the port value. I run the
> server, followed by the client. They both appear to sit there for a time,
> but eventually they both time out and abort. They are running on separate
> machines. All other communication between these 2 machines appears to be
> OK. Is there some intermediate service that needs to be run? I am using
> Open MPI v2.0.1 on Red Hat Linux 6.5 (64-bit) running on a 1-gigabit
> network.
>
>
>
> Thanks
>
> Rick
>
>
>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
