Re: [OMPI users] problems with client server scenario using MPI_Comm_connect
Rick,

I do not think ompi_server is required here. Can you please post a trimmed version of your client and server, and your two mpirun command lines? You also need to make sure all ranks pass the same root parameter when invoking MPI_Comm_accept and MPI_Comm_connect.

Cheers,

Gilles

"Marlborough, Rick" wrote:
> Folks;
>
> I have been trying to get a test case up and running using a client/server
> scenario, with a server waiting on MPI_Comm_accept and the client trying to
> connect via MPI_Comm_connect. The port value is written to a file. The
> client opens the file and reads the port value. I run the server, followed
> by the client. They both appear to sit there for a time, but eventually
> they both time out and abort. They are running on separate machines. All
> other communication between these two machines appears to be OK. Is there
> some intermediate service that needs to be run? I am using Open MPI v2.0.1
> on Red Hat Linux v6.5, 64-bit, on a 1-gigabit network.
>
> Thanks
> Rick

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
Re: [OMPI users] OS X + Xcode 8 : dyld: Symbol not found: _clock_gettime
Christophe,

If I read between the lines: you had Open MPI running just fine, then you upgraded Xcode, and that broke Open MPI. Am I right so far?

Did you build Open MPI yourself, or did you get binaries from somewhere (such as brew)? In the first case, you need to rebuild Open MPI (run configure && make install again, so Open MPI becomes aware that clock_gettime is not available). I will check how the brew binaries were built, and ensure they do not use clock_gettime.

Cheers,

Gilles

Christophe Peyret wrote:
> Hello,
>
> Since the Xcode 8 update, I have a problem with mpirun: _clock_gettime is
> not found.
>
>     dyld: Symbol not found: _clock_gettime
>       Referenced from: /opt/openmpi-1.10.4/lib//libopen-pal.13.dylib
>       Expected in: flat namespace
>
>     Trace/BPT trap: 5
>
> Any idea?
>
> Christophe
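A quick way to confirm the diagnosis and apply the fix, assuming a from-source build (the paths below come from the error message and are examples; adjust to your install prefix and source directory):

```shell
# Does the installed library reference clock_gettime? A "U _clock_gettime"
# line means it expects the symbol from the system, which older macOS
# versions do not provide:
nm /opt/openmpi-1.10.4/lib/libopen-pal.13.dylib | grep clock_gettime

# Rebuild from source so configure re-detects what the current SDK offers:
cd openmpi-1.10.4
./configure --prefix=/opt/openmpi-1.10.4
make -j4
make install
```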
Re: [OMPI users] problems with client server scenario using MPI_Comm_connect
Gilles;

Here is the client side code. The start command is "mpirun -n 1 client 10", where 10 is used to size a buffer.

    int numtasks, rank, dest, source, rc, count, tag = 1;
    MPI_Init(&argc, &argv);
    if (argc > 1)
    {
        bufsize = atoi(argv[1]);
    }
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm server;
    if (1)
    {
        char port_name[MPI_MAX_PORT_NAME + 1];

        std::ifstream file("./portfile");
        file.getline(port_name, MPI_MAX_PORT_NAME);
        file.close();
        // Lookup_name does not work.
        // MPI_Lookup_name("test_service", MPI_INFO_NULL, port_name);
        std::cout << "Established port name is " << port_name << std::endl;
        MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
        MPI_Comm_remote_size(server, &num_procs);
        std::cout << "Number of running processes is " << num_procs << std::endl;
        MPI_Finalize();
        exit(0);
    }

Here is the server code. This is started on a different machine. The command line is "mpirun -n 1 sendrec 10", where 10 is used to size a buffer.

    int numtasks, rank, dest, source, rc, count, tag = 1;
    MPI_Init(&argc, &argv);
    if (argc > 1)
    {
        bufsize = atoi(argv[1]);
    }
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Comm remote_clients;
    MPI_Info pub_global;

    std::cout << "This process rank is " << rank << std::endl;
    std::cout << "Number of current processes is " << numtasks << std::endl;
    char port_name[MPI_MAX_PORT_NAME];
    mpi_error = MPI_Open_port(MPI_INFO_NULL, port_name);
    MPI_Info_create(&pub_global);
    MPI_Info_set(pub_global, "ompi_global_scope", "true");
    mpi_error = MPI_Publish_name("test_service", pub_global, port_name);
    if (mpi_error)
    {
        ...
    }
    std::cout << "Established port name is " << port_name << std::endl;
    std::ofstream file("./portfile", std::ofstream::trunc);
    file << port_name;
    file.close();
    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &remote_clients);

The server error looks like this…

[inline screenshot of the server error; not preserved in the text archive]

The client error looks like so…

[inline screenshot of the client error; not preserved in the text archive]

Thanks
Rick

From: Gilles Gouaillardet
Sent: Tuesday, October 04, 2016 7:13 AM
To: Open MPI Users
Subject: Re: [OMPI users] problems with client server scenario using MPI_Comm_connect
> [earlier messages quoted in full; elided]
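The commented-out MPI_Lookup_name hints at the missing piece: with two independent mpirun instances on different machines, publish/lookup (and, depending on the setup, connect/accept) generally needs a rendezvous point both jobs can reach. Open MPI ships a standalone server for this; a sketch of its use (file path and program names are examples from this thread, and exact option spellings can vary by Open MPI version):

```shell
# On one machine, start the standalone rendezvous server and have it write
# its contact URI to a file on a shared filesystem:
ompi-server --report-uri /shared/ompi-server.uri

# Point both mpirun instances at that URI so MPI_Publish_name and
# MPI_Lookup_name can find each other:
mpirun --ompi-server file:/shared/ompi-server.uri -n 1 sendrec 10   # server host
mpirun --ompi-server file:/shared/ompi-server.uri -n 1 client 10    # client host
```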
Re: [OMPI users] problems with client server scenario using MPI_Comm_connect
Rick,

How long does it take before the test fails? There was a bug that caused a failure if no connection was received within 2 (3?) seconds, but I think it was fixed in v2.0.1. That being said, you might want to try a nightly snapshot of the v2.0.x branch.

Cheers,

Gilles

On Tuesday, October 4, 2016, Marlborough, Rick wrote:
> [earlier messages quoted in full; elided]
Re: [OMPI users] problems with client server scenario using MPI_Comm_connect
Gilles;

The abort occurs somewhere between 30 and 60 seconds in. Is there some configuration setting that could influence this?

Rick

From: Gilles Gouaillardet
Sent: Tuesday, October 04, 2016 8:39 AM
To: Open MPI Users
Subject: Re: [OMPI users] problems with client server scenario using MPI_Comm_connect
> [earlier messages quoted in full; elided]
Re: [OMPI users] problems with client server scenario using MPI_Comm_connect
Rick,

v2.0.x uses a 60-second hard-coded timeout (vs 600 seconds in master) in ompi/dpm/dpm.c; see OPAL_PMIX_EXCHANGE. I will check your test, and will likely have the value bumped to 600 seconds.

Cheers,

Gilles

On Tuesday, October 4, 2016, Marlborough, Rick wrote:
> [earlier messages quoted in full; elided]
[OMPI users] Question on using Github to see bugs fixed in past versions
Apologies for the dumb question… There used to be a way to dive in and see exactly what bugs and features came into 1.10.4, 1.10.3, and on back to 1.8.8. Is there a way to do that on GitHub?

Ed
Re: [OMPI users] Question on using Github to see bugs fixed in past versions
Edwin,

Changes are summarized in the NEWS file.

We used to have two GitHub repositories, and they were "merged" recently. With GitHub, you can list the closed PRs for a given milestone:
https://github.com/open-mpi/ompi-release/milestones?state=closed
Then you can click on a milestone and list the PRs marked as "Closed". For example, the PRs merged in v1.10.4 are at
https://github.com/open-mpi/ompi-release/milestone/18?closed=1

Cheers,

Gilles

On 10/5/2016 5:10 AM, Blosch, Edwin L wrote:
> [earlier message quoted in full; elided]
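The release tags carry the same information on the command line; a sketch with git, assuming the tag names below exist in the repository you clone:

```shell
# Clone the repository and diff the history between two release tags:
git clone https://github.com/open-mpi/ompi-release.git
cd ompi-release

# Every commit that went into 1.10.4 but not 1.10.3:
git log --oneline v1.10.3..v1.10.4

# The human-written summary of the same changes:
git diff v1.10.3..v1.10.4 -- NEWS
```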