[OMPI users] Problem with MPI_Comm_accept in a dynamic client/server application
Hi there,

I am trying to create a client/server application with OpenMPI, which has been installed on a Windows machine by following the instructions (with CMake) in the README.WINDOWS file in the OpenMPI distribution (version 1.4.2). I have run other test applications that compile fine under the Visual Studio 2008 Command Prompt. However, I get the following errors on the server side when accepting a new client that is trying to connect:

[Lazar:02716] [[47880,1],0] ORTE_ERROR_LOG: Not found in file ..\..\orte\mca\grpcomm\base\grpcomm_base_allgather.c at line 222
[Lazar:02716] [[47880,1],0] ORTE_ERROR_LOG: Not found in file ..\..\orte\mca\grpcomm\basic\grpcomm_basic_module.c at line 530
[Lazar:02716] [[47880,1],0] ORTE_ERROR_LOG: Not found in file ..\..\ompi\mca\dpm\orte\dpm_orte.c at line 363
[Lazar:2716] *** An error occurred in MPI_Comm_accept
[Lazar:2716] *** on communicator MPI_COMM_WORLD
[Lazar:2716] *** MPI_ERR_INTERN: internal error
[Lazar:2716] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 476 on node Lazar exiting
without calling "finalize". This may have caused other processes in the
application to be terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

The server and client code is attached. I have struggled with this problem for quite a while, so please let me know what the issue might be. I have looked at the archives and the FAQ, and the only similar thing that I have found had to do with different versions of OpenMPI being installed, but I only have one version, and I believe it is the one being used.

Thank you,
Kalin

/* server */
#include "mpi.h"

int main( int argc, char **argv )
{
    MPI_Comm client;
    MPI_Status status;
    char port_name[MPI_MAX_PORT_NAME];
    double buf[100];
    int size, again;

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    MPI_Open_port( MPI_INFO_NULL, port_name );
    //printf("server available at %s\n", port_name);
    while (1) {
        /* wait for a client to connect to the opened port */
        MPI_Comm_accept( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client );
        again = 1;
        while (again) {
            MPI_Recv( buf, 100, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                      client, &status );
            switch (status.MPI_TAG) {
            case 0:   /* shut the server down */
                MPI_Comm_free( &client );
                MPI_Close_port( port_name );
                MPI_Finalize();
                return 0;
            case 1:   /* this client is finished */
                MPI_Comm_disconnect( &client );
                again = 0;
                break;
            case 2:   /* regular work message */
                //printf("test");
                break;  /* needed so a tag-2 message does not fall through to MPI_Abort */
            default:
                /* Unexpected message type */
                MPI_Abort( MPI_COMM_WORLD, 1 );
            }
        }
    }
}

/* client */
#include <string.h>
#include <stdbool.h>
#include "mpi.h"

int main( int argc, char **argv )
{
    MPI_Comm server;
    double buf[100];
    char port_name[MPI_MAX_PORT_NAME];

    MPI_Init( &argc, &argv );
    strcpy( port_name, argv[1] );  /* assume the server's port name is a cmd-line arg */
    MPI_Comm_connect( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server );

    bool done = false;
    while (!done) {
        int tag = 2;  /* action to perform */
        MPI_Send( buf, 100, MPI_DOUBLE, 0, tag, server );
        /* etc. -- the real work, which eventually sets done, goes here */
    }
    /* a zero-length message with tag 1 tells the server this client is finished */
    MPI_Send( buf, 0, MPI_DOUBLE, 0, 1, server );
    MPI_Comm_disconnect( &server );
    MPI_Finalize();
    return 0;
}
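For reference, launching the two attached programs from separate console windows (the procedure described later in this thread) might look like the following; the executable names server.exe and client.exe are placeholders, and the port string is whatever the server obtains from MPI_Open_port:

    mpirun -np 1 server.exe
        (note the port name the server reports)

    mpirun -np 1 client.exe "<port name reported by the server>"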
Re: [OMPI users] Problem with MPI_Comm_accept in a dynamic client/server application
Thank you for the quick response; I am looking forward to Shiqing's reply. Additionally, I noticed that I get the following warnings whenever I run an OpenMPI application. I am not sure if this has anything to do with the error that I am getting for MPI_Comm_accept:

[Lazar:03288] mca_oob_tcp_create_listen: unable to disable v4-mapped addresses
[Lazar:00576] mca_oob_tcp_create_listen: unable to disable v4-mapped addresses
[Lazar:00576] mca_btl_tcp_create_listen: unable to disable v4-mapped addresses

Kalin

On 14.10.2010 г. 08:47, Jeff Squyres wrote:

Just FYI -- the main Windows Open MPI guy (Shiqing) is out for a little while. He's really the best person to answer your question. I'm sure he'll reply when he can, but I just wanted to let you know that there may be some latency in his reply.

On Oct 13, 2010, at 5:09 PM, Kalin Kanov wrote:
[...]
Re: [OMPI users] Problem with MPI_Comm_accept in a dynamic client/server application
Hi Shiqing,

I must have missed your response among all the e-mails that get sent to the mailing list. Here are a few more details about the issues that I am having. My client/server programs seem to run sometimes, but after a successful run I always seem to get the error that I included in my first post. The way that I run the programs is by running the server application first, which generates the port string, etc. I then run the client application with a new call to mpirun. After getting the errors that I e-mailed about, I also tried to run ompi-clean, but the results are the following:

>ompi-clean
[Lazar:05984] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ..\..\orte\runtime\orte_init.c at line 125
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------

Any help with this issue will be greatly appreciated.

Thank you,
Kalin

On 27.10.2010 г. 05:52, Shiqing Fan wrote:

Hi Kalin,

Sorry for the late reply. I checked the code and got confused (I'm not an MPI expert): I'm just wondering how to start the server and the client in the same mpirun command when the client needs a hand-typed port name, which is only produced by the server at runtime. I found a similar program on the Internet (see attached) that works well on my Windows machine. In that program, the generated port name is sent among the processes by MPI_Send.

Regards,
Shiqing

On 2010-10-13 11:09 PM, Kalin Kanov wrote:
[...]
--
Shiqing Fan                     http://www.hlrs.de/people/fan
High Performance Computing      Tel.: +49 711 685 87234
Center Stuttgart (HLRS)         Fax.: +49 711 685 65832
Address: Allmandring 30         email: f...@hlrs.de
         70569 Stuttgart
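A minimal sketch of the single-mpirun pattern Shiqing describes above (an assumption of what such a program might look like, not the attachment he refers to): rank 0 opens a port and forwards the port name to rank 1 with MPI_Send, so no hand-typed port string is needed. It would be run as one job with two processes, e.g. mpirun -np 2 port_sendrecv.exe (hypothetical name).

/* port_sendrecv.c -- both roles in one executable, one mpirun */
#include <stdio.h>
#include "mpi.h"

int main( int argc, char **argv )
{
    int rank, i;
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm intercomm;
    MPI_Status status;
    double buf[100];

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    if (rank == 0) {
        /* "server" role: open a port and hand its name to rank 1 */
        MPI_Open_port( MPI_INFO_NULL, port_name );
        MPI_Send( port_name, MPI_MAX_PORT_NAME, MPI_CHAR, 1, 0, MPI_COMM_WORLD );
        MPI_Comm_accept( port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm );
        MPI_Recv( buf, 100, MPI_DOUBLE, 0, 0, intercomm, &status );
        printf( "server received buf[0] = %f\n", buf[0] );
        MPI_Comm_disconnect( &intercomm );
        MPI_Close_port( port_name );
    } else if (rank == 1) {
        /* "client" role: receive the port name and connect to it */
        MPI_Recv( port_name, MPI_MAX_PORT_NAME, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status );
        MPI_Comm_connect( port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm );
        for (i = 0; i < 100; i++)
            buf[i] = (double) i;
        MPI_Send( buf, 100, MPI_DOUBLE, 0, 0, intercomm );
        MPI_Comm_disconnect( &intercomm );
    }

    MPI_Finalize();
    return 0;
}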
Re: [OMPI users] Problem with MPI_Comm_accept in a dynamic client/server application
Hi Shiqing,

I am using OpenMPI version 1.4.2. Here is the output of ompi_info:

                Package: Open MPI Kalin Kanov@LAZAR Distribution
               Open MPI: 1.4.2
  Open MPI SVN revision: r23093
  Open MPI release date: May 04, 2010
               Open RTE: 1.4.2
  Open RTE SVN revision: r23093
  Open RTE release date: May 04, 2010
                   OPAL: 1.4.2
      OPAL SVN revision: r23093
      OPAL release date: May 04, 2010
           Ident string: 1.4.2
                 Prefix: C:/Program Files/openmpi-1.4.2/installed
Configured architecture: x86 Windows-5.2
         Configure host: LAZAR
          Configured by: Kalin Kanov
          Configured on: 18:00 04.10.2010 г.
         Configure host: LAZAR
               Built by: Kalin Kanov
               Built on: 18:00 04.10.2010 г.
             Built host: LAZAR
             C bindings: yes
           C++ bindings: yes
     Fortran77 bindings: no
     Fortran90 bindings: no
Fortran90 bindings size: na
             C compiler: cl
    C compiler absolute: cl
           C++ compiler: cl
  C++ compiler absolute: cl
     Fortran77 compiler: CMAKE_Fortran_COMPILER-NOTFOUND
 Fortran77 compiler abs: none
     Fortran90 compiler:
 Fortran90 compiler abs: none
            C profiling: yes
          C++ profiling: yes
    Fortran77 profiling: no
    Fortran90 profiling: no
         C++ exceptions: no
         Thread support: no
          Sparse Groups: no
 Internal debug support: no
    MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
        libltdl support: no
  Heterogeneous support: no
mpirun default --prefix: yes
        MPI I/O support: yes
      MPI_WTIME support: gettimeofday
Symbol visibility support: yes
  FT Checkpoint support: yes (checkpoint thread: no)
          MCA backtrace: none (MCA v2.0, API v2.0, Component v1.4.2)
          MCA paffinity: windows (MCA v2.0, API v2.0, Component v1.4.2)
              MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.4.2)
          MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4.2)
              MCA timer: windows (MCA v2.0, API v2.0, Component v1.4.2)
        MCA installdirs: windows (MCA v2.0, API v2.0, Component v1.4.2)
        MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4.2)
        MCA installdirs: config (MCA v2.0, API v2.0, Component v1.4.2)
                MCA crs: none (MCA v2.0, API v2.0, Component v1.4.2)
                MCA dpm: orte (MCA v2.0, API v2.0, Component v1.4.2)
             MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.4.2)
          MCA allocator: basic (MCA v2.0, API v2.0, Component v1.4.2)
          MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.4.2)
               MCA coll: basic (MCA v2.0, API v2.0, Component v1.4.2)
               MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.4.2)
               MCA coll: self (MCA v2.0, API v2.0, Component v1.4.2)
               MCA coll: sm (MCA v2.0, API v2.0, Component v1.4.2)
               MCA coll: sync (MCA v2.0, API v2.0, Component v1.4.2)
              MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.4.2)
              MCA mpool: sm (MCA v2.0, API v2.0, Component v1.4.2)
                MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.4.2)
                MCA bml: r2 (MCA v2.0, API v2.0, Component v1.4.2)
                MCA btl: self (MCA v2.0, API v2.0, Component v1.4.2)
                MCA btl: sm (MCA v2.0, API v2.0, Component v1.4.2)
                MCA btl: tcp (MCA v2.0, API v2.0, Component v1.4.2)
               MCA topo: unity (MCA v2.0, API v2.0, Component v1.4.2)
                MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.4.2)
                MCA osc: rdma (MCA v2.0, API v2.0, Component v1.4.2)
                MCA iof: hnp (MCA v2.0, API v2.0, Component v1.4.2)
                MCA iof: orted (MCA v2.0, API v2.0, Component v1.4.2)
                MCA iof: tool (MCA v2.0, API v2.0, Component v1.4.2)
                MCA oob: tcp (MCA v2.0, API v2.0, Component v1.4.2)
               MCA odls: process (MCA v2.0, API v2.0, Component v1.4.2)
                MCA ras: ccp (MCA v2.0, API v2.0, Component v1.4.2)
              MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.4.2)
              MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.4.2)
                MCA rml: ftrm (MCA v2.0, API v2.0, Component v1.4.2)
                MCA rml: oob (MCA v2.0, API v2.0, Component v1.4.2)
             MCA routed: binomial (MCA v2.0, API v2.0, Component v1.4.2)
             MCA routed: linear (MCA v2.0, API v2.0, Component v1.4.2)
                MCA plm: ccp (MCA v2.0, API v2.0, Component v1.4.2)
                MCA plm: process (MCA v2.0, API v2.0, Component v1.4.2)
             MCA errmgr: default (MCA v2.0, API v2.0, Component v1.4.2)
                MCA ess: env (MCA v2.0, API v2.0, Component v1.4.2)
                MCA ess: hnp (MCA