Just a thought: according to the system() man page, SIGCHLD is blocked in the calling process while the command runs. Since you are executing your command as a daemon in the background, it may end up blocked permanently.
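If you want to verify that, here is a minimal, untested sketch you could drop into rank 0 right after the system() call; it reports whether SIGCHLD is still blocked in the parent and unblocks it if so (the helper name is just for illustration):

#include <signal.h>
#include <stdio.h>

/* Untested sketch: report whether SIGCHLD is still blocked in this
 * process after system() returns, and unblock it if it is. */
static void check_sigchld_mask(void)
{
    sigset_t cur, unblock;

    /* Passing a NULL set just retrieves the current signal mask. */
    if (sigprocmask(SIG_SETMASK, NULL, &cur) == 0 &&
        sigismember(&cur, SIGCHLD)) {
        printf("SIGCHLD still blocked after system()\n");
        sigemptyset(&unblock);
        sigaddset(&unblock, SIGCHLD);
        sigprocmask(SIG_UNBLOCK, &unblock, NULL);
    }
}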
Does the OpenMPI daemon depend on SIGCHLD in any way? That is about the only difference I can think of between running the command stand-alone (which works) and running it via a system() API call (which does not work). (A rough fork()/exec sketch is at the end of this mail, if you want to rule system() itself out.)

Best
Durga

On Thu, Jan 19, 2012 at 9:52 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
> Which network transport are you using, and what version of Open MPI are you
> using?  Do you have OpenFabrics support compiled into your Open MPI
> installation?
>
> If you're just using TCP and/or shared memory, I can't think of a reason
> immediately as to why this wouldn't work, but there may be a subtle
> interaction in there somewhere that causes badness (e.g., memory corruption).
>
>
> On Jan 19, 2012, at 1:57 AM, Randolph Pullen wrote:
>
>> I have a section in my code running in rank 0 that must start a perl program
>> that it then connects to via a TCP socket.
>> The initialisation section is shown here:
>>
>>     sprintf(buf, "%s/session_server.pl -p %d &", PATH, port);
>>     int i = system(buf);
>>     printf("system returned %d\n", i);
>>
>> Some time after I run this code, while waiting for the data from the perl
>> program, the error below occurs:
>>
>>     qplan connection
>>     DCsession_fetch: waiting for Mcode data...
>>     [dc1:05387] [[40050,1],0] ORTE_ERROR_LOG: A message is attempting to be
>>     sent to a process whose contact information is unknown in file
>>     rml_oob_send.c at line 105
>>     [dc1:05387] [[40050,1],0] could not get route to [[INVALID],INVALID]
>>     [dc1:05387] [[40050,1],0] ORTE_ERROR_LOG: A message is attempting to be
>>     sent to a process whose contact information is unknown in file
>>     base/plm_base_proxy.c at line 86
>>
>> It seems that the Linux system() call is breaking OpenMPI's internal
>> connections. Removing the system() call and executing the perl code
>> externally fixes the problem, but I can't go into production like that as
>> it's a security issue.
>>
>> Any ideas?
>>
>> (environment: OpenMPI 1.4.1 on kernel Linux dc1
>> 2.6.18-274.3.1.el5.028stab094.3 using TCP and mpirun)
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
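For what it's worth, one way to take system() (and its SIGCHLD/SIGINT/SIGQUIT handling) entirely out of the picture would be to launch the perl server with a plain fork()/exec instead of a shell command line. A rough, untested sketch, reusing the PATH and port values from the snippet quoted above and assuming a TCP/shared-memory build, where calling fork() in an MPI process is generally tolerated:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Untested sketch: start session_server.pl as a child process without
 * going through /bin/sh, leaving the parent's signal mask and handlers
 * untouched.  The parent can waitpid() on the returned pid later (or
 * ignore SIGCHLD) to avoid a zombie. */
pid_t launch_session_server(const char *path, int port)
{
    char script[1024];
    char portstr[16];

    snprintf(script, sizeof(script), "%s/session_server.pl", path);
    snprintf(portstr, sizeof(portstr), "%d", port);

    pid_t pid = fork();
    if (pid == 0) {                      /* child: becomes the perl server */
        execl(script, "session_server.pl", "-p", portstr, (char *)NULL);
        perror("execl");                 /* only reached if exec fails */
        _exit(127);
    }
    return pid;                          /* parent: child pid, or -1 on error */
}

If the ORTE errors disappear with this, that would point fairly strongly at system()'s signal handling (or the intermediate shell) rather than at the perl program itself.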