This is just a thought:

According to the system() man page, SIGCHLD is blocked while system()
waits for the command to complete. Since you are executing your
command as a daemon in the background, it will be permanently blocked.
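
If you want to verify that, a quick check right after system() returns
could look like the sketch below (plain POSIX calls, nothing Open MPI
specific, untested against your code):

    #include <signal.h>
    #include <stdio.h>

    /* Hypothetical helper: report whether SIGCHLD is still blocked in
     * the calling thread.  Passing a NULL set to sigprocmask() only
     * queries the current mask; it does not change it. */
    void report_sigchld_blocked(void)
    {
        sigset_t cur;
        sigemptyset(&cur);
        if (sigprocmask(SIG_BLOCK, NULL, &cur) == 0)
            printf("SIGCHLD blocked: %s\n",
                   sigismember(&cur, SIGCHLD) ? "yes" : "no");
    }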

Does the Open MPI daemon depend on SIGCHLD in any way? That is about
the only difference I can think of between running the command
stand-alone (which works) and running it via a system() call (which
does not work).
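
If SIGCHLD does turn out to be the culprit, one workaround is to skip
system() altogether and start the Perl helper with fork()/exec(), which
never touches the parent's signal mask. A minimal sketch, assuming the
same PATH and port variables as in your snippet:

    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Hypothetical replacement for system("<PATH>/session_server.pl -p <port> &"). */
    pid_t start_session_server(const char *path, int port)
    {
        char prog[1024];
        char portarg[16];

        snprintf(prog, sizeof(prog), "%s/session_server.pl", path);
        snprintf(portarg, sizeof(portarg), "%d", port);

        pid_t pid = fork();
        if (pid == 0) {
            /* child: become the Perl helper; no shell is involved and
             * the parent's signal handling is left alone */
            execl(prog, "session_server.pl", "-p", portarg, (char *)NULL);
            perror("execl");    /* only reached if exec fails */
            _exit(127);
        }
        return pid;             /* parent continues immediately; -1 on fork failure */
    }

The parent should eventually waitpid() on the returned pid (or
double-fork the helper) so the daemon does not linger as a zombie when
it exits.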

Best
Durga


On Thu, Jan 19, 2012 at 9:52 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
> Which network transport are you using, and what version of Open MPI are you 
> using?  Do you have OpenFabrics support compiled into your Open MPI 
> installation?
>
> If you're just using TCP and/or shared memory, I can't think of a reason 
> immediately as to why this wouldn't work, but there may be a subtle 
> interaction in there somewhere that causes badness (e.g., memory corruption).
>
>
> On Jan 19, 2012, at 1:57 AM, Randolph Pullen wrote:
>
>>
>> I have a section in my code, running in rank 0, that must start a Perl
>> program which it then connects to via a TCP socket.
>> The initialisation section is shown here:
>>
>>     /* build the shell command that launches the Perl helper in the background */
>>     sprintf(buf, "%s/session_server.pl -p %d &", PATH, port);
>>     int i = system(buf);
>>     printf("system returned %d\n", i);   /* raw wait status of the shell, not the daemon's exit code */
>>
>>
>> Some time after I run this code, while waiting for the data from the Perl
>> program, the error below occurs:
>>
>> qplan connection
>> DCsession_fetch: waiting for Mcode data...
>> [dc1:05387] [[40050,1],0] ORTE_ERROR_LOG: A message is attempting to be sent 
>> to a process whose contact information is unknown in file rml_oob_send.c at 
>> line 105
>> [dc1:05387] [[40050,1],0] could not get route to [[INVALID],INVALID]
>> [dc1:05387] [[40050,1],0] ORTE_ERROR_LOG: A message is attempting to be sent 
>> to a process whose contact information is unknown in file 
>> base/plm_base_proxy.c at line 86
>>
>>
>> It seems that the Linux system() call is breaking Open MPI's internal
>> connections.  Removing the system() call and executing the Perl code
>> externally fixes the problem, but I can't go into production like that as
>> it's a security issue.
>>
>> Any ideas?
>>
>> (environment: OpenMPI 1.4.1 on kernel Linux dc1 
>> 2.6.18-274.3.1.el5.028stab094.3  using TCP and mpirun)
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
