Ah - not good. It is clearly a programming error. I'll have to review the other 
launchers and consult the others in the project to decide on the proper course 
of action.

Thanks

On Nov 17, 2009, at 1:49 PM, David Singleton wrote:

> 
> Hi Ralph,
> 
> Now I'm in a quandary - if I show you that it's actually Open MPI that is
> propagating the environment, then you are likely to "fix it" and then tm
> users will lose a nice feature.  :-)
> 
> Can I suggest that "least surprise" would require that MPI tasks get
> exactly the same environment/limits/... as mpirun so that "mpirun a.out"
> behaves just like "a.out".  [Following this principle we modified
> tm_spawn to propagate the caller's rlimits to the spawned tasks.]
> A comment in orterun.c (see below) suggests that Open MPI is trying
> to distinguish between "local" and "remote" processes.  I would have
> thought that distinction should be invisible to users as much as possible
> - a user asking for 4 CPUs would like to see the same behaviour whether all
> 4 are local or "2 local, 2 remote".
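The rlimit propagation mentioned above can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not the actual tm_spawn change: it assumes the caller's limits are captured on the mpirun side and re-applied in the spawned task before it execs the user program. `struct saved_limit`, `capture_limits()` and `apply_limits()` are hypothetical names; only `getrlimit()` and `setrlimit()` are real POSIX calls, and how the captured values travel to the remote node is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical type: one resource limit captured from the caller. */
struct saved_limit {
    int resource;          /* e.g. RLIMIT_STACK or RLIMIT_NOFILE */
    struct rlimit value;   /* soft (rlim_cur) and hard (rlim_max) limits */
};

/* Hypothetical helper: record the caller's current limits for each listed
 * resource. Returns 0 on success, -1 if any getrlimit() call fails. */
static int capture_limits(const int *resources, size_t n, struct saved_limit *out)
{
    assert(resources != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i].resource = resources[i];
        if (getrlimit(resources[i], &out[i].value) != 0)
            return -1;
    }
    return 0;
}

/* Hypothetical helper: re-apply captured limits in the spawned task before
 * it execs the user program. Returns 0 on success, -1 on the first failure. */
static int apply_limits(const struct saved_limit *in, size_t n)
{
    assert(in != NULL);
    for (size_t i = 0; i < n; i++) {
        if (setrlimit(in[i].resource, &in[i].value) != 0)
            return -1;
    }
    return 0;
}
```

A forked child already inherits its parent's limits, so capturing them explicitly only matters when the task is started by some other process (presumably the remote PBS daemon in the tm case) whose limits may differ. Raising a hard limit above its current value requires privilege (CAP_SYS_RESOURCE on Linux), so `apply_limits()` can fail.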
> 
> As to why tm does "The Right Thing": in the case of rsh/ssh, the full
> mpirun environment is given to the rsh/ssh process locally, while in the tm
> case it is an argument to tm_spawn and so is given to the process (in
> this case orted) being launched remotely. Relevant lines from 1.3.3 are below.
> PBS just passes along the environment it is told to.  We don't use Torque,
> but as of 2.3.3 it was still the same as OpenPBS in this respect.
> 
> Michael just pointed out the slight flaw: the environment should be
> propagated somewhat selectively (exclude HOSTNAME etc.).  I guess if you
> were to "fix" plm_tm_module, I would put the propagation behaviour in
> tm_spawn and try to handle these exceptional cases there.
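Selective propagation of the kind Michael suggests could look roughly like this: copy the launch environment but drop a list of host-specific variable names. A minimal sketch, assuming exact name matching; `copy_env_excluding()`, `env_name_excluded()` and any exclude list passed to them are hypothetical and are not part of Open MPI or the TM API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if the NAME part of a "NAME=value" entry appears in the
 * NULL-terminated exclude list. Names must match exactly, so "HOSTNAME"
 * does not exclude "HOSTNAME_ALIAS". */
static int env_name_excluded(const char *entry, const char *const *exclude)
{
    const char *eq = strchr(entry, '=');
    size_t name_len = eq ? (size_t)(eq - entry) : strlen(entry);

    for (size_t i = 0; exclude[i] != NULL; i++) {
        if (strlen(exclude[i]) == name_len &&
            strncmp(entry, exclude[i], name_len) == 0)
            return 1;
    }
    return 0;
}

/* Return a new NULL-terminated array holding every envp entry whose name
 * is not excluded. The strings are shared with envp; only the array is
 * allocated, so the caller releases it with a single free(). */
static char **copy_env_excluding(char **envp, const char *const *exclude)
{
    assert(envp != NULL && exclude != NULL);

    size_t n = 0;
    while (envp[n] != NULL)
        n++;

    char **out = malloc((n + 1) * sizeof *out);
    if (out == NULL)
        return NULL;

    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (!env_name_excluded(envp[i], exclude))
            out[k++] = envp[i];
    }
    out[k] = NULL;
    return out;
}
```

In a launcher this might be called as `copy_env_excluding(environ, exclude)` (with `extern char **environ;` declared) and an exclude list such as `{ "HOSTNAME", "DISPLAY", NULL }`; the result could then be handed to `execve()` or `tm_spawn()` as the environment argument.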
> 
> Cheers,
> David
> 
> 
> orterun.c:
> 
>    510     /* save the environment for launch purposes. This MUST be
>    511      * done so that we can pass it to any local procs we
>    512      * spawn - otherwise, those local procs won't see any
>    513      * non-MCA envars were set in the enviro prior to calling
>    514      * orterun
>    515      */
>    516     orte_launch_environ = opal_argv_copy(environ);
> 
> 
> plm_rsh_module.c:
> 
>    681 /* actually ssh the child */
>    682 static void ssh_child(int argc, char **argv,
>    683                       orte_vpid_t vpid, int proc_vpid_index)
>    684 {
> 
>    694     /* setup environment */
>    695     env = opal_argv_copy(orte_launch_environ);
> 
>    766     execve(exec_path, exec_argv, env);
> 
> 
> plm_tm_module.c:
> 
>    128 static int plm_tm_launch_job(orte_job_t *jdata)
>    129 {
> 
>    228     /* setup environment */
>    229     env = opal_argv_copy(orte_launch_environ);
> 
>    311     rc = tm_spawn(argc, argv, env, node->launch_id, tm_task_ids + launched, tm_events + launched);
> 
> 
> 
> Ralph Castain wrote:
>> Not exactly. It completely depends on how Torque was set up - OMPI isn't 
>> forwarding the environment. Torque is.
>> We made a design decision at the very beginning of the OMPI project not to 
>> forward non-OMPI envars unless directed to do so by the user. I'm afraid I 
>> disagree with Michael's claim that other MPIs do forward them - yes, MPICH 
>> does, but not all others do.
>> The world is bigger than MPICH and OMPI :-)
>> Since there is inconsistency in this regard between MPIs, we chose not to 
>> forward. The reason was simple: there is no way to know what is safe to forward 
>> vs what is not (e.g., what to do with DISPLAY), nor what the underlying 
>> environment is trying to forward vs what it isn't. It is very easy to get 
>> crosswise and cause totally unexpected behavior, as users have complained 
>> about for years.
>> First, if you are using a managed environment like Torque, we recommend that 
>> you work with your sys admin to decide how to configure it. This is the best 
>> way to resolve a problem.
>> Second, if you are not using a managed environment and/or decide not to have 
>> that environment do the forwarding, you can tell OMPI to forward the envars 
>> you need by specifying them via the -x cmd line option. We already have a 
>> request to expand this capability, and I will be doing so as time permits. 
>> One option I'll be adding is the reverse of -x - i.e., "forward all envars 
>> -except- the specified one(s)".
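For comparison, the `-x` behaviour described above (forward only the variables the user names) amounts to building the child environment from an include list. A minimal sketch of that idea, not Open MPI's implementation: `forward_named_env()` and `free_env()` are hypothetical helpers, and values are always taken from the caller's own environment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Free an array built by forward_named_env(): each string, then the array. */
static void free_env(char **env)
{
    if (env == NULL)
        return;
    for (size_t i = 0; env[i] != NULL; i++)
        free(env[i]);
    free(env);
}

/* Build a NULL-terminated "NAME=value" array holding only the named
 * variables, taking each value from this process's environment.
 * Names that are not set are skipped rather than forwarded as empty. */
static char **forward_named_env(const char *const *names)
{
    assert(names != NULL);

    size_t n = 0;
    while (names[n] != NULL)
        n++;

    /* calloc zeroes the array, so it stays NULL-terminated as it fills. */
    char **out = calloc(n + 1, sizeof *out);
    if (out == NULL)
        return NULL;

    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        const char *value = getenv(names[i]);
        if (value == NULL)
            continue;
        size_t len = strlen(names[i]) + 1 + strlen(value) + 1;
        char *entry = malloc(len);
        if (entry == NULL) {
            free_env(out);
            return NULL;
        }
        snprintf(entry, len, "%s=%s", names[i], value);
        out[k++] = entry;
    }
    return out;
}
```

The planned "all except" option would invert the membership test, keeping every variable that is not on the list.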
>> HTH
>> ralph

