On Jan 3, 2013, at 6:52 AM, Ake Sandgren <ake.sandg...@hpc2n.umu.se> wrote:

> On Thu, 2013-01-03 at 06:18 -0800, Ralph Castain wrote:
>> On Jan 3, 2013, at 3:01 AM, Ake Sandgren <ake.sandg...@hpc2n.umu.se> wrote:
>> 
>>> On Thu, 2013-01-03 at 11:54 +0100, Ake Sandgren wrote:
>>>> On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
>>>>> Hi!
>>>>> 
>>>>> The grpcomm component hier seems to have vanished between 1.6.1 and
>>>>> 1.6.3.
>>>>> Why?
>>>>> It seems that the version of slurm we are using (not the latest at the
>>>>> moment) is using it for startup.
>> 
>> It should be using PMI if you are directly launching processes via srun, and 
>> should not be using hier any more.
> 
> Shouldn't the grpcomm pmi component be turned on by default then, if it
> is needed?

It should be.

> 
>>>>> 
>>>> 
>>>> Hmm it seems it is the ess_slurmd_module.c that is using grpcomm=hier.
>> 
>> Yes - that is the *only* scenario (a direct launch of procs via srun) that 
>> should use hier
> 
> What I have in my submit file is:
> #SBATCH -n x
> 
> srun some-mpi-binary
> 
> This fails since hier is missing.
> 
> The reason one wants to use srun and not mpirun is to get Slurm's cgroup
> containment.
> 
>>> 
>>> orte/mca/plm/base/plm_base_rsh_support.c also tries to use the hier
>>> grpcomm
>> 
>> Something is very wrong if that is true. How was this configured, and how 
>> are you starting this job?
> 
> Not sure if it actually tries to use hier at runtime; I just noticed
> that the code contains a setenv of OMPI_MCA_grpcomm=hier.
> 
> So what is the real problem here?

Do you have PMI installed and running on your system? I think that is the 
source of the trouble - if PMI isn't running, then this will fail.
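
One rough way to check the Open MPI side of this is to look for a grpcomm pmi 
component in the ompi_info output. The sketch below assumes ompi_info prints 
component lines of the form "MCA grpcomm: pmi (...)"; has_grpcomm_pmi is a 
hypothetical helper name, not part of Open MPI. If the component is missing, 
the build was probably configured without PMI support (the configure line 
quoted in this thread does not request it).

```shell
#!/bin/sh
# Sketch only: report whether a grpcomm "pmi" component shows up in ompi_info.
# has_grpcomm_pmi is a hypothetical helper; it reads ompi_info-style text on
# stdin and succeeds if a "MCA grpcomm: pmi" line is present.
has_grpcomm_pmi() {
    grep -Eq 'MCA grpcomm:[[:space:]]*pmi([[:space:]]|$)'
}

if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_grpcomm_pmi; then
        echo "grpcomm pmi component found"
    else
        echo "grpcomm pmi component not found"
    fi
else
    echo "ompi_info not found on PATH"
fi
```

This only shows whether the component was built; it says nothing about 
whether PMI is actually available from Slurm on the compute nodes.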


> 
> configure line is:
> ./configure --enable-orterun-prefix-by-default --enable-cxx-exceptions
> 