On Jan 3, 2013, at 6:52 AM, Ake Sandgren <ake.sandg...@hpc2n.umu.se> wrote:
> On Thu, 2013-01-03 at 06:18 -0800, Ralph Castain wrote:
>> On Jan 3, 2013, at 3:01 AM, Ake Sandgren <ake.sandg...@hpc2n.umu.se> wrote:
>>
>>> On Thu, 2013-01-03 at 11:54 +0100, Ake Sandgren wrote:
>>>> On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
>>>>> Hi!
>>>>>
>>>>> The grpcomm component hier seems to have vanished between 1.6.1 and
>>>>> 1.6.3.
>>>>> Why?
>>>>> It seems that the version of slurm we are using (not the latest at the
>>>>> moment) is using it for startup.
>>
>> It should be using PMI if you are directly launching processes via srun, and
>> should not be using hier any more.
>
> Shouldn't the grpcomm pmi component be turned on by default then, if it
> is needed?

It should be.

>
>>>>>
>>>>
>>>> Hmm, it seems it is the ess_slurmd_module.c that is using grpcomm=hier.
>>
>> Yes - that is the *only* scenario (a direct launch of procs via srun) that
>> should use hier.
>
> What I have in my submit file is:
> #SBATCH -n x
>
> srun some-mpi-binary
>
> This fails since hier is missing.
>
> The reason one wants to use srun and not mpirun is to get slurm's cgroup
> containment.
>
>>>
>>> orte/mca/plm/base/plm_base_rsh_support.c also tries to use the hier
>>> grpcomm.
>>
>> Something is very wrong if that is true. How was this configured, and how
>> are you starting this job?
>
> Not sure if it actually tries to use hier at runtime, I just noticed
> that it had a setenv OMPI_MCA_grpcomm=hier in the code.
>
> So what is the real problem here?

Do you have PMI installed and running on your system? I think that is the source of the trouble - if PMI isn't running, then this will fail.

>
> configure line is:
> ./configure --enable-orterun-prefix-by-default --enable-cxx-exceptions
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
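A minimal sketch of the PMI-based direct-launch setup discussed in this thread. It assumes that Open MPI 1.6.x only builds its PMI components when configured with `--with-pmi`, and that SLURM's PMI header and library live under `/usr` (a hypothetical prefix; adjust to the local SLURM installation). The configure flags other than `--with-pmi` are the ones from the report; `x` and `some-mpi-binary` are the placeholders from the original submit file.

```shell
# Rebuild Open MPI with SLURM PMI support so that processes started
# directly by srun can use the PMI components instead of grpcomm hier.
# ASSUMPTION: /usr is the PMI install prefix (i.e. /usr/include/pmi.h exists).
./configure --enable-orterun-prefix-by-default --enable-cxx-exceptions \
    --with-pmi=/usr
make all install

# Check whether any PMI components were actually built; an empty result
# means configure did not find PMI and direct launch will still fail.
ompi_info | grep -i pmi

# Submit file, unchanged from the report (srun keeps SLURM's cgroup containment):
#   #!/bin/sh
#   #SBATCH -n x
#   srun some-mpi-binary
```

If `ompi_info` lists pmi entries for the ess and grpcomm frameworks, a plain `srun` in the batch script is expected to use them rather than hier; if it lists nothing, the PMI installation is the first thing to check, as suggested in the reply above.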