Answered on the slurm-devel list: it is a bug in SLURM 14.03.
The fix is already in HEAD and will also be in 14.03.1.
https://groups.google.com/forum/#!topic/slurm-devel/1ctPkEn7TFI
- Anthony
Not sure if this is a SLURM or OMPI issue so please bear with the
cross-posting...
The OpenMPI FAQ mentions an issue with slurm 2.6.3/pmi2.
https://www.open-mpi.org/faq/?category=slurm#slurm-2.6.3-issue
I have built both OpenMPI 1.7.5 and 1.8 against slurm 14.03/pmi2.
When I launch openmpi/examples/hello_c
he cmd line.
>
>
> On Apr 10, 2014, at 9:50 PM, Anthony Alba wrote:
>
> >
> > Is there a way to troubleshoot a
> > plm_rsh_no_tree_spawn=true hang?
> >
> > I have a set of passwordless-ssh nodes; each node can ssh into any
> > other, i.e.,
>
Is there a way to troubleshoot a
plm_rsh_no_tree_spawn=true hang?
I have a set of passwordless-ssh nodes; each node can ssh into any other,
i.e.,
for h1 in A B C D; do
  for h2 in A B C D; do
    ssh $h1 ssh $h2 hostname
  done
done
works perfectly.
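A variant of the loop above that cannot itself hang on a password prompt may help narrow down a problem pair. This is only a sketch: the helper name `check_pairs` and the overridable `SSH` variable are hypothetical, and it assumes OpenSSH on every node. `BatchMode=yes` makes ssh fail instead of prompting, and `ConnectTimeout` bounds connection setup only, so a hang after login would still need an external timeout.

```shell
# Hypothetical helper: report which (source, destination) pairs fail.
# SSH may be overridden for a dry run; it defaults to the real ssh client.
check_pairs() {
    ssh_cmd=${SSH:-ssh}
    opts="-o BatchMode=yes -o ConnectTimeout=5"
    failed=0
    for h1 in "$@"; do
        for h2 in "$@"; do
            # Log in to h1 and, from there, run hostname on h2.
            if $ssh_cmd $opts "$h1" ssh $opts "$h2" hostname \
                    >/dev/null 2>&1 </dev/null; then
                echo "ok   $h1 -> $h2"
            else
                echo "FAIL $h1 -> $h2"
                failed=1
            fi
        done
    done
    return $failed
}

# Example invocation with the host names used above:
# check_pairs A B C D
```

The function returns non-zero if any pair failed, so it can be used in a script before launching mpirun.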
Generally tree spawn works, however there is one hos
The devel list has responded that this requires a later drop of hcoll than
in MOFED 2.1-1.0.6.
- Anthony
On Apr 9, 2014 9:49 AM, "Anthony Alba" wrote:
> This is a change from OMPI 1.7.4 to 1.7.5, 1.8: the symbol is not used in
> MOFED 2.1-1.0.6 openmpi-1.7.4 (I rebuilt the MOF
This is a change from OMPI 1.7.4 to 1.7.5, 1.8: the symbol is not used in
MOFED 2.1-1.0.6 openmpi-1.7.4 (I rebuilt the MOFED RPM to enable hcoll).
- Anthony
>
>
>
> Best,
>
>
>
> Josh
>
>
>
> *From:* users [mailto:users-boun...@open-mpi.org] *On Behalf Of *Anthony
> Alba
> *Sent:* Tuesday, April 08, 2014 4:53 AM
> *To:* us...@open-mpi.org
> *Subject:* [OMPI users] mca_coll_hcoll.so: undefined symbol
> hcoll
Hi all,
Ran into a problem running the openshmem examples/ using OpenMPI 1.8
compiled with
--with-knem=/opt/knem-1.1.90mlnx2 --with-hcoll=/opt/mellanox/hcoll
--with-mxm=/opt/mellanox/mxm
--with-fca=/opt/mellanox/fca
lib/openmpi/mca_coll_hcoll.so has undefined symbol
hcoll_group_destroy_notify
I
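One way to check whether a given hcoll build actually provides the missing symbol is to list the dynamic symbols the library defines. A minimal sketch, assuming GNU binutils `nm` is installed; the helper name `exports_symbol`, the overridable `NM` variable, and the library path in the usage comment are assumptions for illustration, not taken from the post.

```shell
# Hypothetical helper: succeed if shared library $1 defines symbol $2.
# NM may be overridden; it defaults to binutils nm.
exports_symbol() {
    ${NM:-nm} -D --defined-only "$1" 2>/dev/null |
        awk -v s="$2" '
            {
                sym = $NF
                sub(/@.*/, "", sym)   # drop any symbol-version suffix
                if (sym == s) found = 1
            }
            END { exit !found }
        '
}

# Example (the library path is an assumption; adjust to the local install):
# exports_symbol /opt/mellanox/hcoll/lib/libhcoll.so hcoll_group_destroy_notify \
#     && echo "symbol present" || echo "symbol missing"
```

If the symbol is missing from the installed library, the undefined-symbol error from mca_coll_hcoll.so is expected with that hcoll version.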