Re: [OMPI users] OpenMPI PMI2 with SLURM 14.03 not working [SOLVED]

2014-04-11 Thread Anthony Alba
Answered on the slurm-devel list: it is a bug in SLURM 14.03. The fix is already in HEAD and will also be in 14.03.1. https://groups.google.com/forum/#!topic/slurm-devel/1ctPkEn7TFI - Anthony

[OMPI users] OpenMPI PMI2 with SLURM 14.03 not working

2014-04-11 Thread Anthony Alba
Not sure if this is a SLURM or OMPI issue so please bear with the cross-posting... The OpenMPI FAQ mentions an issue with slurm 2.6.3/pmi2. https://www.open-mpi.org/faq/?category=slurm#slurm-2.6.3-issue I have built both 1.7.5/1.8 against slurm 14.03/pmi2. When I launch openmpi/examples/hello_c

Re: [OMPI users] Troubleshooting mpirun with tree spawn hang

2014-04-11 Thread Anthony Alba
he cmd line. On Apr 10, 2014, at 9:50 PM, Anthony Alba wrote: > Is there a way to troubleshoot plm_rsh_no_tree_spawn=true hang? > I have a set of passwordless-ssh nodes, each node can ssh into any other, i.e.,

[OMPI users] Troubleshooting mpirun with tree spawn hang

2014-04-11 Thread Anthony Alba
Is there a way to troubleshoot a plm_rsh_no_tree_spawn=true hang? I have a set of passwordless-ssh nodes; each node can ssh into any other, i.e., for h1 in A B C D; do for h2 in A B C D; do ssh $h1 ssh $h2 hostname; done; done works perfectly. Generally tree spawn works, however there is one hos
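
For anyone repeating the all-pairs ssh check from this message, here is a minimal sketch that reports each hop separately so a single bad pair stands out. The function name check_ssh_pairs and the SSH_CMD override variable are hypothetical names introduced here for illustration; they are not part of Open MPI or of the original post. The BatchMode and ConnectTimeout options are an added assumption so that a hop which would prompt or stall fails instead of hanging.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): try every ordered pair of hosts,
# hopping through the first host to reach the second, as in the loop above.
# SSH_CMD is an assumed override hook, useful for dry runs with a stub command.
check_ssh_pairs() {
    ssh_cmd=${SSH_CMD:-ssh -o BatchMode=yes -o ConnectTimeout=5}
    failures=0
    for h1 in "$@"; do
        for h2 in "$@"; do
            # stdin is redirected so ssh cannot swallow the caller's input
            if $ssh_cmd "$h1" $ssh_cmd "$h2" hostname </dev/null >/dev/null 2>&1; then
                echo "ok   $h1 -> $h2"
            else
                echo "FAIL $h1 -> $h2"
                failures=$((failures + 1))
            fi
        done
    done
    # Non-zero exit status if any pair failed
    [ "$failures" -eq 0 ]
}
```

Calling check_ssh_pairs A B C D prints one line per ordered pair. Note that this only exercises plain ssh reachability; it does not reproduce whatever mpirun does during a tree spawn, so a clean run here does not rule out a launcher-side problem.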

[OMPI users] [SOLVED] Re: mca_coll_hcoll.so: undefined symbol hcoll_group_destroy_notify

2014-04-08 Thread Anthony Alba
The devel list has responded that this requires a later drop of hcoll than in MOFED 2.1-1.0.6. - Anthony On Apr 9, 2014 9:49 AM, "Anthony Alba" wrote: > This is a change from OMPI 1.7.4 to 1.7.5, 1.8: the symbol is not used in > MOFED 2.1-1.0.6 openmpi-1.7.4 (I rebuilt the MOF

Re: [OMPI users] mca_coll_hcoll.so: undefined symbol hcoll_group_destroy_notify

2014-04-08 Thread Anthony Alba
This is a change from OMPI 1.7.4 to 1.7.5, 1.8: the symbol is not used in MOFED 2.1-1.0.6 openmpi-1.7.4 (I rebuilt the MOFED RPM to enable hcoll). - Anthony

Re: [OMPI users] mca_coll_hcoll.so: undefined symbol hcoll_group_destroy_notify

2014-04-08 Thread Anthony Alba
Best, Josh *From:* users [mailto:users-boun...@open-mpi.org] *On Behalf Of* Anthony Alba *Sent:* Tuesday, April 08, 2014 4:53 AM *To:* us...@open-mpi.org *Subject:* [OMPI users] mca_coll_hcoll.so: undefined symbol hcoll

[OMPI users] mca_coll_hcoll.so: undefined symbol hcoll_group_destroy_notify

2014-04-08 Thread Anthony Alba
Hi all, Ran into a problem running the openshmem examples/ using OpenMPI 1.8 compiled with --with-knem=/opt/knem-1.1.90mlnx2 --with-hcoll=/opt/mellanox/hcoll --with-mxm=/opt/mellanox/mxm --with-fca=/opt/mellanox/fca lib/openmpi/mca_coll_hcoll.so has undefined symbol hcoll_group_destroy_notify I