Sorry all,
Chris S over on the slurm list spotted it right away. I didn't have the
MpiDefault set to pmix_v2.
I can confirm that Ubuntu 18.04, gcc-7.3, openmpi-3.1.0, pmix-2.1.1, and
slurm-17.11.5 seem to work well together.
Sorry for the bother.
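For anyone hitting the same thing, the fix described above amounts to one line in slurm.conf (fragment only; the rest of the file is elided here):

```
# slurm.conf (fragment) -- minimal sketch, other settings omitted
MpiDefault=pmix_v2
```

Alternatively it can be selected per job with srun's --mpi option (e.g. --mpi=pmix_v2) rather than as the cluster-wide default.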
__
I have openmpi-3.0.1, pmix-1.2.4, and slurm-17.11.5 working well on a few
clusters. For things like:
bill@headnode:~/src/relay$ srun -N 2 -n 2 -t 1 ./relay 1
c7-18 c7-19
size= 1, 16384 hops, 2 nodes in 0.03 sec ( 2.00 us/hop) 1953 KB/sec
I've been having a tougher time trying to get
On 10/22/2014 12:37 AM, r...@q-leap.de wrote:
>>>>>> "Bill" == Bill Broadley writes:
>
> It seems the half-life of knowledge on the list has decayed to
> two weeks :)

>
> I've commented in detail on this (non-)issue on 2014-08-
On 10/21/2014 05:38 PM, Gus Correa wrote:
> Hi Bill
>
> I have 2.6.X CentOS stock kernel.
Heh, wow, quite a blast from the past.
> I set both parameters.
> It works.
Yes, for kernels that old I had it working fine.
> Maybe the parameter names have changed in 3.X kernels?
> (Which is really bad
On 10/21/2014 04:18 PM, Gus Correa wrote:
> Hi Bill
>
> Maybe you're missing these settings in /etc/modprobe.d/mlx4_core.conf ?
>
> http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem
Ah, that helped. Although:
/lib/modules/3.13.0-36-generic/kernel/drivers/net/ethernet/mellanox/mlx
I've setup several clusters over the years with OpenMPI. I often get the below
error:
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.
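On mlx4 hardware the usual cure for this warning is raising the MTT limits via module options, along the lines of (values here assume a 64 GiB node; size them for your own RAM per the Open MPI FAQ):

```
# /etc/modprobe.d/mlx4_core.conf -- sketch, values are an assumption
options mlx4_core log_num_mtt=22 log_mtts_per_seg=3
```

The module has to be reloaded (or the node rebooted) before the new limits take effect.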
A rather stable production code that has worked with various versions of MPI
on various architectures started hanging with gcc-4.4.2 and openmpi-1.3.3,
which led me to this thread.
I made some very small changes to Eugene's code, here's the diff:
$ diff testorig.c billtest.c
3,5c3,4
<
< #define
Jeff Squyres wrote:
Sorry for the delay in replying.
What exactly is the relay program timing? Can you run a standard
benchmark like NetPIPE, perchance? (http://www.scl.ameslab.gov/netpipe/)
It gives very similar numbers to osu_latency. Turns out the mca btl seems to
be completely ignored.
I built openmpi-1.2.6 on centos-5.2 with gcc-4.3.1.
I did a tar xvzf, cd openmpi-1.2.6, mkdir obj, cd obj:
(I put gcc-4.3.1/bin first in my path)
../configure --prefix=/opt/pkg/openmpi-1.2.6 --enable-shared --enable-debug
If I look in config.log I see:
MCA_btl_ALL_COMPONENTS=' self sm gm mvapi mx