Well, this is kind of interesting. I can strip the configure line
back and get mpirun to work on one node, but then neither srun nor
mpirun within a SLURM job will run. I can add back configure options
to get to
./configure \
--prefix=${PREFIX} \
--mandir=${PREFIX}/share/man \
--with
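For illustration, a fuller configure invocation of the kind being discussed
might look like the following; the pmix path and the extra flags below are my
assumptions, not necessarily what was actually used:

  ./configure \
    --prefix=${PREFIX} \
    --mandir=${PREFIX}/share/man \
    --with-slurm \
    --with-pmix=/opt/pmix/2.0.2 \
    --with-libevent=external \
    --with-hwloc=external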
Ryan,
With srun it's fine. Only with mpirun is there a problem, and that is
both on a single node and on multiple nodes. SLURM was built against
pmix 2.0.2, and I am pretty sure that SLURM's default is pmix. We are
running a recent patch of SLURM, I think. SLURM and OMPI are both
being built u
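Two quick ways to confirm which PMI plugin Slurm is defaulting to, using only
stock Slurm commands:

  srun --mpi=list
  scontrol show config | grep -i MpiDefault

The first lists the plugins srun knows about; the second shows the configured
default.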
I’m not entirely sure I understand what you are trying to do. The
PMIX_SERVER_URI2 envar tells local clients how to connect to their local PMIx
server (i.e., the OMPI daemon on that node). This is always done over the
loopback device since it is a purely local connection that is never used for
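If it is useful to see what actually lands in a client's environment, a quick
check (assuming a PMIx-enabled build) is:

  mpirun -np 1 env | grep PMIX_SERVER

which should show the URI(s) the local procs are given for contacting their
local daemon.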
Hello,
I want to force OpenMPI to use TCP and, in particular, to use a specific subnet.
Unfortunately, I haven't managed to do that.
Here is what I try:
$BIN/mpirun --mca pml ob1 --mca btl tcp,self --mca ptl_tcp_remote_connections 1
--mca btl_tcp_if_include '10.233.0.0/19' -np 4 --oversubscribe -H
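For reference, a complete command of roughly the intended form; the host names
here are placeholders, not the actual nodes:

  $BIN/mpirun --mca pml ob1 --mca btl tcp,self \
      --mca btl_tcp_if_include 10.233.0.0/19 \
      -np 4 --oversubscribe -H nodeA,nodeB ./a.out

Note that btl_tcp_if_include only steers MPI traffic; the daemon wire-up goes
over the OOB, which is restricted separately with oob_tcp_if_include.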
What MPI is SLURM set to use, and how was that compiled? Out of the box, the
SLURM MPI is set to “none”, or was last I checked, and so srun isn’t
necessarily doing any MPI wire-up. Now, I did try this with OpenMPI 2.1.1 and
it looked right either way (OpenMPI built with “--with-pmi”), but for MVAPICH2
this definitely
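The Open MPI side of that is easy to check with ompi_info, e.g.:

  ompi_info | grep -i -e pmi -e slurm

which lists the pmix/slurm-related components the installation was actually
built with.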
This is on an ARM processor? I suspect that is the root of the problems as we
aren’t seeing anything like this elsewhere.
> On Jun 18, 2018, at 1:27 PM, Bennet Fauber wrote:
>
> If it's of any use, 3.0.0 seems to hang at
>
> Making check in class
> make[2]: Entering directory `/tmp/build/open
If it's of any use, 3.0.0 seems to hang at
Making check in class
make[2]: Entering directory `/tmp/build/openmpi-3.0.0/test/class'
make ompi_rb_tree opal_bitmap opal_hash_table opal_proc_table
opal_tree opal_list opal_value_array opal_pointer_array opal_lifo
opal_fifo
make[3]: Entering directory
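If it helps to narrow things down, the class tests can be run one at a time
from the build tree once they have been built; timeout just turns a hang into
a visible failure (path taken from the log above):

  cd /tmp/build/openmpi-3.0.0/test/class
  timeout 120 ./opal_fifo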
No such luck. If it matters, mpirun does seem to work with processes
on the local node that have no internal MPI code. That is,
[bennet@cavium-hpc ~]$ mpirun -np 4 hello
Hello, ARM
Hello, ARM
Hello, ARM
Hello, ARM
but it fails with a similar error if run while a SLURM job is active; i.e.,
[ben
I doubt Slurm is the issue. For grins, let's try adding “--mca plm rsh” to your
mpirun cmd line and see if that works.
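i.e., something along the lines of (hello is just the test program from
earlier in the thread):

  mpirun --mca plm rsh -np 4 hello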
> On Jun 18, 2018, at 12:57 PM, Bennet Fauber wrote:
>
> To eliminate possibilities, I removed all other versions of OpenMPI
> from the system, and rebuilt using the same build
To eliminate possibilities, I removed all other versions of OpenMPI
from the system, and rebuilt using the same build script as was used
to generate the prior report.
[bennet@cavium-hpc bennet]$ ./ompi-3.1.0bd.sh
Checking compilers and things
OMPI is ompi
COMP_NAME is gcc_7_1_0
SRC_ROOT is /sw/arc
Hmmm...well, the error has changed from your initial report. Turning off the
firewall was the solution to that problem.
This problem is different - it isn’t the orted that failed in the log you sent,
but the application proc that couldn’t initialize. It looks like that app was
compiled against
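A quick way to see which MPI/PMIx libraries a dynamically linked app was
actually built against is:

  ldd ./hello | grep -i -e mpi -e pmix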