This should handle modifying his LD_LIBRARY_PATH correctly
but doing an Intel setvars.sh
How are you doing 'intel setvars.sh'? I believe you need to source that
rather than execute it. Also, there might be other files you need to
source. I have access to 2019.u3, and in the install_root, I
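For example, sourcing usually looks something like this (the exact path and script name vary by release, so treat these as guesses rather than the layout of your install):

# oneAPI-style layout
source /opt/intel/oneapi/setvars.sh
# older Parallel Studio layouts ship compilervars.sh (or psxevars.sh) instead
source /opt/intel/compilers_and_libraries_2019/linux/bin/compilervars.sh intel64

Afterwards, 'which icc' and 'echo $LD_LIBRARY_PATH' should reflect the Intel install.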
That error message is right in your original post, and I didn't even see
it:
configure:6541: icc -O2 conftest.c >&5
ld: cannot find -lstdc++
Well, that's an easy fix.
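If this is a RHEL/CentOS-style system, the usual culprit is that only the libstdc++ runtime is installed and not the development package that provides the libstdc++.so the linker is looking for. Something like this should take care of it (package names assume RHEL/CentOS, so adjust for your distribution):

yum install libstdc++-devel gcc-c++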
I guess my eyes stopped when I got to this:
configure: error: C compiler cannot create executables See `config.log' for
No, icc is there:
configure:6488: icc -qversion >&5
icc: command line warning #10006: ignoring unknown option '-qversion'
icc: command line error: no files specified; for help type "icc -help"
Those error messages are coming directly from icc.
Prentice
On 3/30/21 12:52 PM, Heinz, Michael Wil
Is this your own Linux system, or a work/school system? Some security
guidelines, like the CIS Security benchmarks, recommend making /tmp
its own filesystem and mounting it with the 'noexec' option. That can cause
this error. The configure script works by seeing if it can compile
and/or run small test programs
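A quick way to check is something like:

mount | grep ' /tmp '     # look for 'noexec' in the mount options

If /tmp is mounted noexec, pointing the compiler's temporary directory elsewhere (e.g. TMPDIR=$HOME/tmp) before re-running configure is a common workaround, though whether that is actually the problem here is only a guess.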
OpenMPI should only be using shared memory on the local host
automatically, but maybe you need to force it.
I think
mpirun -mca btl self,vader ...
should do that.
or you can exclude tcp instead
mpirun -mca btl ^tcp
See
https://www.open-mpi.org/faq/?category=sm
for more info.
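You can also double-check which BTL components your OpenMPI build actually contains with:

ompi_info | grep btl

vader (shared memory) should show up in that list.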
Prentice
On
If you disable it with '-mca btl ^openib' the warning goes away.
And the performance of openib goes away right along with it.
Prentice
On 3/13/21 5:43 PM, Heinz, Michael William via users wrote:
I’ve begun getting this annoyingly generic warning, too. It appears to be
coming from the openib provi
already deleted).
On Nov 12, 2020, at 8:16 AM, Prentice Bisbal via users
wrote:
I should give more background. In the slurm error log for this job, there was
another error about a memcpy operation failing listed first, so that caused the
job to fail. I suspect these errors below are the result o
On Nov 11, 2020, at 10:03 AM, Prentice Bisbal via users
wrote:
One of my users recently reported a failed job that was using OpenMPI 4.0.4
compiled with PGI 20.4. There were two different errors reported. One was reported
once, and I think had nothing to do with OpenMPI or PMIX, and then this error
One of my users recently reported a failed job that was using OpenMPI
4.0.4 compiled with PGI 20.4. There were two different errors reported. One
was reported once, and I think had nothing to do with OpenMPI or PMIX,
and then this error was repeated multiple times in the Slurm error
output for the
Could SELinux or AppArmor be active by default for a new install and
be causing this problem?
Prentice
On 10/21/20 12:22 PM, Jorge SILVA via users wrote:
Hello Gus,
Thank you for your answer. Unfortunately my problem is much more
basic. I didn't try to run the program on both computers
If you want to continue this conversation in a more appropriate forum,
may I recommend the Beowulf mailing list? Discussing *anything*
HPC-related is fair game there. It's a low-volume list, but the
conversation can get quite lively sometimes.
https://www.beowulf.org/mailman/listinfo/beowulf
The reason there aren't a lot of "new" books on MPI programming is
because the standard is pretty stable and the paradigm hasn't really
changed since the first version of the standard came out in the mid-90s.
I believe newer versions of the MPI standard have added new features,
but haven't real
If P=1 and Q=1, you're setting up a 1x1 process grid, which should only need a
single processor. Something tells me you have 4 independent HPL jobs
running, rather than one job using 4 threads. I think you should have a
2x2 grid if you want to use 4 threads. For HPL, P * Q = number of cores
being used.
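For reference, the process-grid section of a typical HPL.dat for 4 cores would look roughly like this (a sketch of the stock HPL.dat layout, not your actual file):

1            # of process grids (P x Q)
2            Ps
2            Qs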
Okay, I got this fixed. Apparently, 'make install' wasn't overwriting
the previous install, so I had to manually delete my previous install
before doing 'make install'. Once I did that, using UCX 1.8.1 and
specifying --without-verbs worked.
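For anyone else who hits this, the sequence that ended up working was roughly the following (the install prefix and UCX path are just placeholders):

rm -rf /path/to/openmpi-install      # clear out the stale install tree first
./configure --prefix=/path/to/openmpi-install --with-ucx=/path/to/ucx-1.8.1 --without-verbs ...
make && make install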
Prentice
On 7/28/20 2:03 PM, Prentice Bisbal wrote:
One more bit of information: These are QLogic IB cards, not Mellanox:
$ lspci | grep QL
05:00.0 InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)
On 7/28/20 2:03 PM, Prentice Bisbal wrote:
Last week I posted on here that I was getting immediate segfaults when
I ran MPI programs, an
Last week I posted on here that I was getting immediate segfaults when I
ran MPI programs, and the system logs showed that the segfaults were
occurring in libibverbs.so, and that the problem was still occurring even
if I specified '-mca btl ^openib'.
Since then, I've made a lot of progress on th
I've been doing a lot of research on this issue (See my next e-mail on
this topic which I'll be posting in a few minutes), and OpenMPI will use
ibverbs or UCX. In OpenMPI 4.0 and later, ibverbs is deprecated in favor
of UCX.
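If your build does have UCX support, you can also select it explicitly, e.g.:

mpirun --mca pml ucx ./a.out

so point-to-point traffic goes through UCX rather than the openib BTL.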
Prentice
On 7/27/20 7:49 PM, gil...@rist.or.jp wrote:
Prentice,
ib
Can anyone explain why my job still calls libibverbs when I run it with
'-mca btl ^openib'?
If I instead use '-mca btl tcp', my jobs don't segfault. I would assume
'-mca btl ^openib' and '-mca btl tcp' to essentially be equivalent, but
there's obviously a difference in the two.
Prentice
On 7/
I manage a cluster that is very heterogeneous. Some nodes have
InfiniBand, while others have 10 Gb/s Ethernet. We recently upgraded to
CentOS 7, and built a new software stack for CentOS 7. We are using
OpenMPI 4.0.3, and we are using Slurm 19.05.5 as our job scheduler.
We just noticed that wh
Jeff,
Then you'll be happy to know I've been building OpenMPI for years and I
never had any complaints about your configure/build system. Of course,
I'm a pro who gets paid to build open-source software all day long, but
I have to say I've never had any issues with configure, make, or 'make
c
flags to restrict
generated instructions, too, but it can be difficult to track down the precise
flags that you need.
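For example, with GCC-style compilers you can build for a baseline every node supports with something like this (the exact -march value depends on your oldest CPUs, so this is only illustrative):

CFLAGS='-O2 -march=x86-64 -mtune=generic' ./configure ...

The Intel compilers have analogous -x/-ax options for picking a baseline instruction set.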
On Jul 2, 2020, at 10:22 AM, Prentice Bisbal via users
wrote:
I manage a very heterogeneous cluster. I have nodes of different ages with
different processors, different amou
I manage a very heterogeneous cluster. I have nodes of different ages
with different processors, different amounts of RAM, etc. One user is
reporting that on certain nodes, his jobs keep crashing with the errors
below. His application is using OpenMPI 1.10.3, which I know is an
ancient version
Thanks. I'm going to give this solution a try.
On 5/9/20 9:51 AM, Patrick Bégou via users wrote:
On 08/05/2020 at 21:56, Prentice Bisbal via users wrote:
We often get the following errors when more than one job runs on the
same compute node. We are using Slurm with OpenMPI. The IB
I believe they're DDR cards.
On 5/9/20 6:36 AM, Heinz, Michael William via users wrote:
Prentice,
Avoiding the obvious question of whether your FM is running and the fabric is
in an active state, it sounds like you're exhausting a resource on the cards.
Ralph is correct about support for QLogic
We often get the following errors when more than one job runs on the
same compute node. We are using Slurm with OpenMPI. The IB cards are
QLogic using PSM:
10698ipath_userinit: assign_context command failed: Network is down
node01.10698can't open /dev/ipath, network down (err=26)
node01.10703ip
i and slurm against the same pmix? That is - first build pmix, then
build slurm --with-pmix, and then ompi with both slurm and pmix=external?
On 23/04/2020 17:00, Prentice Bisbal via users wrote:
$ ompi_info | grep slurm
Configure command line:
'--prefix=/usr/pppl/intel/2019-pkgs/ope
ompi_info |& grep "MPI repo"
to confirm that all nodes are running the same version of OMPI.
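For example, a quick loop over the nodes (hostnames below are placeholders) makes any mismatch obvious:

for host in node01 node02 node03; do
    echo "== $host =="
    ssh $host 'ompi_info | grep "MPI repo"'
done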
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
Prentice Bisbal via users
Sent: Monday, April 27, 2020 10:25 AM
To: users@lists.open-mpi.org
Cc: Prentice Bi
-----Original Message-----
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
Prentice Bisbal via users
Sent: Friday, April 24, 2020 2:19 PM
To: Ralph Castain <r...@open-mpi.org>; Open
MPI Users <users@lists.open-mpi.org>
Cc: Prentice Bisbal <pbis...@pppl.gov>
OMPI problems?
Andy
-----Original Message-----
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Prentice
Bisbal via users
Sent: Friday, April 24, 2020 2:19 PM
To: Ralph Castain ; Open MPI Users
Cc: Prentice Bisbal
Subject: Re: [OMPI users] [External] Re: Can't start jobs with srun.
O
On Apr 23, 2020, at 11:59 AM, Prentice Bisbal via users
wrote:
--mpi=list shows pmi2 and openmpi as valid values, but if I set --mpi= to
either of them, my job still fails. Why is that? Can I not trust the output of
--mpi=list?
Prentice
On 4/23/20 10:43 AM, Ralph Castain via users wrote
if that is what
you are going to use. In this case, you need to configure OMPI
--with-pmi2=
You can leave off the path (i.e., just use "--with-pmi2") if Slurm was installed in
a standard location, as we should find it there.
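As a sketch, assuming a standard Slurm install (prefix and version are placeholders):

./configure --prefix=/opt/openmpi-4.0.3 --with-slurm --with-pmi2 ...
make && make install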
On Apr 23, 2020, at 7:39 AM, Prentice Bisbal via users
support? Did you tell srun to use it?
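For example (the program name below is just a placeholder):

srun --mpi=pmi2 -n 4 ./hello_world

where whatever you pass to --mpi has to appear in 'srun --mpi=list' and match the PMI support OpenMPI was built with.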
On Apr 23, 2020, at 7:00 AM, Prentice Bisbal via users
wrote:
I'm using OpenMPI 4.0.3 with Slurm 19.05.5. I'm testing the software with a
very simple hello, world MPI program that I've used reliably for years. When I
submit the job throug
I'm using OpenMPI 4.0.3 with Slurm 19.05.5. I'm testing the software
with a very simple hello, world MPI program that I've used reliably for
years. When I submit the job through slurm and use srun to launch the
job, I get these errors:
*** An error occurred in MPI_Init
*** on a NULL communicat
Raymond,
Thanks for the info. Since we are still at CentOS 6, that is most likely
the problem.
Prentice
On 1/8/20 8:52 PM, Raymond Muno via users wrote:
AMD lists the minimum supported kernel for EPYC/NAPLES as RHEL/CentOS
kernel 3.10-862, which is RHEL/CentOS 7.5 or later. Upgraded kernels
On 1/8/20 3:30 PM, Brice Goglin via users wrote:
On 08/01/2020 at 21:20, Prentice Bisbal via users wrote:
We just added about a dozen nodes to our cluster, which have AMD EPYC
7281 processors. When a particular user's jobs fall on one of these
nodes, he gets these error messages
We just added about a dozen nodes to our cluster, which have AMD EPYC
7281 processors. When a particular user's jobs fall on one of these
nodes, he gets these error messages:
--
WARNING: a request was made to bind a process.
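As a quick test while the underlying hwloc/kernel support gets sorted out, disabling binding entirely should at least let the jobs run (at the cost of process placement), e.g.:

mpirun --bind-to none ./a.out

Whether that makes this particular warning go away on those nodes is only a guess.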
OpenMPI Users,
Are any of you using hwloc on Power9 hardware, specifically the IBM
AC922 servers? If so, have you encountered any issues? I checked the
documentation for the latest version (2.0.3), and found this:
Since it uses standard Operating System information, hwloc's support
is mostly
Sylvain,
I just ran into the same exact errors when compiling OpenMPI 3.0.0 with
PGI 18.3
Prentice
On 5/1/17 2:58 PM, Sylvain Jeaugey wrote:
I also saw IBM and ignored the email :-)
Thanks for reporting the issue, I passed it to the PGI team.
On 05/01/2017 11:49 AM, Prentice Bisbal wrote: