Re: [OMPI users] [EXTERNAL] Re: Newbie With Issues

2021-03-30 Thread Prentice Bisbal via users
This should handle modifying his LD_LIBRARY_PATH correctly but doing an intel setvars.sh How are you doing 'intel setvars.sh'? I believe you need to source that rather than execute it. Also, there might be other files you need to source. I have access to 2019.u3, and in the install_root, I
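The point being made, sketched as shell commands (the install path is an illustrative assumption):
    # Sourcing changes the current shell's environment:
    source /path/to/intel/setvars.sh
    # Executing it only modifies a child shell, so it has no lasting effect:
    /path/to/intel/setvars.sh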

Re: [OMPI users] [External] Re: Newbie With Issues

2021-03-30 Thread Prentice Bisbal via users
That error message is right in your original post, and I didn't even see it: configure:6541: icc -O2 conftest.c >&5 ld: cannot find -lstdc++ Well, that's an easy fix. I guess my eyes stopped when I got to this: configure: error: C compiler cannot create executables See `config.log' for
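The fix isn't stated in the preview; one common cause of 'ld: cannot find -lstdc++' with icc is a missing C++ development package, so on a RHEL/CentOS-family system (an assumption) something like this usually resolves it:
    yum install libstdc++-devel   # or 'yum install gcc-c++', which pulls it in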

Re: [OMPI users] [External] Re: Newbie With Issues

2021-03-30 Thread Prentice Bisbal via users
No, icc is there: configure:6488: icc -qversion >&5 icc: command line warning #10006: ignoring unknown option '-qversion' icc: command line error: no files specified; for help type "icc -help" Those error messages are coming directly from icc. Prentice On 3/30/21 12:52 PM, Heinz, Michael Wil

Re: [OMPI users] [External] Newbie With Issues

2021-03-30 Thread Prentice Bisbal via users
Is this your own Linux system, or a work/school system? Some security guidelines, like the CIS Security benchmarks, recommend making /tmp its own filesystem and mounting it with the 'noexec' option. That can cause this error. The configure script works by seeing if it can compile and/or run sma
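Two possible workarounds, sketched under the assumption that a noexec /tmp really is the culprit:
    mount -o remount,exec /tmp     # as root, temporarily allow execution on /tmp
    # ...or simply unpack and build the source tree somewhere not mounted noexec:
    cd $HOME/build/openmpi-4.0.x && ./configure ...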

Re: [OMPI users] [External] Help with MPI and macOS Firewall

2021-03-18 Thread Prentice Bisbal via users
OpenMPI should only be using shared memory on the local host automatically, but maybe you need to force it. I think mpirun -mca btl self,vader ... should do that. Or you can exclude tcp instead: mpirun -mca btl ^tcp. See https://www.open-mpi.org/faq/?category=sm for more info. Prentice On
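A sketch of the two suggestions above (program name and rank count are placeholders):
    mpirun -np 4 --mca btl self,vader ./your_app   # force shared memory only
    mpirun -np 4 --mca btl ^tcp ./your_app         # or just exclude TCP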

Re: [OMPI users] [External] Re: Error intialising an OpenFabrics device.

2021-03-18 Thread Prentice Bisbal via users
If you disable it with -mtl ^openib the warning goes away. And the performance of openib goes away right along with it. Prentice On 3/13/21 5:43 PM, Heinz, Michael William via users wrote: I’ve begun getting this annoyingly generic warning, too. It appears to be coming from the openib provi

Re: [OMPI users] [External] Re: mpi/pmix: ERROR: Error handler invoked: status = -25: No such file or directory (2)

2020-11-12 Thread Prentice Bisbal via users
already deleted). On Nov 12, 2020, at 8:16 AM, Prentice Bisbal via users wrote: I should give more background. In the slurm error log for this job, there was another error about a memcpy operation failing listed first, so that caused the job to fail. I suspect these errors below are the result o

Re: [OMPI users] [External] Re: mpi/pmix: ERROR: Error handler invoked: status = -25: No such file or directory (2)

2020-11-12 Thread Prentice Bisbal via users
On Nov 11, 2020, at 10:03 AM, Prentice Bisbal via users wrote: One of my users recently reported a failed job that was using OpenMPI 4.0.4 compiled with PGI 20.4. There were two different errors reported. One was reported once, and I think had nothing to do with OpenMPI or PMIX, and then this error

[OMPI users] mpi/pmix: ERROR: Error handler invoked: status = -25: No such file or directory (2)

2020-11-11 Thread Prentice Bisbal via users
One of my users recently reported a failed job that was using OpenMPI 4.0.4 compiled with PGI 20.4. There were two different errors reported. One was reported once, and I think had nothing to do with OpenMPI or PMIX, and then this error was repeated multiple times in the Slurm error output for the

Re: [OMPI users] [External] Re: mpirun on Kubuntu 20.4.1 hangs

2020-10-22 Thread Prentice Bisbal via users
Could SELinux or AppArmor be active by default for a new install and be causing this problem? Prentice On 10/21/20 12:22 PM, Jorge SILVA via users wrote: Hello Gus, Thank you for your answer. Unfortunately my problem is much more basic. I didn't try to run the program in both computers

Re: [OMPI users] [External] Re: MPI is still dominantparadigm?

2020-08-07 Thread Prentice Bisbal via users
If you want to continue this conversation in a more appropriate forum, may I recommend the Beowulf mailing list? Discussing *anything* HPC-related is fair game there. It's a low-volume list, but the conversation can get quite lively sometimes. https://www.beowulf.org/mailman/listinfo/beowulf

Re: [OMPI users] [External] Books/resources to learn (open)MPI from

2020-08-06 Thread Prentice Bisbal via users
The reason there aren't a lot of "new" books on MPI programming is that the standard is pretty stable and the paradigm hasn't really changed since the first version of the standard came out in the mid-90s. I believe newer versions of the MPI standard have added new features, but haven't real

Re: [OMPI users] [External] Correct mpirun Options for Hybrid OpenMPI/OpenMP

2020-08-03 Thread Prentice Bisbal via users
If P=1 and Q=1, you're setting up a 1x1 process grid, which should only need a single processor. Something tells me you have 4 independent HPL jobs running, rather than one job using 4 threads. I think you should have a 2x2 grid if you want to use 4 threads. For HPL, P * Q = number of cores being used.
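The relevant HPL.dat lines for a 2x2 grid using all 4 ranks would look roughly like this (illustrative values only):
    1            # of process grids (P x Q)
    2            Ps
    2            Qs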

Re: [OMPI users] WARNING: There was an error initializing an OpenFabrics device

2020-07-29 Thread Prentice Bisbal via users
Okay, I got this fixed. Apparently, 'make install' wasn't overwriting the previous install, so I had to manually delete my previous install before doing 'make install'. Once I did that, using UCX 1.8.1 and specifying --without-verbs worked. Prentice On 7/28/20 2:03 PM, Prentice Bisbal wrote:
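Roughly the sequence that ended up working, per the message above (version numbers and install paths are assumptions):
    ./configure --with-ucx=/opt/ucx-1.8.1 --without-verbs ...
    make
    rm -rf /opt/openmpi-4.0.3    # remove the stale install that 'make install' wasn't overwriting
    make install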

Re: [OMPI users] WARNING: There was an error initializing an OpenFabrics device

2020-07-28 Thread Prentice Bisbal via users
One more bit of information: These are QLogic IB cards, not Mellanox: $ lspci | grep QL 05:00.0 InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02) On 7/28/20 2:03 PM, Prentice Bisbal wrote: Last week I posted on here that I was getting immediate segfaults when I ran MPI programs, an

[OMPI users] WARNING: There was an error initializing an OpenFabrics device

2020-07-28 Thread Prentice Bisbal via users
Last week I posted on here that I was getting immediate segfaults when I ran MPI programs, and the system logs show that the segfaults were occurring in libibverbs.so, and that the problem was still occurring even if I specified '-mca btl ^openib'. Since then, I've made a lot of progress on th

Re: [OMPI users] [External] Re: segfault in libibverbs.so

2020-07-28 Thread Prentice Bisbal via users
I've been doing a lot of research on this issue (see my next e-mail on this topic, which I'll be posting in a few minutes), and OpenMPI will use ibverbs or UCX. In OpenMPI 4.0 and later, ibverbs is deprecated in favor of UCX. Prentice On 7/27/20 7:49 PM, gil...@rist.or.jp wrote: Prentice, ib

Re: [OMPI users] segfault in libibverbs.so

2020-07-27 Thread Prentice Bisbal via users
Can anyone explain why my job still calls libibverbs when I run it with '-mca btl ^openib'? If I instead use '-mca btl tcp', my jobs don't segfault. I would assume '-mca btl ^openib' and '-mca btl tcp' to be essentially equivalent, but there's obviously a difference between the two. Prentice On 7/
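For reference, the two invocations being compared (program name is a placeholder; note that an explicit BTL list usually needs 'self' as well):
    mpirun --mca btl ^openib ./your_app     # exclude only the openib BTL
    mpirun --mca btl tcp,self ./your_app    # allow only the listed BTLs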

[OMPI users] segfault in libibverbs.so

2020-07-23 Thread Prentice Bisbal via users
I manage a cluster that is very heterogeneous. Some nodes have InfiniBand, while others have 10 Gb/s Ethernet. We recently upgraded to CentOS 7, and built a new software stack for CentOS 7. We are using OpenMPI 4.0.3, and we are using Slurm 19.05.5 as our job scheduler. We just noticed that wh

Re: [OMPI users] [External] Re: choosing network: infiniband vs. ethernet

2020-07-20 Thread Prentice Bisbal via users
Jeff, Then you'll be happy to know I've been building OpenMPI for years and I never had any complaints about your configure/build system. Of course, I'm a pro who gets paid to build open-source software all day long, but I have to say I've never had any issues with configure, make, or 'make c

Re: [OMPI users] [External] Re: Signal code: Non-existant physical address (2)

2020-07-07 Thread Prentice Bisbal via users
flags to restrict generated instructions, too, but it can be difficult to track down the precise flags that you need. On Jul 2, 2020, at 10:22 AM, Prentice Bisbal via users wrote: I manage a very heterogeneous cluster. I have nodes of different ages with different processors, different amou

[OMPI users] Signal code: Non-existant physical address (2)

2020-07-02 Thread Prentice Bisbal via users
I manage a very heterogeneous cluster. I have nodes of different ages with different processors, different amounts of RAM, etc. One user is reporting that on certain nodes, his jobs keep crashing with the errors below. His application is using OpenMPI 1.10.3, which I know is an ancient version

Re: [OMPI users] [External] Re: can't open /dev/ipath, network down (err=26)

2020-05-11 Thread Prentice Bisbal via users
Thanks. I'm going to give this solution a try. On 5/9/20 9:51 AM, Patrick Bégou via users wrote: On 08/05/2020 at 21:56, Prentice Bisbal via users wrote: We often get the following errors when more than one job runs on the same compute node. We are using Slurm with OpenMPI. The IB

Re: [OMPI users] [External] can't open /dev/ipath, network down (err=26)

2020-05-11 Thread Prentice Bisbal via users
I believe they're DDR cards. On 5/9/20 6:36 AM, Heinz, Michael William via users wrote: Prentice, Avoiding the obvious question of whether your FM is running and the fabric is in an active state, it sounds like you're exhausting a resource on the cards. Ralph is correct about support for QLogic

[OMPI users] can't open /dev/ipath, network down (err=26)

2020-05-08 Thread Prentice Bisbal via users
We often get the following errors when more than one job runs on the same compute node. We are using Slurm with OpenMPI. The IB cards are QLogic using PSM: 10698ipath_userinit: assign_context command failed: Network is down node01.10698can't open /dev/ipath, network down (err=26) node01.10703ip
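Not stated in the thread, but one knob sometimes suggested for QLogic PSM context exhaustion when several jobs share a node is PSM's context sharing (an assumption that it applies here; check the PSM documentation for your stack):
    export PSM_SHAREDCONTEXTS=1
    export PSM_SHAREDCONTEXTS_MAX=8   # contexts per job; value is illustrative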

Re: [OMPI users] [External] Re: Can't start jobs with srun.

2020-05-06 Thread Prentice Bisbal via users
ompi and slurm against the same pmix? That is - first build pmix, then build slurm --with-pmix, and then ompi with both slurm and pmix=external? On 23/04/2020 17:00, Prentice Bisbal via users wrote: $ ompi_info | grep slurm   Configure command line: '--prefix=/usr/pppl/intel/2019-pkgs/ope
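The build order being described, as a rough sketch (all paths and versions are illustrative assumptions):
    cd pmix-3.x    && ./configure --prefix=/opt/pmix && make install
    cd slurm-19.x  && ./configure --with-pmix=/opt/pmix ... && make install
    cd openmpi-4.x && ./configure --with-slurm --with-pmix=/opt/pmix ... && make install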

Re: [OMPI users] [External] RE: Re: Can't start jobs with srun.

2020-05-06 Thread Prentice Bisbal via users
ompi_info |& grep "MPI repo" to confirm that all nodes are running the same version of OMPI. *From:* users [mailto:users-boun...@lists.open-mpi.org] *On Behalf Of* Prentice Bisbal via users *Sent:* Monday, April 27, 2020 10:25 AM *To:* users@lists.open-mpi.org *Cc:* Prentice Bi

Re: [OMPI users] [External] Re: Can't start jobs with srun.

2020-04-27 Thread Prentice Bisbal via users
-----Original Message----- From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Prentice Bisbal via users Sent: Friday, April 24, 2020 2:19 PM To: Ralph Castain <r...@open-mpi.org>; Open MPI Users <users@lists.open-mpi.org> Cc: Prentice Bisbal <pbis...@pppl.gov>

Re: [OMPI users] [External] RE: Re: Can't start jobs with srun.

2020-04-27 Thread Prentice Bisbal via users
OMPI problems? Andy -----Original Message----- From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Prentice Bisbal via users Sent: Friday, April 24, 2020 2:19 PM To: Ralph Castain; Open MPI Users Cc: Prentice Bisbal Subject: Re: [OMPI users] [External] Re: Can't start jobs with srun. O

Re: [OMPI users] [External] Re: Can't start jobs with srun.

2020-04-24 Thread Prentice Bisbal via users
On Apr 23, 2020, at 11:59 AM, Prentice Bisbal via users wrote: --mpi=list shows pmi2 and openmpi as valid values, but if I set --mpi= to either of them, my job still fails. Why is that? Can I not trust the output of --mpi=list? Prentice On 4/23/20 10:43 AM, Ralph Castain via users wrote

Re: [OMPI users] [External] Re: Can't start jobs with srun.

2020-04-23 Thread Prentice Bisbal via users
if that is what you are going to use. In this case, you need to configure OMPI --with-pmi2=. You can leave off the path (i.e., just "--with-pmi2") if Slurm was installed in a standard location, as we should find it there. On Apr 23, 2020, at 7:39 AM, Prentice Bisbal via users
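The configure invocation being suggested, roughly (the Slurm install path is an assumption):
    ./configure --with-pmi2=/usr/local/slurm ...
    # or, if Slurm's PMI2 files are in a standard location:
    ./configure --with-pmi2 ...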

Re: [OMPI users] [External] Re: Can't start jobs with srun.

2020-04-23 Thread Prentice Bisbal via users
support? Did you tell srun to use it? On Apr 23, 2020, at 7:00 AM, Prentice Bisbal via users wrote: I'm using OpenMPI 4.0.3 with Slurm 19.05.5 I'm testing the software with a very simple hello, world MPI program that I've used reliably for years. When I submit the job throug

[OMPI users] Can't start jobs with srun.

2020-04-23 Thread Prentice Bisbal via users
I'm using OpenMPI 4.0.3 with Slurm 19.05.5. I'm testing the software with a very simple hello, world MPI program that I've used reliably for years. When I submit the job through Slurm and use srun to launch the job, I get these errors: *** An error occurred in MPI_Init *** on a NULL communicat
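A minimal sketch of the launch being described (program name and rank count are placeholders; pmi2 is one of the values --mpi=list reports later in the thread):
    srun --mpi=pmi2 -n 2 ./hello_mpi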

Re: [OMPI users] [External] Re: AMD EPYC 7281: does NOT, support binding memory to the process location

2020-01-10 Thread Prentice Bisbal via users
Raymond, Thanks for the info. Since we are still at CentOS 6, that is most likely the problem. Prentice On 1/8/20 8:52 PM, Raymond Muno via users wrote: AMD, list the minimum supported kernel for EPYC/NAPLES as RHEL/Centos kernel 3.10-862, which is RHEL/CentOS 7.5 or later. Upgraded kernels

Re: [OMPI users] [External] Re: AMD EPYC 7281: does NOT, support binding memory to the process location

2020-01-08 Thread Prentice Bisbal via users
On 1/8/20 3:30 PM, Brice Goglin via users wrote: On 08/01/2020 at 21:20, Prentice Bisbal via users wrote: We just added about a dozen nodes to our cluster, which have AMD EPYC 7281 processors. When a particular user's jobs fall on one of these nodes, he gets these error messages

[OMPI users] AMD EPYC 7281: does NOT, support binding memory to the process location

2020-01-08 Thread Prentice Bisbal via users
We just added about a dozen nodes to our cluster, which have AMD EPYC 7281 processors. When a particular user's jobs fall on one of these nodes, he gets these error messages: -- WARNING: a request was made to bind a process.

[OMPI users] hwloc support for Power9/IBM AC922 servers

2019-04-16 Thread Prentice Bisbal via users
OpenMPI Users, Are any of you using hwloc on Power9 hardware, specifically the IBM AC922 servers? If so, have you encountered any issues? I checked the documentation for the latest version (2.0.3), and found this: Since it uses standard Operating System information, hwloc's support is mostly

Re: [OMPI users] OpenMPI 2.1.0 + PGI 17.3 = asm test failures

2018-11-28 Thread Prentice Bisbal via users
Sylvain, I just ran into the same exact errors when compiling OpenMPI 3.0.0 with PGI 18.3 Prentice On 5/1/17 2:58 PM, Sylvain Jeaugey wrote: I also saw IBM and ignored the email :-) Thanks for reporting the issue, I passed it to the PGI team. On 05/01/2017 11:49 AM, Prentice Bisbal wrote: