Adding '--pmixmca ptl_tcp_if_include lo0' to the mpirun argument list seems to fix (or at least work around) the problem.
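For reference, the full command line with the workaround applied looks like this (using the ./mpi_init_test binary name from the original report; if I have the PMIx MCA conventions right, the same parameter should also be settable in the environment as PMIX_MCA_ptl_tcp_if_include=lo0):

$ mpirun --pmixmca ptl_tcp_if_include lo0 -n 2 ./mpi_init_test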
On Mon, Feb 5, 2024 at 1:49 PM John Haiducek <jhaid...@gmail.com> wrote:

> Thanks, George, that issue you linked certainly looks potentially related.
>
> Output from ompi_info:
>
> Package: Open MPI brew@Monterey-arm64.local Distribution
> Open MPI: 5.0.1
> Open MPI repo revision: v5.0.1
> Open MPI release date: Dec 20, 2023
> MPI API: 3.1.0
> Ident string: 5.0.1
> Prefix: /opt/homebrew/Cellar/open-mpi/5.0.1
> Configured architecture: aarch64-apple-darwin21.6.0
> Configured by: brew
> Configured on: Wed Dec 20 22:18:10 UTC 2023
> Configure host: Monterey-arm64.local
> Configure command line: '--disable-debug' '--disable-dependency-tracking'
>   '--prefix=/opt/homebrew/Cellar/open-mpi/5.0.1'
>   '--libdir=/opt/homebrew/Cellar/open-mpi/5.0.1/lib'
>   '--disable-silent-rules' '--enable-ipv6'
>   '--enable-mca-no-build=reachable-netlink'
>   '--sysconfdir=/opt/homebrew/etc'
>   '--with-hwloc=/opt/homebrew/opt/hwloc'
>   '--with-libevent=/opt/homebrew/opt/libevent'
>   '--with-pmix=/opt/homebrew/opt/pmix' '--with-sge'
> Built by: brew
> Built on: Wed Dec 20 22:18:10 UTC 2023
> Built host: Monterey-arm64.local
> C bindings: yes
> Fort mpif.h: yes (single underscore)
> Fort use mpi: yes (full: ignore TKR)
> Fort use mpi size: deprecated-ompi-info-value
> Fort use mpi_f08: yes
> Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
>   limitations in the gfortran compiler and/or Open MPI, does not support
>   the following: array subsections, direct passthru (where possible) to
>   underlying Open MPI's C functionality
> Fort mpi_f08 subarrays: no
> Java bindings: no
> Wrapper compiler rpath: unnecessary
> C compiler: clang
> C compiler absolute: clang
> C compiler family name: CLANG
> C compiler version: 14.0.0 (clang-1400.0.29.202)
> C++ compiler: clang++
> C++ compiler absolute: clang++
> Fort compiler: gfortran
> Fort compiler abs: /opt/homebrew/opt/gcc/bin/gfortran
> Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::)
> Fort 08 assumed shape: yes
> Fort optional args: yes
> Fort INTERFACE: yes
> Fort ISO_FORTRAN_ENV: yes
> Fort STORAGE_SIZE: yes
> Fort BIND(C) (all): yes
> Fort ISO_C_BINDING: yes
> Fort SUBROUTINE BIND(C): yes
> Fort TYPE,BIND(C): yes
> Fort T,BIND(C,name="a"): yes
> Fort PRIVATE: yes
> Fort ABSTRACT: yes
> Fort ASYNCHRONOUS: yes
> Fort PROCEDURE: yes
> Fort USE...ONLY: yes
> Fort C_FUNLOC: yes
> Fort f08 using wrappers: yes
> Fort MPI_SIZEOF: yes
> C profiling: yes
> Fort mpif.h profiling: yes
> Fort use mpi profiling: yes
> Fort use mpi_f08 prof: yes
> Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
>   OMPI progress: no, Event lib: yes)
> Sparse Groups: no
> Internal debug support: no
> MPI interface warnings: yes
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> dl support: yes
> Heterogeneous support: no
> MPI_WTIME support: native
> Symbol vis. support: yes
> Host topology support: yes
> IPv6 support: yes
> MPI extensions: affinity, cuda, ftmpi, rocm, shortfloat
> Fault Tolerance support: yes
> FT MPI support: yes
> MPI_MAX_PROCESSOR_NAME: 256
> MPI_MAX_ERROR_STRING: 256
> MPI_MAX_OBJECT_NAME: 64
> MPI_MAX_INFO_KEY: 36
> MPI_MAX_INFO_VAL: 256
> MPI_MAX_PORT_NAME: 1024
> MPI_MAX_DATAREP_STRING: 128
> MCA accelerator: null (MCA v2.1.0, API v1.0.0, Component v5.0.1)
> MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA btl: self (MCA v2.1.0, API v3.3.0, Component v5.0.1)
> MCA btl: sm (MCA v2.1.0, API v3.3.0, Component v5.0.1)
> MCA btl: tcp (MCA v2.1.0, API v3.3.0, Component v5.0.1)
> MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v5.0.1)
> MCA if: bsdx_ipv6 (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA mpool: hugepage (MCA v2.1.0, API v3.1.0, Component v5.0.1)
> MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component v5.0.1)
> MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v5.0.1)
> MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA threads: pthreads (MCA v2.1.0, API v1.0.0, Component v5.0.1)
> MCA timer: darwin (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA bml: r2 (MCA v2.1.0, API v2.1.0, Component v5.0.1)
> MCA coll: adapt (MCA v2.1.0, API v2.4.0, Component v5.0.1)
> MCA coll: basic (MCA v2.1.0, API v2.4.0, Component v5.0.1)
> MCA coll: han (MCA v2.1.0, API v2.4.0, Component v5.0.1)
> MCA coll: inter (MCA v2.1.0, API v2.4.0, Component v5.0.1)
> MCA coll: libnbc (MCA v2.1.0, API v2.4.0, Component v5.0.1)
> MCA coll: self (MCA v2.1.0, API v2.4.0, Component v5.0.1)
> MCA coll: sync (MCA v2.1.0, API v2.4.0, Component v5.0.1)
> MCA coll: tuned (MCA v2.1.0, API v2.4.0, Component v5.0.1)
> MCA coll: ftagree (MCA v2.1.0, API v2.4.0, Component v5.0.1)
> MCA coll: monitoring (MCA v2.1.0, API v2.4.0, Component v5.0.1)
> MCA coll: sm (MCA v2.1.0, API v2.4.0, Component v5.0.1)
> MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA hook: comm_method (MCA v2.1.0, API v1.0.0, Component v5.0.1)
> MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA io: romio341 (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v5.0.1)
> MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component v5.0.1)
> MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v5.0.1)
> MCA part: persist (MCA v2.1.0, API v4.0.0, Component v5.0.1)
> MCA pml: cm (MCA v2.1.0, API v2.1.0, Component v5.0.1)
> MCA pml: monitoring (MCA v2.1.0, API v2.1.0, Component v5.0.1)
> MCA pml: ob1 (MCA v2.1.0, API v2.1.0, Component v5.0.1)
> MCA pml: v (MCA v2.1.0, API v2.1.0, Component v5.0.1)
> MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v5.0.1)
> MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v5.0.1)
> MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component v5.0.1)
> MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>
> On Mon, Feb 5, 2024 at 12:48 PM George Bosilca <bosi...@icl.utk.edu> wrote:
>
>> OMPI seems unable to create a communication medium between your
>> processes. There are few known issues on OSX, please read
>> https://github.com/open-mpi/ompi/issues/12273 for more info.
>>
>> Can you provide the header of the ompi_info command. What I'm interested
>> on is the part about `Configure command line:`
>>
>> George.
>>
>> On Mon, Feb 5, 2024 at 12:18 PM John Haiducek via users <users@lists.open-mpi.org> wrote:
>>
>>> I'm having problems running programs compiled against the OpenMPI 5.0.1
>>> package provided by homebrew on MacOS (arm) 12.6.1.
>>>
>>> When running a Fortran test program that simply calls MPI_init followed
>>> by MPI_Finalize, I get the following output:
>>>
>>> $ mpirun -n 2 ./mpi_init_test
>>> --------------------------------------------------------------------------
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>>   PML add procs failed
>>>   --> Returned "Not found" (-13) instead of "Success" (0)
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>>   ompi_mpi_init: ompi_mpi_instance_init failed
>>>   --> Returned "Not found" (-13) instead of "Success" (0)
>>> --------------------------------------------------------------------------
>>> [haiducek-lt:00000] *** An error occurred in MPI_Init
>>> [haiducek-lt:00000] *** reported by process [1905590273,1]
>>> [haiducek-lt:00000] *** on a NULL communicator
>>> [haiducek-lt:00000] *** Unknown error
>>> [haiducek-lt:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>> [haiducek-lt:00000] *** and MPI will try to terminate your MPI job as well)
>>> --------------------------------------------------------------------------
>>> prterun detected that one or more processes exited with non-zero status,
>>> thus causing the job to be terminated. The first process to do so was:
>>>
>>>   Process name: [prterun-haiducek-lt-15584@1,1]
>>>   Exit code: 14
>>> --------------------------------------------------------------------------
>>>
>>> I'm not sure whether this is the result of a bug in OpenMPI, in the
>>> homebrew package, or a misconfiguration of my system. Any suggestions for
>>> troubleshooting this?
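For anyone trying to reproduce this: the original report only describes the test as a Fortran program that calls MPI_Init followed by MPI_Finalize, so the exact source wasn't posted. A minimal program along those lines might look like the following (the file name mpi_init_test.f90 is assumed to match the binary name above; the mpi_f08 module is one reasonable choice here since ompi_info reports it as available, but "use mpi" would work equally well):

program mpi_init_test
  use mpi_f08        ! Fortran 2008 MPI bindings
  implicit none
  integer :: ierr

  call MPI_Init(ierr)      ! initialize the MPI runtime
  call MPI_Finalize(ierr)  ! shut it down again immediately
end program mpi_init_test

Built and run with Open MPI's wrapper compiler, e.g.:

$ mpifort mpi_init_test.f90 -o mpi_init_test
$ mpirun -n 2 ./mpi_init_test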