did you happen to get 4.7.1 which comes with ucx-1.7.0-1.47100
compiled against openmpi 4.0.2?
i got snagged by this
https://github.com/open-mpi/ompi/issues/7128
which i thought would have had the fixes merged into the v4.0.2 tag,
but it doesn't seem so in my case
On Fri, Feb 7, 2020 at 11:34 AM
Hi,
Sorry to raise this issue again but now I still receive the following error
after a while:
posix: file name search - max attempts exceeded.cannot continue with posix.
I am compiling and running with the following command:
*make && mpiexec --oversubscribe -np 10 main.out*
My make file is
We're using MLNX_OFED 4.7.3. It supplies UCX 1.7.0.
We have OpenMPI 4.0.2 compiled against the Mellanox OFED 4.7.3 provided versions of UCX, KNEM and
HCOLL, along with HWLOC 2.1.0 from the OpenMPI site.
I mirrored the build to be what Mellanox used to configure OpenMPI in HPC-X 2.5.
I have user
i haven't compiled openmpi in a while, but i'm in the process of
upgrading our cluster.
the last time i did this there were specific versions of mpi/pmix/ucx
that were all tested and supposed to work together. my understanding
was that this was because pmix/ucx were under rapid development and the APIs
Today I came across the two MCA parameters osc_ucx_progress_iterations
and pml_ucx_progress_iterations in Open MPI. My interpretation of the
description is that in a loop such as below, progress in UCX is only
triggered every 100 iterations (assuming opal_progress is only called
once per MPI_Te
Ok it seems to be working now if I remove the quotation marks from the
config file. Thank you all for your help! I am looking forward to using
Open MPI for my work!
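
For anyone finding this thread later, here is a minimal sketch of the working form, assuming the setting in question is the orte_tmpdir_base line quoted elsewhere in this thread and that it lives in openmpi-mca-params.conf. The path is the placeholder from that message, not a recommended location. As far as I understand, the value in this file is taken literally, so surrounding quotation marks end up as part of the directory name:

```
# openmpi-mca-params.conf
# Value written without quotation marks; the path is a placeholder.
orte_tmpdir_base = /Users/myname/Desktop/shared/tmp
```

The same parameter can also be passed for a single run with mpiexec --mca orte_tmpdir_base followed by the directory, which is a quick way to test a path before editing the file.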
On Fri, 7 Feb 2020 at 09:42, Jin Tao wrote:
> Hi,
>
> Thank you for the guidance. I rebooted my computer but now the program
> fail
Hi,
Thank you for the guidance. I rebooted my computer but now the program
fails to compile.
I then tried changing the tmp directory by adding the following line to
*openmpi-mca-params.conf*:
*orte_tmpdir_base = "/Users/myname/Desktop/shared/tmp"*
But now I get the following error:
*PMIX ERROR: E