Hi All,
I opened a new issue to track the coll_perf failure, in case it's not related
to the HDF5 problem reported earlier.
https://github.com/open-mpi/ompi/issues/8246
Howard
On Mon, Nov 23, 2020 at 12:14 PM Dave Love via users <
users@lists.open-mpi.org> wrote:
> Mark Dixon via users wrote:
Hi Martin,
Thanks, this is helpful. Are you getting this timeout when you're running
the spawner process as a singleton?
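For context, a minimal spawner sketch (the file and worker names are just
placeholders); it can be launched either as a plain singleton (./spawner)
or under mpirun -np 1 ./spawner to compare the two cases:

    /* spawner.c - minimal MPI_Comm_spawn test; names are placeholders */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm intercomm;
        MPI_Init(&argc, &argv);
        /* Spawn two copies of ./worker; the call itself is the same
           whether this process was started by mpirun or as a singleton. */
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
        MPI_Comm_disconnect(&intercomm);
        MPI_Finalize();
        return 0;
    }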
Howard
On Fri, Aug 14, 2020 at 5:44 PM Martín Morales <
martineduardomora...@hotmail.com> wrote:
> Howard,
>
>
>
> I pasted below the error message after a while of the
Hi Martin,
I opened an issue on Open MPI's github to track this
https://github.com/open-mpi/ompi/issues/8005
You may be seeing another problem if you removed master from the host file.
Could you add the --debug-daemons option to the mpirun command line and
post the output?
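Something along these lines, where the spawner binary name is just an
example:

    mpirun --debug-daemons -np 1 ./spawner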
Howard
On Tue, Aug 11, 2020 at 1
Hi Ralph,
I've not yet determined whether this is actually a PMIx issue or the way
the dpm stuff in OMPI is handling PMIx namespaces.
Howard
On Tue, Aug 11, 2020 at 7:34 PM Ralph Castain via users <
users@lists.open-mpi.org> wrote:
> Howard - if there is a problem in PMIx that is causing
Hi Martin,
I was able to reproduce this with the 4.0.x branch. I'll open an issue.
If you really want to use 4.0.4, then what you'll need to do is build an
external PMIx 3.1.2 (the PMIx that was embedded in Open MPI 4.0.1), and
then build Open MPI using --with-pmix=<path to your PMIx installation>.
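A sketch of that build sequence, with placeholder source directories and
install prefixes:

    # build an external PMIx 3.1.2 (prefix is an example)
    cd pmix-3.1.2
    ./configure --prefix=/opt/pmix-3.1.2
    make -j install

    # then build Open MPI 4.0.4 against it
    cd ../openmpi-4.0.4
    ./configure --prefix=/opt/ompi-4.0.4 --with-pmix=/opt/pmix-3.1.2
    make -j install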
Hello Martin,
Between Open MPI 4.0.1 and Open MPI 4.0.4 we upgraded the internal PMIx
version that introduced a problem with spawn for the 4.0.2-4.0.4 versions.
This is supposed to be fixed in the 4.0.5 release. Could you try the
4.0.5rc1 tarball and see if that addresses the problem you're seeing.
Hello Michael,
Not sure what could be causing this in terms of delta between v4.0.3 and
v4.0.4.
Two things to try:
- add --debug-daemons and --mca pmix_base_verbose 100 to the mpirun line
and compare output from the v4.0.3 and v4.0.4 installs (see the example
line below)
- perhaps try using the --enable-mpirun-prefix-by-default configure option
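For the first item, the mpirun line would look something like this (the
application name is a placeholder):

    mpirun --debug-daemons --mca pmix_base_verbose 100 -np 4 ./your_app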
Collin,
A couple of things to try. First, could you just configure without using
the mellanox platform file and see if you can run the app with 100 or more
processes?
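For that first test, the configure line would simply omit the
--with-platform argument, e.g. (the install prefix is just a placeholder):

    ./configure --prefix=$HOME/ompi-no-platform
    make -j install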
Another thing to try is to keep using the mellanox platform file, but run
the app with
mpirun --mca pml ob1 -np 100 bin/xhpcg
Hello Collin,
Could you provide more information about the error? Is there any output
from either Open MPI or, perhaps, UCX, that could shed more light on the
problem you are hitting?
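If UCX is in play, one way to get more output from it is to raise UCX's
log level via its UCX_LOG_LEVEL environment variable (the application name
below is a placeholder):

    mpirun -x UCX_LOG_LEVEL=debug -np 2 ./your_app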
Howard
On Mon, Jan 27, 2020 at 8:38 AM Collin Strassburger via users <
users@lists.open-m