Hi all,
we are facing a serious problem with Open MPI (4.0.2), which we have
deployed on a cluster. We do not manage this large cluster, and the node
names do not comply with Internet hostname standards: they
contain a "_" (underscore) character.
So OpenMPI complains about this and d
Patrick,
I am not sure Open MPI can do that out of the box.
Maybe hacking pmix_net_get_hostname() in
opal/mca/pmix/pmix3x/pmix/src/util/net.c
can do the trick.
Cheers,
Gilles
On Thu, Jun 16, 2022 at 4:24 PM Patrick Begou via users <
users@lists.open-mpi.org> wrote:
> Hi all,
>
> we are faci
Patrick,
you will likely also need to apply the same hack to opal_net_get_hostname()
in opal/util/net.c
Cheers,
Gilles
On Thu, Jun 16, 2022 at 7:30 PM Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:
> Patrick,
>
> I am not sure Open MPI can do that out of the box.
>
> Maybe hackin
What exactly is the error that is occurring?
--
Jeff Squyres
jsquy...@cisco.com
From: users on behalf of Patrick Begou via users
Sent: Thursday, June 16, 2022 3:21 AM
To: Open MPI Users
Cc: Patrick Begou
Subject: [OMPI users] OpenMPI and names of the no
Hi Gilles and Jeff,
@Gilles I will have a look at these files, thanks.
@Jeff this is the error message (screen dump attached), and of course the
node names do not comply with the standard.
Patrick
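
To list which node names fall outside the usual hostname character set (letters, digits, and hyphens, with dots as label separators), a minimal shell sketch might look like the following. The function name is_rfc1123_name and the sample node names are made up for illustration. The check covers only the character set, not other rules such as label length or leading hyphens.

```shell
# Hypothetical helper: succeeds (returns 0) when the name contains only
# letters, digits, hyphens, and dots; fails on anything else, e.g. "_".
is_rfc1123_name() {
    case "$1" in
        ""|*[!A-Za-z0-9.-]*) return 1 ;;   # empty, or has a disallowed character
        *) return 0 ;;
    esac
}

# Placeholder node names; substitute the real host list.
for h in node01 node_02; do
    is_rfc1123_name "$h" || echo "non-standard: $h"
done
```

With the two placeholder names above, only node_02 is reported.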
On 16/06/2022 at 14:30, Jeff Squyres (jsquyres) wrote:
What exactly is the error that is o
This error seems to come from the PMIx regex framework. I am not sure
exactly which component is used, but a good starting point is the files
in 3rd-party/openpmix/src/mca/preg/. Look for the generate_node_regex
function in the different components; one of them is raising the error.
George.
Ah; this is a slightly different error than what Gilles was guessing from your
prior description. This is what you're running into:
https://github.com/open-mpi/ompi/blob/v4.0.x/orte/mca/regx/fwd/regx_fwd.c#L130-L134
Try running with:
mpirun --mca regx naive ...
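
The same component selection can probably also be made through the environment, since Open MPI reads MCA parameters from variables named OMPI_MCA_<param>. The sketch below assumes the parameter is named after the framework directory in the URL above (orte/mca/regx, so "regx") and that the "naive" component is present in the installed build. The application name ./my_app is a placeholder.

```shell
# Assumption: the MCA parameter is called "regx", matching the framework
# directory orte/mca/regx in the v4.0.x source tree.
export OMPI_MCA_regx=naive

# Later launches from this shell should then pick the setting up, e.g.:
#   mpirun -np 4 ./my_app          # ./my_app is a placeholder
# To see which regx components the build provides:
#   ompi_info --param regx all --level 9
echo "OMPI_MCA_regx=${OMPI_MCA_regx}"
```

Setting the variable in a job script avoids having to repeat the option on every mpirun command line.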
Specifically: the "fwd" regex