[OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Patrick Begou via users
Hi all, we are facing a serious problem with OpenMPI (4.0.2) that we have deployed on a cluster. We do not manage this large cluster and the names of the nodes do not agree with Internet standards for protocols: they contain a "_" (underscore) character. So OpenMPI complains about this and d
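For reference, the "Internet standards" mentioned here are the host name rules of RFC 952 and RFC 1123: each dot-separated label may contain only ASCII letters, digits and hyphens, and may not start or end with a hyphen. Below is a minimal Python sketch of that rule. It is not Open MPI's own check; the function name and the sample node names are made up for illustration.

```python
import re

# One host name label per RFC 952 / RFC 1123: letters, digits, hyphens,
# 1 to 63 characters, no leading or trailing hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")

def is_rfc1123_hostname(name: str) -> bool:
    """Return True if every dot-separated label of `name` is a valid label."""
    if not name or len(name) > 253:
        return False
    return all(_LABEL.match(label) for label in name.split("."))

# Hypothetical node names, not taken from the cluster in this thread.
for node in ("node-001", "node_001"):
    print(node, is_rfc1123_hostname(node))
```

A name containing an underscore fails this check, which matches the kind of node name described in the message above.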

Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Gilles Gouaillardet via users
Patrick, I am not sure Open MPI can do that out of the box. Maybe hacking pmix_net_get_hostname() in opal/mca/pmix/pmix3x/pmix/src/util/net.c can do the trick. Cheers, Gilles

Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Gilles Gouaillardet via users
Patrick, you will likely also need to apply the same hack to opal_net_get_hostname() in opal/util/net.c Cheers, Gilles

Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Jeff Squyres (jsquyres) via users
What exactly is the error that is occurring? -- Jeff Squyres jsquy...@cisco.com

Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Patrick Begou via users
Hi Gilles and Jeff, @Gilles I will have a look at these files, thanks. @Jeff this is the error message (screen dump attached) and of course the node names do not agree with the standard. Patrick

Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread George Bosilca via users
This error seems to be initiated from the PMIX regex framework. Not sure exactly which one is used, but a good starting point is in one of the files in 3rd-party/openpmix/src/mca/preg/. Look for the generate_node_regex function in the different components; one of them is raising the error. George.

Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Jeff Squyres (jsquyres) via users
Ah; this is a slightly different error than what Gilles was guessing from your prior description. This is what you're running into: https://github.com/open-mpi/ompi/blob/v4.0.x/orte/mca/regx/fwd/regx_fwd.c#L130-L134 Try running with: mpirun --mca regx naive ... Specifically: the "fwd" regex