Eduardo,

You have two options for using Omni-Path:
- “directly” via the psm2 mtl:
  mpirun --mca pml cm --mca mtl psm2 ...
- “indirectly” via libfabric:
  mpirun --mca pml cm --mca mtl ofi ...

I do invite you to try both. By explicitly requesting the mtl, you will avoid potential conflicts.

libfabric is used in production by Cisco and AWS (both major contributors to both Open MPI and libfabric), so it is clearly not something to stay away from. That being said, bugs always happen, and they could be related to Open MPI, libfabric and/or Omni-Path (and FWIW, Intel is a major contributor to libfabric too).

Cheers,

Gilles

On Thursday, January 10, 2019, ROTHE Eduardo - externe <eduardo-externe.ro...@edf.fr> wrote:

> Hi Gilles, thank you so much for your support!
>
> For now I'm just testing the software, so it's running on a single node.
>
> Your suggestion was very precise. In fact, choosing the ob1 component
> leads to a successful execution! The tcp component had no effect.
>
> mpirun --mca pml ob1 --mca btl tcp,self -np 2 ./a.out   => Success
> mpirun --mca pml ob1 -np 2 ./a.out   => Success
>
> But... our cluster is equipped with Intel Omni-Path interconnects and we
> are aiming to use psm2 through the ofi component in order to take full
> advantage of this technology.
>
> I believe your suggestion shows that the problem is right here. But
> unfortunately I cannot see any further.
>
> Meanwhile, I've also compiled Open MPI 3.1.3 and I have a successful run
> with the same options and the same environment (no MPI_Send error). Could
> Open MPI 4.0.0 bring a different behaviour in this area? Possibly
> in the ofi component?
>
> Do you have any ideas I could put into practice to narrow the problem
> down further?
>
> Regards,
> Eduardo
>
> ps: I've recompiled Open MPI 4.0.0 using --with-hwloc=external, but with
> no different results (the same MPI_Send error);
>
> ps2: Yes, the configure line thing is really fishy, the original line was
> --prefix=/opt/openmpi/4.0.0
> --with-pmix=/usr/lib/x86_64-linux-gnu/pmix --with-libevent=external
> --with-slurm --enable-mpi-cxx --with-ofi --with-verbs=no
> --disable-silent-rules --with-hwloc=/usr --enable-mpirun-prefix-by-default
> --with-devel-headers
>
> ------------------------------
> From: users <users-boun...@lists.open-mpi.org> on behalf of
> gilles.gouaillar...@gmail.com <gilles.gouaillar...@gmail.com>
> Sent: Wednesday, January 9, 2019 15:16
> To: Open MPI Users
> Subject: Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send
>
> Eduardo,
>
> The first part of the configure command line is for an install in /usr,
> but then there is ‘--prefix=/opt/openmpi/4.0.0’ and this is very fishy.
> You should also use ‘--with-hwloc=external’.
>
> How many nodes are you running on and which interconnect are you using?
> What if you run
> mpirun --mca pml ob1 --mca btl tcp,self -np 2 ./a.out
>
> Cheers,
>
> Gilles
>
> On Wednesday, January 9, 2019, ROTHE Eduardo - externe <
> eduardo-externe.ro...@edf.fr> wrote:
>
>> Hi.
>>
>> I'm testing Open MPI 4.0.0 and I'm struggling with weird behaviour in
>> a very simple example (very frustrating).
>> I'm having the following error returned by MPI_Send:
>>
>> [gafront4:25692] *** An error occurred in MPI_Send
>> [gafront4:25692] *** reported by process [3152019457,0]
>> [gafront4:25692] *** on communicator MPI_COMM_WORLD
>> [gafront4:25692] *** MPI_ERR_OTHER: known error not in list
>> [gafront4:25692] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> [gafront4:25692] ***    and potentially your MPI job)
>>
>> On the same machine I have two other installations of Open MPI (2.0.2 and
>> 2.1.2), and they both run this dummy program successfully:
>>
>> #include <stdio.h>
>> #include <mpi.h>
>>
>> int main(int argc, char **argv) {
>>     int process;
>>     int population;
>>
>>     MPI_Init(NULL, NULL);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &process);
>>     MPI_Comm_size(MPI_COMM_WORLD, &population);
>>     printf("Hello World from process %d out of %d\n", process,
>>            population);
>>
>>     int send_number = 10;
>>     int recv_number;
>>
>>     if (process == 0) {
>>         MPI_Send(&send_number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
>>         printf("This is process 0 reporting::\n");
>>     } else if (process == 1) {
>>         MPI_Recv(&recv_number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
>>                  MPI_STATUS_IGNORE);
>>         printf("Process 1 received number %d from process 0\n",
>>                recv_number);
>>     }
>>
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> I'm really upset about having to turn to you with this problem. I've been
>> at it for days now and can't find any good solution. Can you please
>> take a look? I've enabled FI_LOG_LEVEL=Debug to see if I can trap any
>> information that could be of use, but unfortunately with no success. I've
>> also googled a lot, but I don't see what this error message might be
>> pointing at, especially with two other working versions on the same
>> machine. The thing is that I see no reason why this code shouldn't run.
>>
>> The following is the configure command line, as given by ompi_info.
>>
>> Configure command line: '--build=x86_64-linux-gnu'
>>     '--prefix=/usr'
>>     '--includedir=${prefix}/include'
>>     '--mandir=${prefix}/share/man'
>>     '--infodir=${prefix}/share/info'
>>     '--sysconfdir=/etc' '--localstatedir=/var'
>>     '--disable-silent-rules'
>>     '--libdir=${prefix}/lib/x86_64-linux-gnu'
>>     '--libexecdir=${prefix}/lib/x86_64-linux-gnu'
>>     '--disable-maintainer-mode'
>>     '--disable-dependency-tracking'
>>     '--prefix=/opt/openmpi/4.0.0'
>>     '--with-pmix=/usr/lib/x86_64-linux-gnu/pmix'
>>     '--with-libevent=external' '--with-slurm'
>>     '--enable-mpi-cxx' '--with-ofi' '--with-verbs=no'
>>     '--disable-silent-rules' '--with-hwloc=/usr'
>>     '--enable-mpirun-prefix-by-default'
>>     '--with-devel-headers'
>>
>> Thank you for your time.
>> Regards,
>> Ed
>>
>> This message and any attachments (the 'Message') are intended solely for
>> the addressees.
>> The information contained in this Message is confidential.
>> Any use of information contained in this Message not in accord with its
>> purpose, any dissemination or disclosure, either whole or partial, is
>> prohibited except formal approval.
>>
>> If you are not the addressee, you may not copy, forward, disclose or use
>> any part of it. If you have received this message in error, please delete
>> it and all copies from your system and notify the sender immediately by
>> return message.
>>
>> E-mail communication cannot be guaranteed to be timely secure, error or
>> virus-free.
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users