[OMPI users] How do I compile OpenMPI in Xcode 3.1
Hi, I've seen the FAQ "How do I use Open MPI wrapper compilers in Xcode", but it's only for MPICC. I am using MPIF90, so I did the same but changed MPICC to MPIF90, and also the path, but it did not work:

Building target “fortran” of project “fortran” with configuration “Debug”

Checking Dependencies
Invalid value 'MPIF90' for GCC_VERSION

The file "MPIF90.cpcompspec" looks like this:

/**
    Xcode Compiler Specification for MPIF90
*/

{   Type = Compiler;
    Identifier = com.apple.compilers.mpif90;
    BasedOn = com.apple.compilers.gcc.4_0;
    Name = "MPIF90";
    Version = "Default";
    Description = "MPI GNU C/C++ Compiler 4.0";
    ExecPath = "/usr/local/bin/mpif90";   // This gets converted to the g++ variant automatically
    PrecompStyle = pch;
}

and is located in "/Developer/Library/Xcode/Plug-ins".

When I run "mpif90 -v" in a terminal it works well:

Using built-in specs.
Target: i386-apple-darwin8.10.1
Configured with: /tmp/gfortran-20090321/ibin/../gcc/configure --prefix=/usr/local/gfortran --enable-languages=c,fortran --with-gmp=/tmp/gfortran-20090321/gfortran_libs --enable-bootstrap
Thread model: posix
gcc version 4.4.0 20090321 (experimental) [trunk revision 144983] (GCC)

Any idea?

Thanks.

Vincent
Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1
Yes, I already have the gfortran compiler in /usr/local/bin, the same path as my mpif90 compiler. But I've seen that when I use the mpif90 in /usr/bin or in /Developer/usr/bin it says:

"Unfortunately, this installation of Open MPI was not compiled with Fortran 90 support. As such, the mpif90 compiler is non-functional."

That should be the problem; I will have to change the path so that the mpif90 built against the gfortran I installed is used. How could I do it? (Sorry, I am a beginner.)

Thanks.

On 04/05/2009, at 17:38, Warner Yuen wrote:

> Have you installed a Fortran compiler? Mac OS X's developer tools do not come with a Fortran compiler, so you'll need to install one if you haven't already done so. I routinely use the Intel IFORT compilers with success. However, I hear many good things about the gfortran compilers on Mac OS X; you can't beat the price of gfortran!
>
> Warner Yuen
> Scientific Computing
> Consulting Engineer
> Apple, Inc.
> email: wy...@apple.com
> Tel: 408.718.2859
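For what it's worth, a minimal sketch of how to check which mpif90 is being picked up and make the /usr/local/bin installation win, assuming a bash-style shell (--showme is the standard Open MPI wrapper query option; the paths are the ones mentioned above):

  which mpif90                      # which wrapper comes first in the PATH?
  /usr/local/bin/mpif90 --showme    # show the backend compiler and flags this wrapper would use
  # put the Fortran-enabled installation first in the PATH (e.g. in ~/.profile):
  export PATH=/usr/local/bin:$PATH
  hash -r                           # make the shell forget previously located commands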
Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1
I can use Open MPI from the terminal, but I am having problems with gdb, so I wanted to know whether it is possible to use Open MPI with Xcode. In any case, for Mac users, what is the best way to compile and debug an MPI program?

Thanks.

Vincent

On 04/05/2009, at 17:42, Jeff Squyres wrote:

> FWIW, I don't use Xcode, but I use the precompiled gcc/gfortran from here with good success: http://hpc.sourceforge.net/
Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1
If I cannot make it work with Xcode, which one could I use? Which one do you use to compile and debug Open MPI programs?

Thanks

Vincent

2009/5/4 Jeff Squyres:

> Open MPI comes pre-installed in Leopard; as Warner noted, since Leopard doesn't ship with a Fortran compiler, the Open MPI that Apple ships has non-functional mpif77 and mpif90 wrapper compilers.
>
> So the Open MPI that you installed manually will use your Fortran compilers, and therefore will have functional mpif77 and mpif90 wrapper compilers. Hence, you probably need to be sure to use the "right" wrapper compilers. It looks like you specified the full path in ExecPath, so I'm not sure why Xcode wouldn't work with that (like I mentioned, I unfortunately don't use Xcode myself, so I don't know why that wouldn't work).
Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1
I can run Open MPI perfectly from the command line, but I wanted a graphical interface for debugging because I was having problems. Thanks anyway.

Vincent

2009/5/4 Warner Yuen:

> Admittedly, I don't use Xcode to build Open MPI either.
>
> You can just compile Open MPI from the command line and install everything in /usr/local/. Make sure that gfortran is in your path and you should just be able to do a './configure --prefix=/usr/local'.
>
> After the installation, just make sure that your path is set correctly when you go to use the newly installed Open MPI. If you don't set your path, it will always default to using the version of Open MPI that ships with Leopard.
>
> Warner Yuen
> Scientific Computing
> Consulting Engineer
> Apple, Inc.
> email: wy...@apple.com
> Tel: 408.718.2859
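A minimal sketch of the command-line build Warner describes, assuming gfortran lives in /usr/local/bin and using a tarball name that is only a placeholder (FC and F77 are the standard configure variables for pointing Open MPI at a Fortran compiler):

  tar xzf openmpi-1.3.2.tar.gz          # version is just an example
  cd openmpi-1.3.2
  ./configure --prefix=/usr/local FC=/usr/local/bin/gfortran F77=/usr/local/bin/gfortran
  make all
  sudo make install
  export PATH=/usr/local/bin:$PATH      # so the new wrappers shadow the Leopard ones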
Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1
Maybe I should have opened a new thread, but do you have any idea why I get the following when I use gdb to debug an Open MPI program?

warning: Could not find object file "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_umoddi3_s.o" - no debug information available for "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".

warning: Could not find object file "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_udiv_w_sdiv_s.o" - no debug information available for "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".

warning: Could not find object file "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_udivmoddi4_s.o" - no debug information available for "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".

warning: Could not find object file "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-dw2_s.o" - no debug information available for "../../../gcc-4.3-20071026/libgcc/../gcc/unwind-dw2.c".

warning: Could not find object file "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-dw2-fde-darwin_s.o" - no debug information available for "../../../gcc-4.3-20071026/libgcc/../gcc/unwind-dw2-fde-darwin.c".

warning: Could not find object file "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-c_s.o" - no debug information available for "../../../gcc-4.3-20071026/libgcc/../gcc/unwind-c.c".
...

There is no 'admin' user on my machine, so I don't know why this happens. It works well with a C program.

Any idea?

Thanks.

Vincent
Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1
But it doesn't work well. For example, I am trying to debug a program ("floyd" in this case), and when I set a breakpoint I get:

No line 26 in file "../../../gcc-4.2-20060805/libgfortran/fmain.c".

I am getting disappointed and frustrated that I cannot work well with Open MPI on my Mac. There should be a way to make it run in Xcode, uff...

2009/5/4 Jeff Squyres:

> I get those as well. I believe that they are (annoying but) harmless -- an artifact of how the freeware gcc/gfortran that I use was built.
Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1
I forgot to say that "../../../gcc-4.2-20060805/libgfortran/fmain.c" is neither the path nor the program that I am trying to debug.
Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1
Sorry, I don't understand: how can I try the Fortran compiler from MacPorts?

2009/5/6 Luis Vitorio Cargnini:

> This problem is occurring because the Fortran runtime wasn't compiled with debug symbols:
>
> warning: Could not find object file "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_udiv_w_sdiv_s.o" - no debug information available for "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".
>
> It is the same problem for anyone using LLVM in Xcode: there are no debug symbols available for a debug build. Try creating a release build and see if it compiles at all, and try the Fortran compiler from MacPorts; it works smoothly.
>
> On 09-05-05, at 17:33, Jeff Squyres wrote:
>
>> I agree; that is a bummer. :-(
>>
>> Warner -- do you have any advice here, perchance?
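For completeness, a minimal sketch of the usual way to debug a few MPI ranks with gdb once the user's own code carries debug symbols; the source file name floyd.f90 is a guess, and the xterm-per-rank trick assumes an X11 session is available:

  mpif90 -g -O0 -o floyd floyd.f90      # rebuild with debug info, no optimisation
  mpirun -np 2 xterm -e gdb ./floyd     # one gdb in its own xterm per rank

The libgcc/libgfortran warnings quoted above would still appear, since they concern how the compiler's own runtime libraries were built, not the user's program.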
Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel
Hi, "r...@open-mpi.org" writes: > You might want to try using the DVM (distributed virtual machine) > mode in ORTE. You can start it on an allocation using the “orte-dvm” > cmd, and then submit jobs to it with “mpirun --hnp ”, where foo > is either the contact info printed out by orte-dvm, or the name of > the file you told orte-dvm to put that info in. You’ll need to take > it from OMPI master at this point. this question looked interesting so I gave it a try. In a cluster with Slurm I had no problem submitting a job which launched an orte-dvm -report-uri ... and then use that file to launch jobs onto that virtual machine via orte-submit. To be useful to us at this point, I should be able to start executing jobs if there are cores available and just hold them in a queue if the cores are already filled. At this point this is not happenning, and if I try to submit a second job while the previous one has not finished, I get a message like: , | DVM ready | -- | All nodes which are allocated for this job are already filled. | -- ` With the DVM, is it possible to keep these jobs in some sort of queue, so that they will be executed when the cores get free? Thanks, -- Ángel de Vicente http://www.iac.es/galeria/angelv/ - ADVERTENCIA: Sobre la privacidad y cumplimiento de la Ley de Protección de Datos, acceda a http://www.iac.es/disclaimer.php WARNING: For more information on privacy and fulfilment of the Law concerning the Protection of Data, consult http://www.iac.es/disclaimer.php?lang=en ___ users mailing list users@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/users
Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel
Hi, "r...@open-mpi.org" writes: >> With the DVM, is it possible to keep these jobs in some sort of queue, >> so that they will be executed when the cores get free? > > It wouldn’t be hard to do so - as long as it was just a simple FIFO > scheduler. I wouldn’t want it to get too complex. a simple FIFO should be probably enough. This can be useful as a simple way to make a multi-core machine accessible to a small group of (friendly) users, making sure that they don't oversubscribe the machine, but without going the full route of installing/maintaining a full resource manager. Cheers, -- Ángel de Vicente http://www.iac.es/galeria/angelv/ - ADVERTENCIA: Sobre la privacidad y cumplimiento de la Ley de Protección de Datos, acceda a http://www.iac.es/disclaimer.php WARNING: For more information on privacy and fulfilment of the Law concerning the Protection of Data, consult http://www.iac.es/disclaimer.php?lang=en ___ users mailing list users@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/users
Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel
Hi,

Reuti writes:

> At first I thought you wanted to run a queuing system inside a queuing
> system, but this looks like you want to replace the resource manager.

yes, if this could work reasonably well, we could do without the resource manager.

> Under which user account will the DVM daemons run? Are all users using the
> same account?

Well, even if this worked only for one user it could still be useful, since I could use it the way I now use GNU Parallel or a private Condor pool, where I can submit hundreds of jobs and make sure they get executed without oversubscribing the machine. For a small group of users, if the DVM can run under my account and there is no restriction on who can use it, or if I can somehow authorize others to use it (via an authority file or similar), that should be enough.

Thanks,
--
Ángel de Vicente
http://www.iac.es/galeria/angelv/
[OMPI users] "No objects of the specified type were found on at least one node"?
Hi,

I'm trying to get Open MPI running on a new machine, and I came across an error message that I hadn't seen before:

,----
| can@login1:> mpirun -np 1 ./code config.txt
| --------------------------------------------------------------------------
| No objects of the specified type were found on at least one node:
|
|   Type: Package
|   Node: login1
|
| The map cannot be done as specified.
| --------------------------------------------------------------------------
`----

Some details: on this machine we have GCC 6.3.0, and with it I installed Open MPI (v. 2.0.1). The compilation of Open MPI went without (obvious) errors, and I managed to compile my code without problems (if instead of "mpirun -np 1 ./code" I just run the code directly there are no issues). But if I try to use mpirun in the login node of the cluster I get this message. If I submit the job to the scheduler (the cluster uses Slurm) I get the same message, but the Node information is obviously different, giving the name of one of the compute nodes.

Any pointers as to what can be going on?

Many thanks,
--
Ángel de Vicente
http://www.iac.es/galeria/angelv/
Re: [OMPI users] "No objects of the specified type were found on at least one node"?
Hi,

Gilles Gouaillardet writes:

> which version of ompi are you running ?

2.0.1

> this error can occur on systems with no NUMA object (e.g. single
> socket with hwloc < 2)
> as a workaround, you can
> mpirun --map-by socket ...

with --map-by socket I get exactly the same issue (both on the login and the compute node).

I will upgrade to 2.0.2 and see if this changes something.

Thanks,
--
Ángel de Vicente
http://www.iac.es/galeria/angelv/
Re: [OMPI users] "No objects of the specified type were found on at least one node"
Hi,

Gilles Gouaillardet writes:

> Can you run
> lstopo
> in your machine, and post the output ?

there is no lstopo on my machine. That is part of hwloc, right?

> can you also try
> mpirun --map-by socket --bind-to socket ...
> and see if it helps ?

same issue. Perhaps I need to compile hwloc as well?
--
Ángel de Vicente
http://www.iac.es/galeria/angelv/
Re: [OMPI users] "No objects of the specified type were found on at least one node"
Hi again,

thanks for your help. I installed the latest Open MPI (2.0.2).

lstopo output:

,----
| lstopo --version
| lstopo 1.11.2
|
| lstopo
| Machine (7861MB)
|   L2 L#0 (1024KB) + L1d L#0 (32KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
|   L2 L#1 (1024KB) + L1d L#1 (32KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
|   L2 L#2 (1024KB) + L1d L#2 (32KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
|   L2 L#3 (1024KB) + L1d L#3 (32KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
|   HostBridge L#0
|     PCIBridge
|       PCI 1014:028c
|         Block L#0 "sda"
|     PCI 14c1:8043
|       Net L#1 "myri0"
|     PCIBridge
|       PCI 14e4:166b
|         Net L#2 "eth0"
|       PCI 14e4:166b
|         Net L#3 "eth1"
|     PCIBridge
|       PCI 1002:515e
`----

I started with GCC 6.3.0, compiled Open MPI 2.0.2 with it, and then HDF5 1.10.0-patch1. Our code then compiles OK, and it runs OK without "mpirun":

,----
| ./mancha3D
| [mancha3D ASCII-art banner]
|
| ./mancha3D should be given the name of a control file as argument.
`----

But it complains as before when run with mpirun:

,----
| mpirun --map-by socket --bind-to socket -np 1 ./mancha3D
| --------------------------------------------------------------------------
| No objects of the specified type were found on at least one node:
|
|   Type: Package
|   Node: login1
|
| The map cannot be done as specified.
| --------------------------------------------------------------------------
`----

If I submit it directly with srun, then the code runs, but not in parallel: two individual copies of the code are started:

,----
| srun -n 2 ./mancha3D
| [mancha3D ASCII-art banner]
|
| should be given the name of a control file as argument.
| [mancha3D ASCII-art banner]
|
| should be given the name of a control file as argument.
`----

Any ideas are welcome. Many thanks,
--
Ángel de Vicente
http://www.iac.es/galeria/angelv/
Re: [OMPI users] "No objects of the specified type were found on at least one node"
Can this help? If you think any other information could be relevant, let me know.

Cheers,
Ángel

cat /proc/cpuinfo
processor : 0
cpu       : PPC970MP, altivec supported
clock     : 2297.70MHz
revision  : 1.1 (pvr 0044 0101)
[4 processors]
timebase  : 14318000
machine   : CHRP IBM,8844-Z0C

uname -a
Linux login1 2.6.16.60-perfctr-0.42.4-ppc64 #1 SMP Fri Aug 21 15:25:15 CEST 2009 ppc64 ppc64 ppc64 GNU/Linux

lsb_release -a
Distributor ID: SUSE LINUX
Description:    SUSE Linux Enterprise Server 10 (ppc)
Release:        10

On 9 March 2017 at 15:04, Brice Goglin wrote:

> What's this machine made of? (processor, etc)
> What kernel are you running?
>
> Getting no "socket" or "package" at all is quite rare these days.
>
> Brice
Re: [OMPI users] "No objects of the specified type were found on at least one node"
Brice Goglin writes:

> Ok, that's a very old kernel on a very old POWER processor; it's
> expected that hwloc doesn't get much topology information, and it's
> then expected that Open MPI cannot apply most binding policies.

Just in case it adds anything: I tried with an older Open MPI version (1.10.6), and I cannot get it to work either, but the message is different:

,----
| --------------------------------------------------------------------------
| No objects of the specified type were found on at least one node:
|
|   Type: Socket
|   Node: s01c1b08
|
| The map cannot be done as specified.
| --------------------------------------------------------------------------
`----

--
Ángel de Vicente
http://www.iac.es/galeria/angelv/
[OMPI users] Help diagnosing problem: not being able to run MPI code across computers
Hi,

I have used Open MPI before without any trouble, and have configured MPICH, MPICH2 and Open MPI on many different machines, but recently we upgraded the OS to Fedora 17 and now I'm having trouble running an MPI code on two of our machines connected via a switch.

I thought perhaps the old installation was giving problems, so I reinstalled Open MPI (1.6.4), and I have no trouble when running a parallel code on just one node. I also don't have any trouble ssh'ing (without need for a password) between these machines, but when I try to run a parallel job spanning both machines, I get a hung mpiexec process on the submitting machine and an "orted" process on the other machine, but nothing moves.

I guess it is an issue with libraries and/or different MPI versions (the machines have other site-wide MPI libraries installed), but I'm not sure how to debug the issue. I looked in the FAQ, but I didn't find anything relevant. The issue at http://www.open-mpi.org/faq/?category=running#intel-compilers-static is different, since I don't get any warnings or errors when running, just all processes stuck.

Is there any way to dump details of what Open MPI is trying to do on each node, so I can see if it is looking for different libraries on each node, or something similar?

Thanks,
--
Ángel de Vicente
http://angel-de-vicente.blogspot.com/
Re: [OMPI users] Help diagnosing problem: not being able to run MPI code across computers
Hi,

Ralph Castain writes:

> On May 4, 2013, at 4:54 PM, Angel de Vicente wrote:
>>
>> Is there any way to dump details of what OpenMPI is trying to do in each
>> node, so I can see if it is looking for different libraries in each
>> node, or something similar?

thanks for the suggestions, but I'm still stuck:

> What I do is simply "ssh <node> ompi_info -V" to each remote node and compare
> results - you should get the same answer everywhere.

exactly the same information on the three connected machines.

> Another option in these situations is to configure
> --enable-orterun-prefix-by-default. If you install in the same
> location on each node (e.g., on an NFS mount), then this will ensure
> you get that same library.

Re-configured and re-compiled Open MPI, but I get the same behaviour.

I'm starting to think that perhaps it is a firewall issue? I don't have root access on these machines, but I'll try to investigate.

Cheers,
--
Ángel de Vicente
http://angel-de-vicente.blogspot.com/
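A minimal sketch of the checks Ralph suggests (host names and install prefix are placeholders):

  for h in machine1 machine2; do
      echo "== $h =="
      ssh $h ompi_info -V               # versions must match on every node
  done
  # and rebuild so the install prefix is forwarded to remote daemons by default:
  ./configure --prefix=/path/to/openmpi --enable-orterun-prefix-by-default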
Re: [OMPI users] Help diagnosing problem: not being able to run MPI code across computers
Hi, "Jeff Squyres (jsquyres)" writes: >>> I'm starting to think that perhaps is a firewall issue? I don't have >>> root access in these machines but I'll try to investigate. > A simple test is to try any socket-based server app between the two > machines that opens a random listening socket. Try to telnet to it > from the other machine. If it fails to connect, then you likely have > a firewalling issue. yes, that's just what I did with orted. I saw the port that it was trying to connect and telnet to it, and I got "No route to host", so that's why I was going the firewall path. Hopefully the sysadmins can disable the firewall for the internal network today, and I can see if that solves the issue. Thanks, -- Ángel de Vicente http://angel-de-vicente.blogspot.com/
Re: [OMPI users] Help diagnosing problem: not being able to run MPI code across computers
Hi again,

Angel de Vicente writes:

> yes, that's just what I did with orted. I saw the port it was trying to
> connect to and telnetted to it, and I got "No route to host", so that's
> why I was going down the firewall path. Hopefully the sysadmins can
> disable the firewall for the internal network today, and I can see if
> that solves the issue.

OK, removing the firewall for the private network improved things a lot. A simple "Hello World" seems to work without issues, but if I run my code, I have a problem like this:

[angelv@comer RTI2D.Parallel]$ mpiexec -prefix $OMPI_PREFIX -hostfile $MPI_HOSTS -n 10 ../../../mancha2D_mpi_h5fc.x mancha.trol
[...]
[comer][[58110,1],0][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() to 161.72.206.3 failed: No route to host (113)
[comer][[58110,1],1][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() to 161.72.206.3 failed: No route to host (113)
[comer][[58110,1],2][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() to 161.72.206.3 failed: No route to host (113)
[comer][[58110,1],3][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() to 161.72.206.3 failed: No route to host (113)

But MPI_HOSTS points to a file with:

$ cat /net/nas7/polar/minicluster/machinefile-openmpi
c0 slots=5
c1 slots=5
c2 slots=5

c0, c1, and c2 are the names of the machines in the internal network, but for some reason it is using the public interfaces and complaining (the firewall on those is still active). I thought just specifying the names of the machines in the machine file would make sure that we were using the right interface...

Any help?

Thanks,
--
Ángel de Vicente
http://angel-de-vicente.blogspot.com/
Re: [OMPI users] Help diagnosing problem: not being able to run MPI code across computers
Hi, "Jeff Squyres (jsquyres)" writes: > The list of names in the hostfile specifies the servers that will be used, > not the network interfaces. Have a look at the TCP portion of the FAQ: > > http://www.open-mpi.org/faq/?category=tcp Thanks a lot for this. Now it works OK if I run it like [angelv@comer RTI2D.Parallel]$ mpiexec -loadbalance --mca btl_tcp_if_include p1p1 -prefix $OMPI_PREFIX -hostfile $MPI_HOSTS -n 4 ../../../mancha2D_mpi_h5fc.x\ mancha.trol But, the FAQ seems to be wrong, since it also says that I should be able to run like: [angelv@comer RTI2D.Parallel]$ mpiexec -loadbalance --mca btl_tcp_if_include 192.168.1.x/24 -prefix $OMPI_PREFIX -hostfile $MPI_HOSTS -n 4 ../../../mancha2D_\ mpi_h5fc.x mancha.trol but then I get the following error: -- WARNING: An invalid value was given for btl_tcp_if_include. This value will be ignored. Local host: catar Value: 192.168.1.x/24 Message:Invalid specification (inet_pton() failed) -- If I specify the subnet as 192.168.1.0/24 all is in order. I'm running 1.6.4: [angelv@comer RTI2D.Parallel]$ ompi_info Package: Open MPI angelv@comer Distribution Open MPI: 1.6.4 Thanks, -- Ángel de Vicente http://angel-de-vicente.blogspot.com/
[OMPI users] init_thread + spawn error
Hi all!

I'm getting an error when calling MPI_Init_thread and MPI_Comm_spawn. Am I doing something wrong? The attachments contain my ompi_info output and source.

Thanks!

Joao

  char *arg[] = {"spawn1", (char *)0};

  MPI_Init_thread (&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  MPI_Comm_spawn ("./spawn_slave", arg, 1, MPI_INFO_NULL, 0, MPI_COMM_SELF, &slave, MPI_ERRCODES_IGNORE);

... and the error:

opal_mutex_lock(): Resource deadlock avoided
[c8:13335] *** Process received signal ***
[c8:13335] Signal: Aborted (6)
[c8:13335] Signal code: (-6)
[c8:13335] [ 0] [0xb7fbf440]
[c8:13335] [ 1] /lib/libc.so.6(abort+0x101) [0xb7abd5b1]
[c8:13335] [ 2] /usr/local/openmpi/openmpi-svn/lib/libmpi.so.0 [0xb7e2933c]
[c8:13335] [ 3] /usr/local/openmpi/openmpi-svn/lib/libmpi.so.0 [0xb7e2923a]
[c8:13335] [ 4] /usr/local/openmpi/openmpi-svn/lib/libmpi.so.0 [0xb7e292e3]
[c8:13335] [ 5] /usr/local/openmpi/openmpi-svn/lib/libmpi.so.0 [0xb7e29fa7]
[c8:13335] [ 6] /usr/local/openmpi/openmpi-svn/lib/libmpi.so.0 [0xb7e29eda]
[c8:13335] [ 7] /usr/local/openmpi/openmpi-svn/lib/libmpi.so.0 [0xb7e2adec]
[c8:13335] [ 8] /usr/local/openmpi/openmpi-svn/lib/libmpi.so.0(ompi_proc_unpack+0x181) [0xb7e2b142]
[c8:13335] [ 9] /usr/local/openmpi/openmpi-svn/lib/libmpi.so.0(ompi_comm_connect_accept+0x57c) [0xb7e0fb70]
[c8:13335] [10] /usr/local/openmpi/openmpi-svn/lib/libmpi.so.0(PMPI_Comm_spawn+0x395) [0xb7e5e285]
[c8:13335] [11] ./spawn(main+0x7f) [0x80486ef]
[c8:13335] [12] /lib/libc.so.6(__libc_start_main+0xdc) [0xb7aa7ebc]
[c8:13335] [13] ./spawn [0x80485e1]
[c8:13335] *** End of error message ***
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 13335 on node c8 calling "abort". This will have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
-- #include "mpi.h" #include int main (int argc, char **argv) { int provided; MPI_Comm slave; char *arg[]= {"spawn1", (char *)0}; MPI_Init_thread (&argc, &argv, MPI_THREAD_MULTIPLE, &provided); MPI_Comm_spawn ("./spawn_slave", arg, 1, MPI_INFO_NULL, 0, MPI_COMM_SELF, &slave, MPI_ERRCODES_IGNORE); MPI_Finalize (); return 0; } Open MPI: 1.3a1r16236 Open MPI SVN revision: r16236 Open RTE: 1.3a1r16236 Open RTE SVN revision: r16236 OPAL: 1.3a1r16236 OPAL SVN revision: r16236 Prefix: /usr/local/openmpi/openmpi-svn Configured architecture: i686-pc-linux-gnu Configure host: corisco Configured by: lima Configured on: Wed Sep 26 11:37:04 BRT 2007 Configure host: corisco Built by: lima Built on: Wed Sep 26 12:07:13 BRT 2007 Built host: corisco C bindings: yes C++ bindings: yes Fortran77 bindings: yes (all) Fortran90 bindings: no Fortran90 bindings size: na C compiler: gcc C compiler absolute: /usr/bin/gcc C++ compiler: g++ C++ compiler absolute: /usr/bin/g++ Fortran77 compiler: g77 Fortran77 compiler abs: /usr/bin/g77 Fortran90 compiler: none Fortran90 compiler abs: none C profiling: yes C++ profiling: yes Fortran77 profiling: yes Fortran90 profiling: no C++ exceptions: no Thread support: posix (mpi: yes, progress: no) Sparse Groups: no Internal debug support: yes MPI parameter check: runtime Memory profiling support: yes Memory debugging support: yes libltdl support: yes Heterogeneous support: yes mpirun default --prefix: no MPI I/O support: yes MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.3) MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.3) MCA paffinity: linux (MCA v1.0, API v1.1, Component v1.3) MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.3) MCA timer: linux (MCA v1.0, API v1.0, Component v1.3) MCA installdirs: env (MCA v1.0, API v1.0, Component v1.3) MCA installdirs: config (MCA v1.0, API v1.0, Component v1.3) MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0) MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0) MCA coll: basic (MCA v1.0, API v1.0, Component v1.3) MCA coll: inter (MCA v1.0, API v1.0, Component v1.3) MCA coll: self (MCA v1.0, API v1.0, Component v1.3) MCA coll: sm (MCA v1.0, API v1.0, Component v1.3) MCA coll: tuned (MCA v1.0, API v1.0, Component v1.3) MCA io: romio (MCA v1.0, API v1.0, Component v1.3) MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.3) MCA mpool: sm (MCA v1.0, API v1.0, Component v1.3)
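The spawn_slave source is not included in the thread; a minimal child program consistent with the MPI_Comm_spawn call above might look like the sketch below (the bare init/get-parent/finalize behaviour is an assumption, not the poster's actual code):

  /* spawn_slave.c -- hypothetical child matching the spawn call above */
  #include <mpi.h>
  #include <stdio.h>

  int main (int argc, char **argv)
  {
    MPI_Comm parent;

    MPI_Init (&argc, &argv);
    MPI_Comm_get_parent (&parent);   /* intercommunicator back to the spawning process */
    if (parent != MPI_COMM_NULL) {
      printf ("child!\n");
      /* later messages in these threads suggest disconnecting before finalizing */
      MPI_Comm_disconnect (&parent);
    }
    MPI_Finalize ();
    return 0;
  }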
[OMPI users] MPI_Comm_spawn errors
Hi all, I'm getting errors with spawn in the situations: 1) spawn1.c - spawning 2 process on localhost, one by one, the error is: spawning ... [localhost:31390] *** Process received signal *** [localhost:31390] Signal: Segmentation fault (11) [localhost:31390] Signal code: Address not mapped (1) [localhost:31390] Failing at address: 0x98 [localhost:31390] [ 0] /lib/libpthread.so.0 [0x2b1d38a17ed0] [localhost:31390] [ 1] /usr/local/mpi/openmpi-svn/lib/libmpi.so.0(ompi_comm_dyn_finalize+0xd2) [0x2b1d37667cb2] [localhost:31390] [ 2] /usr/local/mpi/openmpi-svn/lib/libmpi.so.0(ompi_comm_finalize+0x3b) [0x2b1d3766358b] [localhost:31390] [ 3] /usr/local/mpi/openmpi-svn/lib/libmpi.so.0(ompi_mpi_finalize+0x248) [0x2b1d37679598] [localhost:31390] [ 4] ./spawn1(main+0xac) [0x400ac4] [localhost:31390] [ 5] /lib/libc.so.6(__libc_start_main+0xf4) [0x2b1d38c43b74] [localhost:31390] [ 6] ./spawn1 [0x400989] [localhost:31390] *** End of error message *** -- mpirun has exited due to process rank 0 with PID 31390 on node localhost calling "abort". This will have caused other processes in the application to be terminated by signals sent by mpirun (as reported here). -- With 1 process spawned or with 2 process spawned in one call there is no output from child. 2) spawn2.c - no response, this init is MPI_Init_thread (&argc, &argv, MPI_THREAD_MULTIPLE, &required) the attachments contains the programs, ompi_info and config.log. Some suggest ? thanks a lot. Joao. spawn1.c.gz Description: GNU Zip compressed data spawn2.c.gz Description: GNU Zip compressed data ompi_info.txt.gz Description: GNU Zip compressed data config.log.gz Description: GNU Zip compressed data
[OMPI users] Spawn problem
Hi, sorry to bring this up again ... but I hope to use spawn in ompi someday :-D

The execution of spawn in this way works fine:

MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

but if I put the call inside a for loop I get a problem:

for (i= 0; i < 2; i++)
{
  MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm[i], MPI_ERRCODES_IGNORE);
}

and the error is:

spawning ...
child!
child!
[localhost:03892] *** Process received signal ***
[localhost:03892] Signal: Segmentation fault (11)
[localhost:03892] Signal code: Address not mapped (1)
[localhost:03892] Failing at address: 0xc8
[localhost:03892] [ 0] /lib/libpthread.so.0 [0x2ac71ca8bed0]
[localhost:03892] [ 1] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_dpm_base_dyn_finalize+0xa3) [0x2ac71ba7448c]
[localhost:03892] [ 2] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2ac71b9decdf]
[localhost:03892] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2ac71ba04765]
[localhost:03892] [ 4] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(PMPI_Finalize+0x71) [0x2ac71ba365c9]
[localhost:03892] [ 5] ./spawn1(main+0xaa) [0x400ac2]
[localhost:03892] [ 6] /lib/libc.so.6(__libc_start_main+0xf4) [0x2ac71ccb7b74]
[localhost:03892] [ 7] ./spawn1 [0x400989]
[localhost:03892] *** End of error message ***
--
mpirun noticed that process rank 0 with PID 3892 on node localhost exited on signal 11 (Segmentation fault).
--

The attachments contain the ompi_info, config.log and program.

Thanks for taking a look, Joao.

config.log.gz Description: GNU Zip compressed data
ompi_info.txt.gz Description: GNU Zip compressed data
spawn1.c.gz Description: GNU Zip compressed data
Re: [OMPI users] Spawn problem
Really MPI_Finalize is crashing and calling MPI_Comm_{free,disconnect} works! I don't know if the free/disconnect must appear before a MPI_Finalize for this case (spawn processes) some suggest ? I use loops in spawn: - first for testing :) - and second because certain MPI applications don't know in advance the number of childrens needed to complete his work. The spawn works is creat ... I will made other tests. thanks, Joao On Mon, Mar 31, 2008 at 3:03 AM, Matt Hughes wrote: > On 30/03/2008, Joao Vicente Lima wrote: > > Hi, > > sorry bring this again ... but i hope use spawn in ompi someday :-D > > I believe it's crashing in MPI_Finalize because you have not closed > all communication paths between the parent and the child processes. > For the parent process, try calling MPI_Comm_free or > MPI_Comm_disconnect on each intercomm in your intercomm array before > calling finalize. On the child, call free or disconnect on the parent > intercomm before calling finalize. > > Out of curiosity, why a loop of spawns? Why not increase the value of > the maxprocs argument, or if you need to spawn different executables, > or use different arguments for each instance, why not > MPI_Comm_spawn_multiple? > > mch > > > > > > > > > The execution of spawn in this way works fine: > > MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0, > > MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE); > > > > but if this code go to a for I get a problem : > > for (i= 0; i < 2; i++) > > { > > MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 1, > > MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm[i], MPI_ERRCODES_IGNORE); > > } > > > > and the error is: > > spawning ... > > child! > > child! > > [localhost:03892] *** Process received signal *** > > [localhost:03892] Signal: Segmentation fault (11) > > [localhost:03892] Signal code: Address not mapped (1) > > [localhost:03892] Failing at address: 0xc8 > > [localhost:03892] [ 0] /lib/libpthread.so.0 [0x2ac71ca8bed0] > > [localhost:03892] [ 1] > > /usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_dpm_base_dyn_finalize+0xa3) > > [0x2ac71ba7448c] > > [localhost:03892] [ 2] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 > [0x2ac71b9decdf] > > [localhost:03892] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 > [0x2ac71ba04765] > > [localhost:03892] [ 4] > > /usr/local/mpi/ompi-svn/lib/libmpi.so.0(PMPI_Finalize+0x71) > > [0x2ac71ba365c9] > > [localhost:03892] [ 5] ./spawn1(main+0xaa) [0x400ac2] > > [localhost:03892] [ 6] /lib/libc.so.6(__libc_start_main+0xf4) > [0x2ac71ccb7b74] > > [localhost:03892] [ 7] ./spawn1 [0x400989] > > [localhost:03892] *** End of error message *** > > -- > > mpirun noticed that process rank 0 with PID 3892 on node localhost > > exited on signal 11 (Segmentation fault). > > -- > > > > the attachments contain the ompi_info, config.log and program. > > > > thanks for some check, > > > > Joao. > > > > > > ___ > > users mailing list > > us...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
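A minimal sketch of the parent side with Matt's suggestion applied (assuming, as in the original loop, two spawns of ./spawn1 and no error handling):

  #include <mpi.h>

  int main (int argc, char **argv)
  {
    MPI_Comm intercomm[2];
    int i;

    MPI_Init (&argc, &argv);

    for (i = 0; i < 2; i++)
      MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                      MPI_COMM_SELF, &intercomm[i], MPI_ERRCODES_IGNORE);

    /* close every parent/child communication path before finalizing;
       the children should likewise disconnect their parent intercomm */
    for (i = 0; i < 2; i++)
      MPI_Comm_disconnect (&intercomm[i]);

    MPI_Finalize ();
    return 0;
  }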
Re: [OMPI users] Spawn problem
Hi again, when I call MPI_Init_thread in the same program the error is: spawning ... opal_mutex_lock(): Resource deadlock avoided [localhost:07566] *** Process received signal *** [localhost:07566] Signal: Aborted (6) [localhost:07566] Signal code: (-6) [localhost:07566] [ 0] /lib/libpthread.so.0 [0x2abe5630ded0] [localhost:07566] [ 1] /lib/libc.so.6(gsignal+0x35) [0x2abe5654c3c5] [localhost:07566] [ 2] /lib/libc.so.6(abort+0x10e) [0x2abe5654d73e] [localhost:07566] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe5528063b] [localhost:07566] [ 4] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe55280559] [localhost:07566] [ 5] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe552805e8] [localhost:07566] [ 6] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe55280fff] [localhost:07566] [ 7] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe55280f3d] [localhost:07566] [ 8] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe55281f59] [localhost:07566] [ 9] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_proc_unpack+ 0x204) [0x2abe552823cd] [localhost:07566] [10] /usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2a be58efb5f7] [localhost:07566] [11] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(MPI_Comm_spawn+0x 465) [0x2abe552b55cd] [localhost:07566] [12] ./spawn1(main+0x9d) [0x400b05] [localhost:07566] [13] /lib/libc.so.6(__libc_start_main+0xf4) [0x2abe56539b74] [localhost:07566] [14] ./spawn1 [0x4009d9] [localhost:07566] *** End of error message *** opal_mutex_lock(): Resource deadlock avoided [localhost:07567] *** Process received signal *** [localhost:07567] Signal: Aborted (6) [localhost:07567] Signal code: (-6) [localhost:07567] [ 0] /lib/libpthread.so.0 [0x2b48610f9ed0] [localhost:07567] [ 1] /lib/libc.so.6(gsignal+0x35) [0x2b48613383c5] [localhost:07567] [ 2] /lib/libc.so.6(abort+0x10e) [0x2b486133973e] [localhost:07567] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006c63b] [localhost:07567] [ 4] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006c559] [localhost:07567] [ 5] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006c5e8] [localhost:07567] [ 6] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006cfff] [localhost:07567] [ 7] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006cf3d] [localhost:07567] [ 8] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006df59] [localhost:07567] [ 9] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_proc_unpack+ 0x204) [0x2b486006e3cd] [localhost:07567] [10] /usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2b 4863ce75f7] [localhost:07567] [11] /usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2b 4863ce9c2b] [localhost:07567] [12] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b48600720d7] [localhost:07567] [13] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(PMPI_Init_thread+ 0x166) [0x2b48600ae4f2] [localhost:07567] [14] ./spawn1(main+0x2c) [0x400a94] [localhost:07567] [15] /lib/libc.so.6(__libc_start_main+0xf4) [0x2b4861325b74] [localhost:07567] [16] ./spawn1 [0x4009d9] [localhost:07567] *** End of error message *** -- mpirun noticed that process rank 0 with PID 7566 on node localhost exited on sig nal 6 (Aborted). -- thank for some check, Joao. On Mon, Mar 31, 2008 at 11:49 AM, Joao Vicente Lima wrote: > Really MPI_Finalize is crashing and calling MPI_Comm_{free,disconnect} works! > I don't know if the free/disconnect must appear before a MPI_Finalize > for this case (spawn processes) some suggest ? 
> > I use loops in spawn: > - first for testing :) > - and second because certain MPI applications don't know in advance > the number of childrens needed to complete his work. > > The spawn works is creat ... I will made other tests. > > thanks, > Joao > > > > On Mon, Mar 31, 2008 at 3:03 AM, Matt Hughes > wrote: > > On 30/03/2008, Joao Vicente Lima wrote: > > > Hi, > > > sorry bring this again ... but i hope use spawn in ompi someday :-D > > > > I believe it's crashing in MPI_Finalize because you have not closed > > all communication paths between the parent and the child processes. > > For the parent process, try calling MPI_Comm_free or > > MPI_Comm_disconnect on each intercomm in your intercomm array before > > calling finalize. On the child, call free or disconnect on the parent > > intercomm before calling finalize. > > > > Out of curiosity, why a loop of spawns? Why not increase the value of > > the maxprocs argument, or if you need to spawn different executables, > > or use different arguments for each instance, why not > > MPI_Comm_spawn_multiple? > > > > mch > > > > > > > > > > > > > > > > The execution of spawn in this way works fine
[OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?
Hi, in one of our codes, we want to create a log of events that happen in the MPI processes, where the number of these events and their timing is unpredictable. So I implemented a simple test code, where process 0 creates a thread that is just busy-waiting for messages from any process, which are then written to stdout/stderr/a log file as they are received. The test code is at https://github.com/angel-devicente/thread_io and the same idea went into our "real" code. As far as I could see, this behaves very nicely: there are no deadlocks, no lost messages, and the performance penalty is minimal when considering the real application this is intended for. But then I found that in a local cluster the performance was very bad with the locally installed OpenMPI compared to my own OpenMPI installation (same gcc and OpenMPI versions): ~5min 50s versus ~5s for one test. Checking the OpenMPI configuration details, I found that the locally installed OpenMPI was configured to use the Mellanox IB driver, and in particular the hcoll component was somehow killing performance: running with mpirun --mca coll_hcoll_enable 0 -np 51 ./test_t was taking ~5s, while enabling coll_hcoll was killing performance, as stated above (when run in a single node the performance also goes down, but only by about a factor of 2X). Has anyone seen anything like this? Perhaps a newer Mellanox driver would solve the problem? We were planning on making our code public, but before we do so, I want to understand under which conditions we could have this problem with the "Threaded I/O" approach and if possible how to get rid of it completely. Any help/pointers appreciated. -- Ángel de Vicente Tel.: +34 922 605 747 Web.: http://research.iac.es/proyecto/polmag/ - ADVERTENCIA: Sobre la privacidad y cumplimiento de la Ley de Protección de Datos, acceda a http://www.iac.es/disclaimer.php WARNING: For more information on privacy and fulfilment of the Law concerning the Protection of Data, consult http://www.iac.es/disclaimer.php?lang=en
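The linked repository holds the actual test code; a much-simplified sketch of the same idea (a logger thread on rank 0 receiving messages from every rank) could look like the following. The tag, message size and the one-event-per-rank termination rule are assumptions made to keep the example self-contained; the real code handles an unpredictable number of events.

  #include <mpi.h>
  #include <pthread.h>
  #include <stdio.h>

  #define LOG_TAG 42            /* assumed tag, not taken from the real code */
  #define LOG_LEN 256

  static int nranks;            /* set before the logger thread starts */

  static void *logger (void *arg)
  {
    char msg[LOG_LEN];
    MPI_Status st;
    int received;

    (void) arg;
    /* in this toy version every rank sends exactly one event */
    for (received = 0; received < nranks; received++) {
      MPI_Recv (msg, LOG_LEN, MPI_CHAR, MPI_ANY_SOURCE, LOG_TAG,
                MPI_COMM_WORLD, &st);
      fprintf (stderr, "[from rank %d] %s\n", st.MPI_SOURCE, msg);
    }
    return NULL;
  }

  int main (int argc, char **argv)
  {
    int provided, rank;
    pthread_t tid;
    char msg[LOG_LEN];

    /* a second thread makes MPI calls concurrently, so MPI_THREAD_MULTIPLE is needed */
    MPI_Init_thread (&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
      MPI_Abort (MPI_COMM_WORLD, 1);

    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &nranks);

    if (rank == 0)
      pthread_create (&tid, NULL, logger, NULL);

    /* every rank (rank 0 included) reports one event to the logger thread */
    snprintf (msg, LOG_LEN, "hello from rank %d of %d", rank, nranks);
    MPI_Send (msg, LOG_LEN, MPI_CHAR, 0, LOG_TAG, MPI_COMM_WORLD);

    if (rank == 0)
      pthread_join (&tid, NULL);

    MPI_Finalize ();
    return 0;
  }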
Re: [OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?
Hi, George Bosilca writes: > If I'm not mistaken, hcoll is playing with the opal_progress in a way > that conflicts with the blessed usage of progress in OMPI and prevents > other components from advancing and timely completing requests. The > impact is minimal for sequential applications using only blocking > calls, but is jeopardizing performance when multiple types of > communications are simultaneously executing or when multiple threads > are active. > > The solution might be very simple: hcoll is a module providing support > for collective communications so as long as you don't use collectives, > or the tuned module provides collective performance similar to hcoll > on your cluster, just go ahead and disable hcoll. You can also reach > out to Mellanox folks asking them to fix the hcoll usage of > opal_progress. Until we find a more robust solution, I was thinking of simply querying the MPI implementation at run time and using the threaded version if hcoll is not present, falling back to the unthreaded version if it is. Looking at the coll.h file I see that some functions there might be useful, for example mca_coll_base_component_comm_query_2_0_0_fn_t, but I have never delved into this. Would this be an appropriate approach? Are there any examples of how to query in code for a particular component? Thanks, -- Ángel de Vicente Tel.: +34 922 605 747 Web.: http://research.iac.es/proyecto/polmag/ - ADVERTENCIA: Sobre la privacidad y cumplimiento de la Ley de Protección de Datos, acceda a http://www.iac.es/disclaimer.php WARNING: For more information on privacy and fulfilment of the Law concerning the Protection of Data, consult http://www.iac.es/disclaimer.php?lang=en
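One way to attempt this kind of run-time check without touching Open MPI internals is the MPI_T tool information interface: Open MPI exposes its MCA parameters as MPI_T control variables, so finding a control variable whose name contains "coll_hcoll" suggests the component was built in (whether it is actually selected for a given communicator is another matter). The sketch below is untested and rests on that assumption; it is not a confirmed recipe for detecting hcoll.

  #include <mpi.h>
  #include <string.h>

  /* Return 1 if an MPI_T control variable mentioning "coll_hcoll" exists. */
  int hcoll_component_present (void)
  {
    int provided, ncvar, i, found = 0;

    if (MPI_T_init_thread (MPI_THREAD_SINGLE, &provided) != MPI_SUCCESS)
      return 0;

    MPI_T_cvar_get_num (&ncvar);
    for (i = 0; i < ncvar && !found; i++) {
      char name[256];
      int namelen = sizeof name;
      /* NULL is allowed for the output arguments we do not need */
      if (MPI_T_cvar_get_info (i, name, &namelen, NULL, NULL, NULL,
                               NULL, NULL, NULL, NULL) != MPI_SUCCESS)
        continue;
      if (strstr (name, "coll_hcoll") != NULL)
        found = 1;
    }

    MPI_T_finalize ();
    return found;
  }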
Re: [OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?
Hi, Joshua Ladd writes: > We cannot reproduce this. On four nodes 20 PPN with and w/o hcoll it > takes exactly the same 19 secs (80 ranks). > > What version of HCOLL are you using? Command line? Thanks for having a look at this. According to ompi_info, our OpenMPI (version 3.0.1) was configured with (and gcc version 7.2.0): , | Configure command line: 'CFLAGS=-I/apps/OPENMPI/SRC/PMI/include' | '--prefix=/storage/apps/OPENMPI/3.0.1/gnu' | '--with-mxm=/opt/mellanox/mxm' | '--with-hcoll=/opt/mellanox/hcoll' | '--with-knem=/opt/knem-1.1.2.90mlnx2' | '--with-slurm' '--with-pmi=/usr' | '--with-pmi-libdir=/usr/lib64' | '--with-platform=../contrib/platform/mellanox/optimized' ` Not sure if there is a better way to find out the HCOLL version, but the file hcoll_version.h in /opt/mellanox/hcoll/include/hcoll/api/ says we have version 3.8.1649 Code compiled as: , | $ mpicc -o test_t thread_io.c test.c ` To run the tests, I just submit the job to Slurm with the following script (changing the coll_hcoll_enable param accordingly): , | #!/bin/bash | # | #SBATCH -J test | #SBATCH -N 5 | #SBATCH -n 51 | #SBATCH -t 00:07:00 | #SBATCH -o test-%j.out | #SBATCH -e test-%j.err | #SBATCH -D . | | module purge | module load openmpi/gnu/3.0.1 | | time mpirun --mca coll_hcoll_enable 1 -np 51 ./test_t ` In the latest test I managed to squeeze in our queuing system, the hcoll-disabled run took ~3.5s, and the hcoll-enabled one ~43.5s (in this one I actually commented out all the fprintf statements just in case, so the code was pure communication). Thanks, -- Ángel de Vicente Tel.: +34 922 605 747 Web.: http://research.iac.es/proyecto/polmag/ - ADVERTENCIA: Sobre la privacidad y cumplimiento de la Ley de Protección de Datos, acceda a http://www.iac.es/disclaimer.php WARNING: For more information on privacy and fulfilment of the Law concerning the Protection of Data, consult http://www.iac.es/disclaimer.php?lang=en
Re: [OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?
Hi, Joshua Ladd writes: > This is an ancient version of HCOLL. Please upgrade to the latest > version (you can do this by installing HPC-X > https://www.mellanox.com/products/hpc-x-toolkit) Just to close the circle and inform that all seems OK now. I don't have root permission in this machine, so I could not change the Mellanox drivers (4.1.1.0.2), but I downloaded the latest (as far as I can tell) HPC-X version compatible with that driver (v.2.1.0), which comes with hcoll version (4.0.2127). I compiled the OpenMPI version that comes with HPC-X toolkit, so that I was using the same compiler (gcc-7.2.0) used in the cluster version of OpenMPI, then HDF5 as well. And with that, both the thread_io test code and our real app seem to behave very nicely and I get basically same timings when using the MPI_THREAD_MULTIPLE or the single threaded MPI version. All sorted. Many thanks, -- Ángel de Vicente Tel.: +34 922 605 747 Web.: http://research.iac.es/proyecto/polmag/ - ADVERTENCIA: Sobre la privacidad y cumplimiento de la Ley de Protección de Datos, acceda a http://www.iac.es/disclaimer.php WARNING: For more information on privacy and fulfilment of the Law concerning the Protection of Data, consult http://www.iac.es/disclaimer.php?lang=en
[OMPI users] Trouble compiling OpenMPI with Infiniband support
Hi, I'm trying to compile the latest OpenMPI version with Infiniband support in our local cluster, but didn't get very far (since I'm installing this via Spack, I also asked in their support group). I'm doing the installation via Spack, which is issuing the following .configure step (see the options given for --with-knem, --with-hcoll and --with-mxm): , | configure' | '--prefix=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/openmpi-4.1.1-jsvbusyjgthr2d6oyny5klt62gm6ma2u' | '--enable-shared' '--disable-silent-rules' '--disable-builtin-atomics' | '--enable-static' '--without-pmi' | '--with-zlib=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/zlib-1.2.11-hrstx5ffrg4f4k3xc2anyxed3mmgdcoz' | '--enable-mpi1-compatibility' '--with-knem=/opt/knem-1.1.2.90mlnx2' | '--with-hcoll=/opt/mellanox/hcoll' '--without-psm' '--without-ofi' | '--without-cma' '--without-ucx' '--without-fca' | '--with-mxm=/opt/mellanox/mxm' '--without-verbs' '--without-xpmem' | '--without-psm2' '--without-alps' '--without-lsf' '--without-sge' | '--without-slurm' '--without-tm' '--without-loadleveler' | '--disable-memchecker' | '--with-libevent=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/libevent-2.1.12-yd5l4tjmnigv6dqlv5afpn4zc6ekdchc' | '--with-hwloc=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/hwloc-2.6.0-bfnt4g3givflydpe5d2iglyupgbzxbfn' | '--disable-java' '--disable-mpi-java' '--without-cuda' | '--enable-wrapper-rpath' '--disable-wrapper-runpath' '--disable-mpi-cxx' | '--disable-cxx-exceptions' | '--with-wrapper-ldflags=-Wl,-rpath,/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-7.2.0/gcc-9.3.0-ghr2jekwusoa4zip36xsa3okgp3bylqm/lib/gcc/x86_64-pc-linux-gnu/9.3.0 | -Wl,-rpath,/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-7.2.0/gcc-9.3.0-ghr2jekwusoa4zip36xsa3okgp3bylqm/lib64' ` Later on in the configuration phase I see: , | --- MCA component btl:openib (m4 configuration macro) | checking for MCA component btl:openib compile mode... static | checking whether expanded verbs are available... yes | checking whether IBV_EXP_ATOMIC_HCA_REPLY_BE is declared... yes | checking whether IBV_EXP_QP_CREATE_ATOMIC_BE_REPLY is declared... yes | checking whether ibv_exp_create_qp is declared... yes | checking whether ibv_exp_query_device is declared... yes | checking whether IBV_EXP_QP_INIT_ATTR_ATOMICS_ARG is declared... yes | checking for struct ibv_exp_device_attr.ext_atom... yes | checking for struct ibv_exp_device_attr.exp_atomic_cap... yes | checking if MCA component btl:openib can compile... no ` This is the first time I try to compile OpenMPI this way, and I get a bit confused with what each bit is doing, but it looks like it goes through the moves to get the btl:openib built, but then for some reason it cannot compile it. Any suggestions/pointers? Many thanks, -- Ángel de Vicente Tel.: +34 922 605 747 Web.: http://research.iac.es/proyecto/polmag/ - AVISO LEGAL: Este mensaje puede contener información confidencial y/o privilegiada. Si usted no es el destinatario final del mismo o lo ha recibido por error, por favor notifíquelo al remitente inmediatamente. Cualquier uso no autorizadas del contenido de este mensaje está estrictamente prohibida. Más información en: https://www.iac.es/es/responsabilidad-legal DISCLAIMER: This message may contain confidential and / or privileged information. 
If you are not the final recipient or have received it in error, please notify the sender immediately. Any unauthorized use of the content of this message is strictly prohibited. More information: https://www.iac.es/en/disclaimer
Re: [OMPI users] Trouble compiling OpenMPI with Infiniband support
Hello, Gilles Gouaillardet via users writes: > Infiniband detection likely fails before checking expanded verbs. thanks for this. At the end, after playing a bit with different options, I managed to install OpenMPI 3.1.0 OK in our cluster using UCX (I wanted 4.1.1, but that would not compile cleanly with the old version of UCX that is installed in the cluster). The configure command line (as reported by ompi_info) was: , | Configure command line: '--prefix=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/openmpi-3.1.0-g5a7szwxcsgmyibqvwwavfkz5b4i2ym7' | '--enable-shared' '--disable-silent-rules' | '--disable-builtin-atomics' '--with-pmi=/usr' | '--with-zlib=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/zlib-1.2.11-hrstx5ffrg4f4k3xc2anyxed3mmgdcoz' | '--without-knem' '--with-hcoll=/opt/mellanox/hcoll' | '--without-psm' '--without-ofi' '--without-cma' | '--with-ucx=/opt/ucx' '--without-fca' | '--without-mxm' '--without-verbs' '--without-xpmem' | '--without-psm2' '--without-alps' '--without-lsf' | '--without-sge' '--with-slurm' '--without-tm' | '--without-loadleveler' '--disable-memchecker' | '--with-hwloc=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/hwloc-1.11.13-kpjkidab37wn25h2oyh3eva43ycjb6c5' | '--disable-java' '--disable-mpi-java' | '--without-cuda' '--enable-wrapper-rpath' | '--disable-wrapper-runpath' '--disable-mpi-cxx' | '--disable-cxx-exceptions' | '--with-wrapper-ldflags=-Wl,-rpath,/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-7.2.0/gcc-9.3.0-ghr2jekwusoa4zip36xsa3okgp3bylqm/lib/gcc/x86_\ | 64-pc-linux-gnu/9.3.0 | -Wl,-rpath,/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-7.2.0/gcc-9.3.0-ghr2jekwusoa4zip36xsa3okgp3bylqm/lib64' ` The versions that I'm using are: gcc: 9.3.0 mxm: 3.6.3102 (though I configure OpenMPI --without-mxm) hcoll: 3.8.1649 knem: 1.1.2.90mlnx2 (though I configure OpenMPI --without-knem) ucx: 1.2.2947 slurm: 18.08.7 It looks like everything executes fine, but I have a couple of warnings, and I'm not sure how much I should worry and what I could do about them: 1) Conflicting CPU frequencies detected: [1645221586.038838] [s01r3b78:11041:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3151.41 [1645221585.740595] [s01r3b79:11484:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 2998.76 2) Won't use knem. In a previous try, I was specifying --with-knem, but I was getting this warning about not being able to open /dev/knem. I guess our cluster is not properly configured w.r.t knem, so I built OpenMPI again --without-knem, but I still get this message? [1645221587.091122] [s01r3b74:9054 :0] shm.c:65 MXM WARN Could not open the KNEM device file at /dev/knem : No such file or directory. Won't use knem. [1645221587.104807] [s01r3b76:8610 :0] shm.c:65 MXM WARN Could not open the KNEM device file at /dev/knem : No such file or directory. Won't use knem. Any help/pointers appreciated. Many thanks, -- Ángel de Vicente Tel.: +34 922 605 747 Web.: http://research.iac.es/proyecto/polmag/ - AVISO LEGAL: Este mensaje puede contener información confidencial y/o privilegiada. Si usted no es el destinatario final del mismo o lo ha recibido por error, por favor notifíquelo al remitente inmediatamente. Cualquier uso no autorizadas del contenido de este mensaje está estrictamente prohibida. 
Más información en: https://www.iac.es/es/responsabilidad-legal DISCLAIMER: This message may contain confidential and / or privileged information. If you are not the final recipient or have received it in error, please notify the sender immediately. Any unauthorized use of the content of this message is strictly prohibited. More information: https://www.iac.es/en/disclaimer
Re: [OMPI users] Trouble compiling OpenMPI with Infiniband support
Hello, "Jeff Squyres (jsquyres)" writes: > I'd recommend against using Open MPI v3.1.0 -- it's quite old. If you > have to use Open MPI v3.1.x, I'd at least suggest using v3.1.6, which > has all the rolled-up bug fixes on the v3.1.x series. > > That being said, Open MPI v4.1.2 is the most current. Open MPI v4.1.2 does > restrict which versions of UCX it uses because there are bugs in the older > versions of UCX. I am not intimately familiar with UCX -- you'll need to ask > Nvidia for support there -- but I was under the impression that it's just a > user-level library, and you could certainly install your own copy of UCX to > use > with your compilation of Open MPI. I.e., you're not restricted to whatever > UCX > is installed in the cluster system-default locations. I did follow your advice, so I compiled my own version of UCX (1.11.2) and OpenMPI v4.1.1, but for some reason the latency / bandwidth numbers are really bad compared to the previous ones, so something is wrong, but not sure how to debug it. > I don't know why you're getting MXM-specific error messages; those don't > appear > to be coming from Open MPI (especially since you configured Open MPI with > --without-mxm). If you can upgrade to Open MPI v4.1.2 and the latest UCX, see > if you are still getting those MXM error messages. In this latest attempt, yes, the MXM error messages are still there. Cheers, -- Ángel de Vicente Tel.: +34 922 605 747 Web.: http://research.iac.es/proyecto/polmag/ - AVISO LEGAL: Este mensaje puede contener información confidencial y/o privilegiada. Si usted no es el destinatario final del mismo o lo ha recibido por error, por favor notifíquelo al remitente inmediatamente. Cualquier uso no autorizadas del contenido de este mensaje está estrictamente prohibida. Más información en: https://www.iac.es/es/responsabilidad-legal DISCLAIMER: This message may contain confidential and / or privileged information. If you are not the final recipient or have received it in error, please notify the sender immediately. Any unauthorized use of the content of this message is strictly prohibited. More information: https://www.iac.es/en/disclaimer
Re: [OMPI users] Trouble compiling OpenMPI with Infiniband support
Hello, John Hearns via users writes: > Stupid answer from me. If latency/bandwidth numbers are bad then check > that you are really running over the interface that you think you > should be. You could be falling back to running over Ethernet. I'm quite out of my depth here, so all answers are helpful, as I might have skipped something very obvious. In order to try and avoid the possibility of falling back to running over Ethernet, I submitted the job with: mpirun -n 2 --mca btl ^tcp osu_latency which gives me the following error: , | At least one pair of MPI processes are unable to reach each other for | MPI communications. This means that no Open MPI device has indicated | that it can be used to communicate between these processes. This is | an error; Open MPI requires that all MPI processes be able to reach | each other. This error can sometimes be the result of forgetting to | specify the "self" BTL. | | Process 1 ([[37380,1],1]) is on host: s01r1b20 | Process 2 ([[37380,1],0]) is on host: s01r1b19 | BTLs attempted: self | | Your MPI job is now going to abort; sorry. ` This is certainly not happening when I use the "native" OpenMPI, etc. provided in the cluster. I have not knowingly specified anywhere not to support "self", so I have no clue what might be going on, as I assumed that "self" was always built for OpenMPI. Any hints on what (and where) I should look for? Many thanks, -- Ángel de Vicente Tel.: +34 922 605 747 Web.: http://research.iac.es/proyecto/polmag/ - AVISO LEGAL: Este mensaje puede contener información confidencial y/o privilegiada. Si usted no es el destinatario final del mismo o lo ha recibido por error, por favor notifíquelo al remitente inmediatamente. Cualquier uso no autorizadas del contenido de este mensaje está estrictamente prohibida. Más información en: https://www.iac.es/es/responsabilidad-legal DISCLAIMER: This message may contain confidential and / or privileged information. If you are not the final recipient or have received it in error, please notify the sender immediately. Any unauthorized use of the content of this message is strictly prohibited. More information: https://www.iac.es/en/disclaimer
Re: [OMPI users] Trouble compiling OpenMPI with Infiniband support
Hello, Joshua Ladd writes: > These are very, very old versions of UCX and HCOLL installed in your > environment. Also, MXM was deprecated years ago in favor of UCX. What > version of MOFED is installed (run ofed_info -s)? What HCA generation > is present (run ibstat). MOFED is: MLNX_OFED_LINUX-4.1-1.0.2.0 As for the HCA generation, we don't seem to have the command ibstat installed, any other way to get this info? But I *think* they are ConnectX-3. > > Stupid answer from me. If latency/bandwidth numbers are bad then check > > that you are really running over the interface that you think you > > should be. You could be falling back to running over Ethernet. apparently the problem with my first attempt was that I was installing a very bare version of UCX. I re-did the installation with the following configuration: , | '--prefix=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/ucx-1.11.2-67aihiwsolnad6aqt2ei6j6iaptqgecf' | '--enable-mt' '--enable-cma' '--disable-params-check' '--with-avx' | '--enable-optimizations' '--disable-assertions' '--disable-logging' | '--with-pic' '--with-rc' '--with-ud' '--with-dc' '--without-mlx5-dv' | '--with-ib-hw-tm' '--with-dm' '--with-cm' '--without-rocm' | '--without-java' '--without-cuda' '--without-gdrcopy' '--with-knem' | '--without-xpmem' ` and now the numbers are very good, most of the time better than the "native" OpenMPI provided in the cluster. So now I wanted to try another combination, using the Intel compiler instead of gnu one. Apparently everything was compiled OK, and when I try to run the OSU Microbenchmaks I have no problems with the point-to-point benchmarks, but I get Segmentation Faults: , | load intel/2018.2 Set Intel compilers (LICENSE NEEDED! Please, contact support if you have any issue with license) | /scratch/slurm/job1182830/slurm_script: line 59: unalias: despacktivate: not found | [s01r2b22:26669] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././.][./././././././.] | [s01r2b23:20286] MCW rank 1 bound to socket 0[core 0[hwt 0]]: [B/././././././.][./././././././.] | [s01r2b22:26681:0] Caught signal 11 (Segmentation fault) | [s01r2b23:20292:0] Caught signal 11 (Segmentation fault) | backtrace | 2 0x001c mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:641 | 3 0x0010055c mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:616 | 4 0x00034950 killpg() ??:0 | 5 0x000a7d41 PMPI_Comm_rank() ??:0 | 6 0x00402e56 main() ??:0 | 7 0x000206e5 __libc_start_main() ??:0 | 8 0x00402ca9 _start() /home/abuild/rpmbuild/BUILD/glibc-2.22/csu/../sysdeps/x86_64/start.S:118 | === | backtrace | 2 0x001c mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:641 | 3 0x0010055c mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:616 | 4 0x00034950 killpg() ??:0 | 5 0x000a7d41 PMPI_Comm_rank() ??:0 | 6 0x00402e56 main() ??:0 | 7 0x000206e5 __libc_start_main() ??:0 | 8 0x00402ca9 _start() /home/abuild/rpmbuild/BUILD/glibc-2.22/csu/../sysdeps/x86_64/start.S:118 | === ` Any idea how I could try to debug/solve this? Thanks, -- Ángel de Vicente Tel.: +34 922 605 747 Web.: http://research.iac.es/proyecto/polmag/ - AVISO LEGAL: Este mensaje puede contener información confidencial y/o privilegiada. Si usted no es el destinatario final del mismo o lo ha recibido por error, por favor notifíquelo al remitente inmediatamente. 
Cualquier uso no autorizadas del contenido de este mensaje está estrictamente prohibida. Más información en: https://www.iac.es/es/responsabilidad-legal DISCLAIMER: This message may contain confidential and / or privileged information. If you are not the final recipient or have received it in error, please notify the sender immediately. Any unauthorized use of the content of this message is strictly prohibited. More information: https://www.iac.es/en/disclaimer
[OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none
Hello, I'm running out of ideas, and wonder if someone here could have some tips on how to debug a segmentation fault I'm having with my application [due to the nature of the problem I'm wondering if the problem is with OpenMPI itself rather than my app, though at this point I'm not leaning strongly either way]. The code is hybrid MPI+OpenMP and I compile it with gcc 10.3.0 and OpenMPI 4.1.3. Usually I was running the code with "mpirun -np X --bind-to none [...]" so that the threads created by OpenMP don't get bound to a single core and I actually get proper speedup out of OpenMP. Now, since I introduced some changes to the code this week (though I have read the changes carefully a number of times, and I don't see anything suspicious), I now get a segmentation fault sometimes, but only when I run with "--bind-to none" and only in my workstation. It is not always with the same running configuration, but I can see some pattern, and the problem shows up only if I run the version compiled with OpenMP support and most of the times only when the number of rank*threads goes above 4 or so. If I run it with "--bind-to socket" all looks good all the time. If I run it in another server, "--bind-to none" doesn't seem to be any issue (I submitted the jobs many many times and not a single segmentation fault), but in my workstation it fails almost every time if using MPI+OpenMP with a handful of threads and with "--bind-to none". In this other server I'm running gcc 9.3.0 and OpenMPI 4.1.3. For example, setting OMP_NUM_THREADS to 1, I run the code like the following, and get the segmentation fault as below: , | angelv@sieladon:~/.../Fe13_NL3/t~gauss+isat+istim$ mpirun -np 4 --bind-to none ../../../../../pcorona+openmp~gauss Fe13_NL3.params | Reading control file: Fe13_NL3.params | ... Control file parameters broadcasted | | [...] | | Starting calculation loop on the line of sight | Receiving results from:2 | Receiving results from:1 | | Program received signal SIGSEGV: Segmentation fault - invalid memory reference. | | Backtrace for this error: | Receiving results from:3 | #0 0x7fd747e7555f in ??? | #1 0x7fd7488778e1 in ??? | #2 0x7fd7488667a4 in ??? | #3 0x7fd7486fe84c in ??? | #4 0x7fd7489aa9ce in ??? | #5 0x414959 in __pcorona_main_MOD_main_loop._omp_fn.0 | at src/pcorona_main.f90:627 | #6 0x7fd74813ec75 in ??? | #7 0x412bb0 in pcorona | at src/pcorona.f90:49 | #8 0x40361c in main | at src/pcorona.f90:17 | | [...] | | -- | mpirun noticed that process rank 3 with PID 0 on node sieladon exited on signal 11 (Segmentation fault). | --- ` I cannot see inside the MPI library (I don't really know if that would be helpful) but line 627 in pcorona_main.f90 is: , | call mpi_probe(master,mpi_any_tag,mpi_comm_world,stat,mpierror) ` Any ideas/suggestions what could be going on or how to try an get some more clues about the possible causes of this? Many thanks, -- Ángel de Vicente Tel.: +34 922 605 747 Web.: http://research.iac.es/proyecto/polmag/ - AVISO LEGAL: Este mensaje puede contener información confidencial y/o privilegiada. Si usted no es el destinatario final del mismo o lo ha recibido por error, por favor notifíquelo al remitente inmediatamente. Cualquier uso no autorizadas del contenido de este mensaje está estrictamente prohibida. Más información en: https://www.iac.es/es/responsabilidad-legal DISCLAIMER: This message may contain confidential and / or privileged information. If you are not the final recipient or have received it in error, please notify the sender immediately. 
Any unauthorized use of the content of this message is strictly prohibited. More information: https://www.iac.es/en/disclaimer
Re: [OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none
Thanks Gilles, Gilles Gouaillardet via users writes: > You can first double check you > MPI_Init_thread(..., MPI_THREAD_MULTIPLE, ...) my code uses "mpi_thread_funneled" and OpenMPI was compiled with MPI_THREAD_MULTIPLE support: , | ompi_info | grep -i thread | Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes) |FT Checkpoint support: no (checkpoint thread: no) ` Cheers, -- Ángel de Vicente Tel.: +34 922 605 747 Web.: http://research.iac.es/proyecto/polmag/ - AVISO LEGAL: Este mensaje puede contener información confidencial y/o privilegiada. Si usted no es el destinatario final del mismo o lo ha recibido por error, por favor notifíquelo al remitente inmediatamente. Cualquier uso no autorizadas del contenido de este mensaje está estrictamente prohibida. Más información en: https://www.iac.es/es/responsabilidad-legal DISCLAIMER: This message may contain confidential and / or privileged information. If you are not the final recipient or have received it in error, please notify the sender immediately. Any unauthorized use of the content of this message is strictly prohibited. More information: https://www.iac.es/en/disclaimer
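For completeness, the usual safety net is to check what the library actually granted, since MPI_Init_thread is allowed to return a lower level than the one requested (shown in C for brevity; the Fortran call is analogous):

  #include <mpi.h>
  #include <stdio.h>

  int main (int argc, char **argv)
  {
    int provided;

    MPI_Init_thread (&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided < MPI_THREAD_FUNNELED) {
      /* the thread-level constants are ordered, so this comparison is valid */
      fprintf (stderr, "MPI_THREAD_FUNNELED not available (got %d)\n", provided);
      MPI_Abort (MPI_COMM_WORLD, 1);
    }
    /* ... rest of the application ... */
    MPI_Finalize ();
    return 0;
  }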
Re: [OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none
Hello Jeff, "Jeff Squyres (jsquyres)" writes: > With THREAD_FUNNELED, it means that there can only be one thread in > MPI at a time -- and it needs to be the same thread as the one that > called MPI_INIT_THREAD. > > Is that the case in your app? the master rank (i.e. 0) never creates threads, while other ranks go through the following to communicate with it, so I check that it is indeed the master thread communicating only: , |tid = 0 | #ifdef _OPENMP |tid = omp_get_thread_num() | #endif | |do | if (tid == 0) then | call mpi_send(my_rank, 1, mpi_integer, master, ask_job, & | mpi_comm_world, mpierror) | call mpi_probe(master,mpi_any_tag,mpi_comm_world,stat,mpierror) | | if (stat(mpi_tag) == stop_signal) then | call mpi_recv(b_,1,mpi_integer,master,stop_signal, & | mpi_comm_world,stat,mpierror) | else | call mpi_recv(iyax,1,mpi_integer,master,give_job, & | mpi_comm_world,stat,mpierror) | end if | end if | | !$omp barrier | | [... actual work...] ` > Also, what is your app doing at src/pcorona_main.f90:627? It is the mpi_probe call above. In case it can clarify things, my app follows a master-worker paradigm, where rank 0 hands over jobs, and all mpi ranks > 0 just do the following: , | !$OMP PARALLEL DEFAULT(NONE) | do | ! (the code above) | if (tid == 0) then receive job number | stop signal | | !$OMP DO schedule(dynamic) | loop_izax: do izax=sol_nz_min,sol_nz_max | | [big computing loop body] | | end do loop_izax | !$OMP END DO | | if (tid == 0) then | call mpi_send(iyax,1,mpi_integer,master,results_tag, & |mpi_comm_world,mpierror) | call mpi_send(stokes_buf_y,nz*8,mpi_double_precision, & |master,results_tag,mpi_comm_world,mpierror) | end if | | !omp barrier | | end do | !$OMP END PARALLEL ` Following Gilles' suggestion, I also tried changing MPI_THREAD_FUNELLED to MPI_THREAD_MULTIPLE just in case, but I get the same segmentation fault in the same line (mind you, the segmentation fault doesn't happen all the time). But again, no issues if running with --bind-to socket (and no apparent issues at all in the other computer even with --bind-to none). Many thanks for any suggestions, -- Ángel de Vicente Tel.: +34 922 605 747 Web.: http://research.iac.es/proyecto/polmag/ - AVISO LEGAL: Este mensaje puede contener información confidencial y/o privilegiada. Si usted no es el destinatario final del mismo o lo ha recibido por error, por favor notifíquelo al remitente inmediatamente. Cualquier uso no autorizadas del contenido de este mensaje está estrictamente prohibida. Más información en: https://www.iac.es/es/responsabilidad-legal DISCLAIMER: This message may contain confidential and / or privileged information. If you are not the final recipient or have received it in error, please notify the sender immediately. Any unauthorized use of the content of this message is strictly prohibited. More information: https://www.iac.es/en/disclaimer
Re: [OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none
Hello, "Keller, Rainer" writes: > You’re using MPI_Probe() with Threads; that’s not safe. > Please consider using MPI_Mprobe() together with MPI_Mrecv(). many thanks for the suggestion. I will try with the M variants, though I was under the impression that mpi_probe() was OK as far as one made sure that the source and tag matched between the mpi_probe() and the mpi_recv() calls. As you can see below, I'm careful with that (in any case I'm not sure the problems lies there, since the error I get is about invalid reference in the mpi_probe call itself). , |tid = 0 | #ifdef _OPENMP |tid = omp_get_thread_num() | #endif | |do | if (tid == 0) then | call mpi_send(my_rank, 1, mpi_integer, master, ask_job, & | mpi_comm_world, mpierror) | call mpi_probe(master,mpi_any_tag,mpi_comm_world,stat,mpierror) | | if (stat(mpi_tag) == stop_signal) then | call mpi_recv(b_,1,mpi_integer,master,stop_signal, & | mpi_comm_world,stat,mpierror) | else | call mpi_recv(iyax,1,mpi_integer,master,give_job, & | mpi_comm_world,stat,mpierror) | end if | end if | | !$omp barrier | | [... actual work...] ` > So getting into valgrind may be of help, possibly recompiling Open MPI > enabling valgrind-checking together with debugging options. I was hoping to avoid this route, but it certainly is looking like I'll have to bite the bullet... Thanks, -- Ángel de Vicente Tel.: +34 922 605 747 Web.: http://research.iac.es/proyecto/polmag/ - AVISO LEGAL: Este mensaje puede contener información confidencial y/o privilegiada. Si usted no es el destinatario final del mismo o lo ha recibido por error, por favor notifíquelo al remitente inmediatamente. Cualquier uso no autorizadas del contenido de este mensaje está estrictamente prohibida. Más información en: https://www.iac.es/es/responsabilidad-legal DISCLAIMER: This message may contain confidential and / or privileged information. If you are not the final recipient or have received it in error, please notify the sender immediately. Any unauthorized use of the content of this message is strictly prohibited. More information: https://www.iac.es/en/disclaimer
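A sketch of the matched-probe variant Rainer suggests, shown in C for brevity (the tag constants and the int payload mirror the Fortran snippet above and are assumptions). MPI_Mprobe removes the matched message from the queue and returns a handle, and MPI_Mrecv can only receive that specific message, so another thread cannot intercept it between the probe and the receive:

  #include <mpi.h>

  #define STOP_SIGNAL 1      /* assumed values; the real tags are defined elsewhere */
  #define GIVE_JOB    2

  /* Receive the next job id (or the stop signal) from rank "master".
     Returns 0 when the stop signal was received, 1 otherwise. */
  static int get_next_job (int master, int *job)
  {
    MPI_Message msg;
    MPI_Status  stat;

    MPI_Mprobe (master, MPI_ANY_TAG, MPI_COMM_WORLD, &msg, &stat);
    MPI_Mrecv (job, 1, MPI_INT, &msg, &stat);

    return (stat.MPI_TAG != STOP_SIGNAL);
  }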
Re: [OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none
Hello, thanks for your help and suggestions. At the end it was no issue with OpenMPI or with any other system stuff, but rather a single line in our code. I thought I was doing the tests with the -fbounds-check option, but it turns out I was not, arrrghh!! At some point I was writing outside one of our arrays, and you can imagine the rest... That it was happening only when I was running with '--bind-to none' and only in my workstation brought me down all the wrong debugging paths. Once I realized -fbounds-check was not being used, figuring out the issue was a matter of seconds... Our code is now happily performing the +3000 tests without a hitch. Cheers, -- Ángel de Vicente Tel.: +34 922 605 747 Web.: http://research.iac.es/proyecto/polmag/ - AVISO LEGAL: Este mensaje puede contener información confidencial y/o privilegiada. Si usted no es el destinatario final del mismo o lo ha recibido por error, por favor notifíquelo al remitente inmediatamente. Cualquier uso no autorizadas del contenido de este mensaje está estrictamente prohibida. Más información en: https://www.iac.es/es/responsabilidad-legal DISCLAIMER: This message may contain confidential and / or privileged information. If you are not the final recipient or have received it in error, please notify the sender immediately. Any unauthorized use of the content of this message is strictly prohibited. More information: https://www.iac.es/en/disclaimer
[OMPI users] Location of the file pmix-mca-params.conf?
Hello, with our current setup of OpenMPI and Slurm on an Ubuntu 22.04 server, when we submit MPI jobs I get the message: PMIX ERROR: ERROR in file ../../../../../../src/mca/gds/ds12/gds_ds12_lock_pthread.c at line 169 Following https://github.com/open-mpi/ompi/issues/7516, I tried setting PMIX_MCA_gds=hash. Setting it as an environment variable or setting it in the file ~/.pmix/mca-params.conf does work. I now want to set it system-wide, but I cannot find which file that should be. I have tried: + /etc/pmix-mca-params.conf + /usr/lib/x86_64-linux-gnu/pmix2/etc/pmix-mca.params.conf but no luck. Do you know which file holds the system-wide configuration, or how to find it? Thanks, -- Ángel de Vicente -- (GPG: 0x64D9FDAE7CD5E939) Research Software Engineer (Supercomputing and BigData) Instituto de Astrofísica de Canarias (https://www.iac.es/en)
Re: [OMPI users] Location of the file pmix-mca-params.conf?
Hi, Angel de Vicente via users writes: > I have tried: > + /etc/pmix-mca-params.conf > + /usr/lib/x86_64-linux-gnu/pmix2/etc/pmix-mca.params.conf > but no luck. Never mind, /etc/openmpi/pmix-mca-params.conf was the right one. Cheers, -- Ángel de Vicente -- (GPG: 0x64D9FDAE7CD5E939) Research Software Engineer (Supercomputing and BigData) Instituto de Astrofísica de Canarias (https://www.iac.es/en)
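These MCA parameter files take one "name = value" entry per line, with the PMIX_MCA_ prefix dropped, so for the workaround above the system-wide file would just need something like this (a sketch, assuming the same gds=hash setting as before):

  # /etc/openmpi/pmix-mca-params.conf
  gds = hash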