Just a quick interjection: I also have a dual quad-core Nehalem system, HT on, 24GB RAM, and hand-compiled Open MPI 1.3.4 with options: --enable-mpi-threads --enable-mpi-f77=no --with-openib=no
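For anyone who wants to reproduce the build, it was just the standard source install, roughly the following (the tarball location and install prefix here are illustrative, not my exact paths):

  tar xzf openmpi-1.3.4.tar.gz
  cd openmpi-1.3.4
  ./configure --prefix=$HOME/apps/openmpi-1.3.4 \
      --enable-mpi-threads --enable-mpi-f77=no --with-openib=no
  make -j8 && make install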
With v1.3.4 I see roughly the same behavior: hello and ring work, connectivity fails randomly with np >= 8. Turning on -v increases the success rate, but it still hangs. np = 16 fails more often, and which pair of processes hangs is random. It does seem to be related to a problem in the shared memory layer: running with -mca btl ^sm works consistently through np = 128.

Hope this helps,
Mark

On Wed, Dec 9, 2009 at 8:03 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> Hi Matthew
>
> Barring any misinterpretation I may have made of the code:
>
> Hello_c has no real communication, except for a final Barrier
> synchronization. Each process prints "hello world" and that's it.
>
> Ring probes a little more, with processes Send(ing) and Recv(eiving)
> messages. Ring just passes a message sequentially along all process
> ranks, then back to rank 0, and repeats the game 10 times.
> Rank 0 is in charge of counting turns, decrementing the counter,
> and printing it (nobody else prints).
> With 4 processes: 0->1->2->3->0->1... 10 times.
>
> In connectivity every pair of processes exchanges a message,
> so it probes all pairwise connections. In verbose mode you can see that.
>
> These programs shouldn't hang at all on a sane system.
> Actually, they should even run with a significant level of
> oversubscription; say, -np 128 should work easily for all three
> programs on a powerful machine like yours.
>
> **
>
> Suggestions
>
> 1) Stick to the OpenMPI you compiled.
>
> **
>
> 2) You can run connectivity_c in verbose mode:
>
> /home/macmanes/apps/openmpi1.4/bin/mpirun -np 8 connectivity_c -v
>
> (Note the trailing "-v".)
>
> It should tell more about who's talking to whom.
>
> **
>
> 3) I wonder if there are any BIOS settings that may be required
> (and perhaps not in place) to make the Nehalem hyperthreading
> work properly in your computer.
>
> You reach the BIOS settings by typing <DEL> or <F2> when the computer
> boots up. The key varies by BIOS and computer vendor, but shows
> quickly on the bootup screen.
>
> You may ask the computer vendor about the recommended BIOS settings.
> If you haven't done this before, be careful to change and save only
> what really needs to change (if anything really needs to change),
> or the result may be worse.
> (Overclocking is for gamers, not for genome researchers ... :) )
>
> **
>
> 4) What I read about Nehalem DDR3 memory is that it is optimal
> in configurations that are multiples of 3GB per CPU.
> Common configs in dual-CPU machines like yours are 6, 12, 24, and 48GB.
> The sockets where you install the memory modules also matter.
>
> Your computer has 20GB. Did you build the computer or upgrade the
> memory yourself? Do you know how the memory is installed, and in
> which memory sockets? What does the vendor have to say about it?
>
> See this:
>
> http://en.community.dell.com/blogs/dell_tech_center/archive/2009/04/08/nehalem-and-memory-configurations.aspx
>
> **
>
> 5) As I said before, typing "f" then "j" in "top" will add a column
> (labeled "P") that shows which core each process is running on.
> This will let you observe how the Linux scheduler is distributing
> the MPI load across the cores. Hopefully it is load-balanced, and
> different processes go to different cores.
>
> ***
>
> It is very disconcerting when MPI processes hang. You are not alone.
> The reasons are not always obvious. At least in your case there is
> no network involved to troubleshoot.
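> Just to make the connectivity description above concrete, what
> connectivity_c exercises is essentially this pattern (a simplified
> sketch from memory, not the actual source in the examples directory):
>
> #include <mpi.h>
> #include <stdio.h>
>
> int main(int argc, char **argv)
> {
>     int rank, size, i, j, token = 0;
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>     /* every pair (i,j) exchanges one small message */
>     for (i = 0; i < size; i++) {
>         for (j = i + 1; j < size; j++) {
>             if (rank == i) {
>                 MPI_Send(&rank, 1, MPI_INT, j, 0, MPI_COMM_WORLD);
>                 MPI_Recv(&token, 1, MPI_INT, j, 0, MPI_COMM_WORLD,
>                          MPI_STATUS_IGNORE);
>             } else if (rank == j) {
>                 MPI_Recv(&token, 1, MPI_INT, i, 0, MPI_COMM_WORLD,
>                          MPI_STATUS_IGNORE);
>                 MPI_Send(&rank, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
>             }
>         }
>     }
>     MPI_Barrier(MPI_COMM_WORLD);
>     if (rank == 0) printf("all pairs connected\n");
>     MPI_Finalize();
>     return 0;
> }
>
> If any single pairwise shared-memory connection is broken, a loop like
> this blocks forever, which matches the hangs you are seeing.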
>
> **
>
> I hope it helps,
>
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
> Matthew MacManes wrote:
>
>> Hi Gus and List,
>>
>> First of all, Gus, I want to say thanks. You have been a huge help,
>> and when I get this fixed, I owe you big time!
>>
>> However, the problems continue...
>>
>> I formatted the HD and reinstalled the OS to make sure that I was
>> working from scratch. I did your step A, which seemed to go fine:
>>
>> macmanes@macmanes:~$ which mpicc
>> /home/macmanes/apps/openmpi1.4/bin/mpicc
>> macmanes@macmanes:~$ which mpirun
>> /home/macmanes/apps/openmpi1.4/bin/mpirun
>>
>> Good stuff there...
>>
>> I then compiled the example files:
>>
>> macmanes@macmanes:~/Downloads/openmpi-1.4/examples$ /home/macmanes/apps/openmpi1.4/bin/mpirun -np 8 ring_c
>> Process 0 sending 10 to 1, tag 201 (8 processes in ring)
>> Process 0 sent to 1
>> Process 0 decremented value: 9
>> Process 0 decremented value: 8
>> Process 0 decremented value: 7
>> Process 0 decremented value: 6
>> Process 0 decremented value: 5
>> Process 0 decremented value: 4
>> Process 0 decremented value: 3
>> Process 0 decremented value: 2
>> Process 0 decremented value: 1
>> Process 0 decremented value: 0
>> Process 0 exiting
>> Process 1 exiting
>> Process 2 exiting
>> Process 3 exiting
>> Process 4 exiting
>> Process 5 exiting
>> Process 6 exiting
>> Process 7 exiting
>> macmanes@macmanes:~/Downloads/openmpi-1.4/examples$ /home/macmanes/apps/openmpi1.4/bin/mpirun -np 8 connectivity_c
>> Connectivity test on 8 processes PASSED.
>> macmanes@macmanes:~/Downloads/openmpi-1.4/examples$ /home/macmanes/apps/openmpi1.4/bin/mpirun -np 8 connectivity_c
>> ..HANGS..NO OUTPUT
>>
>> This is maddening because ring_c works, and connectivity_c worked the
>> first time but not the second. I ran it 10 times, and it worked twice.
>> Here is the top screenshot:
>>
>> http://picasaweb.google.com/macmanes/DropBox?authkey=Gv1sRgCLKokNOVqo7BYw#5413382182027669394
>>
>> What is the difference between connectivity_c and ring_c? Under what
>> circumstances should one fail and not the other?
>>
>> I'm off to the Linux forums to see about the Nehalem kernel issues.
>>
>> Matt
>>
>> On Wed, Dec 9, 2009 at 13:25, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>
>> Hi Matthew
>>
>> There is no point in trying to troubleshoot MrBayes and ABySS
>> if not even the OpenMPI test programs run properly.
>> You must straighten them out first.
>>
>> **
>>
>> Suggestions:
>>
>> **
>>
>> A) While you are at OpenMPI, do yourself a favor and install it from
>> source in a separate directory. Who knows if the OpenMPI package
>> distributed with Ubuntu works right on Nehalem? Better to install
>> OpenMPI yourself from source code. It is not a big deal, and it may
>> save you further trouble.
>>
>> Recipe:
>>
>> 1) Install gfortran and g++, if you don't have them, using apt-get.
>> 2) Put the OpenMPI tarball in, say, /home/matt/downloads/openmpi
>> 3) Make another install directory *not in the system directory tree*.
>>    Something like "mkdir /home/matt/apps/openmpi-X.Y.Z/" (X.Y.Z = version) will work.
>> 4) cd /home/matt/downloads/openmpi
>> 5) ./configure CC=gcc CXX=g++ F77=gfortran FC=gfortran \
>>        --prefix=/home/matt/apps/openmpi-X.Y.Z
>>    (Use the prefix flag to install in the directory of item 3.)
>> 6) make
>> 7) make install
>> 8) At the bottom of your /home/matt/.bashrc or .profile file
>>    put these lines:
>>
>>    export PATH=/home/matt/apps/openmpi-X.Y.Z/bin:${PATH}
>>    export MANPATH=/home/matt/apps/openmpi-X.Y.Z/share/man:`man -w`
>>    export LD_LIBRARY_PATH=/home/matt/apps/openmpi-X.Y.Z/lib:${LD_LIBRARY_PATH}
>>
>>    (If you use csh/tcsh use instead:
>>    setenv PATH /home/matt/apps/openmpi-X.Y.Z/bin:${PATH}
>>    etc.)
>>
>> 9) Log out and log in again to freshen up the environment variables.
>> 10) Do "which mpicc" to check that it is pointing to your newly
>>     installed OpenMPI.
>> 11) Recompile and rerun the OpenMPI test programs with 2, 4, 8, 16, ...
>>     processes. Use full path names to mpicc and to mpirun if the
>>     change of PATH above doesn't work right.
>>
>> ********
>>
>> B) Nehalem is quite new hardware. I don't know if the Ubuntu kernel
>> 2.6.31-16 fully supports all of the Nehalem features, particularly
>> hyperthreading and NUMA, which are used by MPI programs. I am not the
>> right person to give you advice about this. I googled around but
>> couldn't find clear information about the minimal kernel version
>> required to have Nehalem fully supported. Some Nehalem owner on the
>> list could come forward and tell us.
>>
>> **
>>
>> C) On the top screenshot you sent me, please try it again (after you
>> do item A), but type "f" and "j" to show the processors that are
>> running each process.
>>
>> **
>>
>> D) Also, the screenshot shows 20GB of memory. That does not sound
>> like an optimal memory configuration for Nehalem, which tends to be
>> 6GB, 12GB, 24GB, or 48GB. Did you put the system together or upgrade
>> the memory yourself, or did you buy the computer as is? However, this
>> should not break MPI anyway.
>>
>> **
>>
>> E) Answering your question: it is true that different flavors of MPI
>> used to compile (mpicc) and run (mpiexec) a program would probably
>> break right away, regardless of the number of processes. However,
>> when it comes to different versions of the same MPI flavor (say
>> OpenMPI 1.3.4 and OpenMPI 1.3.3) I am not sure it will break.
>> I would guess it may run, but not in a reliable way. Problems may
>> appear as you stress the system with more cores, etc.
>> But this is just a guess.
>>
>> **
>>
>> I hope this helps,
>>
>> Gus Correa
>> ---------------------------------------------------------------------
>> Gustavo Correa
>> Lamont-Doherty Earth Observatory - Columbia University
>> Palisades, NY, 10964-8000 - USA
>> ---------------------------------------------------------------------
>>
>> Matthew MacManes wrote:
>>
>> Hi Gus,
>>
>> Interestingly, the results for the connectivity_c test: it works fine
>> with -np < 8. For -np > 8 it works some of the time; other times it
>> HANGS. I have got to believe that this is a big clue!! Also, when it
>> hangs, sometimes I get the message "mpirun was unable to cleanly
>> terminate the daemons on the nodes shown below". Note that NO nodes
>> are shown below. Once, I got -np 250 to pass the connectivity test,
>> but I was not able to replicate this reliably, so I'm not sure if it
>> was a fluke, or what. Here is a link to a screenshot of top when
>> connectivity_c is hung with -np 14. I see that 2 processes are only
>> at 50% CPU usage...
>> Hmmmm
>>
>> http://picasaweb.google.com/lh/photo/87zVEucBNFaQ0TieNVZtdw?authkey=Gv1sRgCLKokNOVqo7BYw&feat=directlink
>>
>> The other tests, ring_c and hello_c, as well as the cxx versions of
>> these guys, work with all values of -np.
>>
>> Using -mca mpi_paffinity_alone 1 I get the same behavior.
>>
>> I agree that I should worry about the mismatch between where the
>> libraries are installed versus where I am telling my programs to look
>> for them. Would this type of mismatch cause behavior like what I am
>> seeing, i.e. working with a small number of processors but failing
>> with larger? It seems like a mismatch would have the same effect
>> regardless of the number of processors used. Maybe I am mistaken.
>> Anyway, to address this, which mpirun gives me /usr/local/bin/mpirun,
>> so to configure I use ./configure --with-mpi=/usr/local/bin/mpirun
>> and to run, /usr/local/bin/mpirun -np X ... This should
>>
>> uname -a gives me: Linux macmanes 2.6.31-16-generic #52-Ubuntu SMP
>> Thu Dec 3 22:07:16 UTC 2006 x86_64 GNU/Linux
>>
>> Matt
>>
>> On Dec 8, 2009, at 8:50 PM, Gus Correa wrote:
>>
>> Hi Matthew
>>
>> Please see comments/answers inline below.
>>
>> Matthew MacManes wrote:
>>
>> Hi Gus, thanks for your ideas. I have a few questions, and will try
>> to answer yours in hopes of solving this!!
>>
>> A simple way to test OpenMPI on your system is to run the test
>> programs that come with the OpenMPI source code: hello_c.c,
>> connectivity_c.c, and ring_c.c:
>> http://www.open-mpi.org/
>>
>> Get the tarball from the OpenMPI site, gzip and untar it,
>> and look for them in the "examples" directory.
>> Compile with /your/path/to/openmpi/bin/mpicc hello_c.c
>> Run with /your/path/to/openmpi/bin/mpiexec -np X a.out
>> using X = 2, 4, 8, 16, 32, 64, ...
>>
>> This will tell if your OpenMPI is functional, and if you can run on
>> many Nehalem cores, perhaps even with oversubscription. It will also
>> set the stage for further investigation of your actual programs.
>>
>> Should I worry about setting things like --num-cores
>> --bind-to-cores? This, I think, gets at your questions about
>> processor affinity. Am I right? I could not exactly figure out the
>> -mca mpi_paffinity_alone stuff...
>>
>> I use the simple-minded -mca mpi_paffinity_alone 1. This is probably
>> the easiest way to assign a process to a core. There are more complex
>> ways in OpenMPI, but I haven't tried them. Indeed,
>> -mca mpi_paffinity_alone 1 does improve performance of our programs
>> here. There is a chance that without it the 16 virtual cores of your
>> Nehalem get confused with more than 3 processes (you reported that
>> -np > 3 breaks).
>>
>> Did you try adding just -mca mpi_paffinity_alone 1 to your mpiexec
>> command line?
>>
>> 1. Additional load: nope. Nothing else, most of the time not even
>> firefox.
>>
>> Good. Turn off firefox, etc., to make it even better. Ideally, use
>> runlevel 3, no X, like a computer cluster node, but this may not be
>> required.
>>
>> 2. RAM: no problems apparent when monitoring through top.
>> Interesting, I did wonder about oversubscription, so I tried the
>> option --nooversubscription, but this gave me an error message.
>>
>> Oversubscription from your program would only happen if you asked
>> for more processes than available cores, i.e., -np > 8 (or "virtual"
>> cores, in the case of Nehalem hyperthreading, -np > 16). Since you
>> have -np=4 there is no oversubscription, unless you have other
>> external load (e.g. Matlab, etc.), but you said you don't.
>>
>> Yet another possibility would be if your program is threaded (e.g.
>> using OpenMP along with MPI), but considering what you said about
>> OpenMP I would guess the programs don't use it. For instance, you
>> launch the program with 4 MPI processes, and each process decides to
>> start, say, 8 OpenMP threads. You end up with 32 threads on 8 (real)
>> cores (or 16 hyperthreaded ones on Nehalem).
>>
>> What else does top say? Any hog processes (memory- or CPU-wise)
>> besides your program processes?
>>
>> 3. I have not tried other MPI flavors. I've been speaking to the
>> authors of the programs, and they are both using OpenMPI.
>>
>> I was not trying to convince you to use another MPI. I use MPICH2
>> also, but OpenMPI reigns here. The idea of trying it with MPICH2 was
>> just to check whether OpenMPI is causing the problem, but I don't
>> think it is.
>>
>> 4. I don't think that this is a problem, as I'm specifying
>> --with-mpi=/usr/bin/... when I compile the programs. Is there any
>> other way to be sure that this is not a problem?
>>
>> Hmmm ....
>> I don't know about your Ubuntu (we have CentOS and Fedora on various
>> machines). However, most Linux distributions come with their own MPI
>> flavors, and so do compilers, etc. Often they install these goodies
>> in unexpected places, and this has caused a lot of frustration.
>> There are tons of postings on this list that eventually boiled down
>> to mismatched versions of MPI in unexpected places.
>>
>> The easy way is to use full path names to compile and to run.
>> Something like this:
>> /my/openmpi/bin/mpicc (in your program configuration script),
>> and something like this:
>> /my/openmpi/bin/mpiexec -np ... bla, bla ...
>> when you submit the job.
>>
>> You can check your version with "which mpicc", "which mpiexec",
>> and (perhaps using full path names) with "ompi_info",
>> "mpicc --showme", "mpiexec --help".
>>
>> 5. I had not been, and you could see some shuffling when monitoring
>> the load on specific processors. I have tried to use --bind-to-cores
>> to deal with this. I don't understand how to use the -mca options you
>> asked about.
>>
>> 6. I am using Ubuntu 9.10, gcc 4.4.1 and g++ 4.4.1.
>>
>> I am afraid I won't be of much help here, because I don't have
>> Nehalem. However, I read about Nehalem requiring quite recent kernels
>> to get all of its features working right.
>>
>> What is the output of "uname -a"? This will tell the kernel version,
>> etc. Other list subscribers may give you a suggestion if you post the
>> information.
>>
>> MrBayes is a program for Bayesian phylogenetics:
>> http://mrbayes.csit.fsu.edu/wiki/index.php/Main_Page
>> ABySS is a program for assembly of DNA sequence data:
>> http://www.bcgsc.ca/platform/bioinfo/software/abyss
>>
>> Thanks for the links! I had found the MrBayes link. I eventually
>> found what your ABySS was about, but no links. Amazing that it is
>> about DNA/gene sequencing. Our abyss here is the deep ocean ... :)
>> Abysmal difference!
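>> (To make the version check above concrete, a quick sanity check looks
>> something like this -- the paths and version shown are illustrative,
>> yours will differ:
>>
>> $ which mpicc mpiexec
>> /home/matt/apps/openmpi-1.4/bin/mpicc
>> /home/matt/apps/openmpi-1.4/bin/mpiexec
>> $ ompi_info | grep "Open MPI:"
>>                 Open MPI: 1.4
>>
>> If "which" points at /usr/bin or /usr/local/bin instead of the
>> install you built yourself, the wrong MPI is being picked up.)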
>> Do the programs mix MPI (message passing) with OpenMP (threads)?
>>
>> I'm honestly not sure what this means...
>>
>> Some programs mix the two. OpenMP only works in a shared memory
>> environment (e.g. a single computer like yours), whereas MPI can use
>> both shared memory and work across a network (e.g. in a cluster).
>> There are other differences too.
>>
>> It is unlikely that you have this hybrid type of parallel program;
>> otherwise there would be some reference to OpenMP in the program
>> configuration files, program documentation, etc. Also, in general the
>> configuration scripts of these hybrid programs can turn on MPI only,
>> or OpenMP only, or both, depending on how you configure them.
>>
>> Even to compile with OpenMP you would need a proper compiler flag,
>> but that one might be hidden in a Makefile too, making it a bit hard
>> to find. "grep -n mp Makefile" may give a clue. Anything in the
>> documentation that mentions threads or OpenMP?
>>
>> FYI, here is OpenMP:
>> http://openmp.org/wp/
>>
>> Thanks for all your help!
>> Matt
>>
>> Well, so far it didn't really help. :(
>> But let's hope to find a clue, maybe with a little help from our
>> list subscriber friends.
>>
>> Gus Correa
>> ---------------------------------------------------------------------
>> Gustavo Correa
>> Lamont-Doherty Earth Observatory - Columbia University
>> Palisades, NY, 10964-8000 - USA
>> ---------------------------------------------------------------------
>>
>> Hi Matthew
>>
>> More guesses/questions than anything else:
>>
>> 1) Is there any additional load on this machine? We had problems like
>> that (on different machines) when users start listening to streaming
>> video, doing Matlab calculations, etc., while the MPI programs are
>> running. This tends to oversubscribe the cores and may lead to
>> crashes.
>>
>> 2) RAM: can you monitor the RAM usage through "top"? (I presume you
>> are on Linux.) It may show unexpected memory leaks, if they exist.
>> In "top", type "1" (one) to see all cores, and type "f" then "j" to
>> see the core number associated with each process.
>>
>> 3) Do the programs work right with other MPI flavors (e.g. MPICH2)?
>> If not, then it is not OpenMPI's fault.
>>
>> 4) Any possibility that the MPI versions/flavors of mpicc and mpirun
>> that you are using to compile and launch the program are not the
>> same?
>>
>> 5) Are you setting processor affinity on mpiexec?
>>
>> mpiexec -mca mpi_paffinity_alone 1 -np ... bla, bla ...
>>
>> Context switching across the cores may also cause trouble, I suppose.
>>
>> 6) Which Linux are you using (uname -a)? On other mailing lists I
>> read reports that only quite recent kernels support all the Intel
>> Nehalem processor features well. I don't have Nehalem, so I can't
>> help here, but the information may be useful for other list
>> subscribers to help you.
>>
>> ***
>>
>> As for the programs, some programs require a specific setup (and even
>> specific compilation) when the number of MPI processes varies. It may
>> help if you tell us a link to the program sites.
>>
>> Bayesian statistics is not totally out of our business, but
>> phylogenetic trees are not really my league, hence forgive me any bad
>> guesses, please. But would it need specific compilation, or a
>> different set of input parameters, to run correctly on a different
>> number of processors? Do the programs mix MPI (message passing) with
>> OpenMP (threads)?
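>> (For illustration only -- this is not from MrBayes or ABySS -- a
>> hybrid MPI+OpenMP program typically looks something like the sketch
>> below, built with something like "mpicc -fopenmp":
>>
>> #include <mpi.h>
>> #include <omp.h>
>> #include <stdio.h>
>>
>> int main(int argc, char **argv)
>> {
>>     int rank;
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>
>>     /* each MPI process spawns its own team of OpenMP threads */
>>     #pragma omp parallel
>>     {
>>         printf("MPI rank %d, OpenMP thread %d of %d\n",
>>                rank, omp_get_thread_num(), omp_get_num_threads());
>>     }
>>
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> With 4 MPI processes and 8 threads each, that is the
>> 32-threads-on-8-cores situation mentioned above.)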
>> I found this MrBayes, which seems to do the above:
>> http://mrbayes.csit.fsu.edu/
>> http://mrbayes.csit.fsu.edu/wiki/index.php/Main_Page
>>
>> As for ABySS, what is it, and where can it be found?
>> It doesn't look like a deep ocean circulation model, as the name
>> suggests.
>>
>> My $0.02
>> Gus Correa
>>
>> _________________________________
>> Matthew MacManes
>> PhD Candidate
>> University of California - Berkeley
>> Museum of Vertebrate Zoology
>> Phone: 510-495-5833
>> Lab Website: http://ib.berkeley.edu/labs/lacey
>> Personal Website: http://macmanes.com/
>>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users