Hi Gus,

Thanks for your ideas. I have a few questions, and I will try to answer yours in the hope of solving this!
Should I worry about setting options like --num-cores and --bind-to-cores? I think this gets at your questions about processor affinity. Am I right? I could not quite figure out the -mca mpi_paffinity_alone setting.

1. Additional load: No, nothing else is running; most of the time not even Firefox.
2. RAM: No problems are apparent when monitoring through top. Interesting: I did wonder about oversubscription, so I tried the option --nooversubscription, but it gave me an error message.
3. I have not tried other MPI flavors. I've been speaking to the authors of the programs, and they both use Open MPI.
4. I don't think this is a problem, as I'm specifying --with-mpi=/usr/bin/... when I compile the programs. Is there any other way to be sure that this is not a problem?
5. I had not been setting it, and I could see some shuffling when monitoring the load on specific processors. I have tried to use --bind-to-cores to deal with this. I don't understand how to use the -mca options you asked about.
6. I am using Ubuntu 9.10 with gcc 4.4.1 and g++ 4.4.1.

MrBayes is a program for Bayesian phylogenetics: http://mrbayes.csit.fsu.edu/wiki/index.php/Main_Page
ABySS is a program for assembling DNA sequence data: http://www.bcgsc.ca/platform/bioinfo/software/abyss

> Do the programs mix MPI (message passing) with OpenMP (threads)?

I'm honestly not sure what this means.

Thanks for all your help!
Matt

> Hi Matthew
> More guesses/questions than anything else:
> 1) Is there any additional load on this machine?
> We had problems like that (on different machines) when
> users start listening to streaming video, doing Matlab calculations,
> etc, while the MPI programs are running.
> This tends to oversubscribe the cores, and may lead to crashes.
> 2) RAM:
> Can you monitor the RAM usage through "top"?
> (I presume you are on Linux.)
> It may show unexpected memory leaks, if they exist.
> On "top", type "1" (one) to see all cores, type "f" then "j"
> to see the core number associated with each process.
> 3) Do the programs work correctly with other MPI flavors (e.g. MPICH2)?
> If not, then it is not OpenMPI's fault.
> 4) Any possibility that the MPI versions/flavors of mpicc and
> mpirun that you are using to compile and launch the program are not the
> same?
> 5) Are you setting processor affinity on mpiexec?
> mpiexec -mca mpi_paffinity_alone 1 -np ... bla, bla ...
> Context switching across the cores may also cause trouble, I suppose.
> 6) Which Linux are you using (uname -a)?
> On other mailing lists I read reports that only quite recent kernels
> support all the Intel Nehalem processor features well.
> I don't have Nehalem, so I can't help here,
> but the information may be useful
> for other list subscribers to help you.
> ***
> As for the programs, some programs require specific setup
> (and even specific compilation) when the number of MPI processes
> varies.
> It may help if you send us links to the program sites.
> Bayesian statistics is not totally out of our business,
> but phylogenetic trees are not really my league,
> hence forgive me any bad guesses, please,
> but would it need specific compilation or a different
> set of input parameters to run correctly on a different
> number of processors?
> Do the programs mix MPI (message passing) with OpenMP (threads)?
> I found this MrBayes, which seems to do the above:
> http://mrbayes.csit.fsu.edu/
> http://mrbayes.csit.fsu.edu/wiki/index.php/Main_Page
> As for ABySS, what is it, and where can it be found?
> It doesn't look like a deep ocean circulation model, as the name suggests.
> My $0.02
> Gus Correa
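A rough way to check items 2, 4 and 5 from the shell is sketched below in Python (3.6 or later, Linux only). It is not an Open MPI tool, and all function names are hypothetical. It only reports (a) where mpicc and mpirun really point after following symlinks, so wrappers from two different MPI installations would show up as different targets, (b) whether a requested number of processes exceeds the logical CPU count, and (c) which CPUs a given process is allowed to run on. Open MPI's own `mpicc --showme` and `ompi_info` give more authoritative information about the installation itself.

```python
"""Rough MPI sanity checks (a sketch; Linux only, Python 3.6+).

All function names are hypothetical helpers, not part of Open MPI.
"""
import os
import shutil
import sys


def resolve_wrappers(names=("mpicc", "mpirun")):
    """Map each wrapper name to its fully resolved path, or None if not on PATH.

    On Ubuntu these names are usually symlinks (update-alternatives); following
    the links shows which installation each one really points to. Compare the
    results by eye: this is a heuristic, not proof of a matching build.
    """
    result = {}
    for name in names:
        path = shutil.which(name)
        result[name] = os.path.realpath(path) if path else None
    return result


def oversubscribed(nprocs, ncpus=None):
    """Return True if nprocs exceeds the CPU count.

    os.cpu_count() counts logical CPUs, so with Hyper-Threading enabled it
    can be twice the number of physical cores.
    """
    if ncpus is None:
        ncpus = os.cpu_count() or 1
    return nprocs > ncpus


def allowed_cpus(pid=0):
    """Sorted list of CPU ids that process `pid` may run on (0 = this process)."""
    return sorted(os.sched_getaffinity(pid))


if __name__ == "__main__":
    for name, path in resolve_wrappers().items():
        print(f"{name:8s} -> {path}")
    # Optionally pass PIDs of running MPI ranks (e.g. read off top);
    # non-numeric arguments are ignored.
    for arg in sys.argv[1:]:
        if arg.isdigit():
            pid = int(arg)
            print(f"pid {pid}: allowed CPUs {allowed_cpus(pid)}")
```

If the script is saved under some name (say, the hypothetical mpi_checks.py) and given the PIDs of the running ranks, a rank whose process binding took effect would be expected to list only a small set of CPUs (typically one core), while a rank that lists every CPU is free to be moved around by the scheduler, which matches the shuffling described in item 5.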