Afraid I have no idea how those packages were built, what release they correspond to, etc. I would suggest sticking with the tarballs.

Your output indicates a problem with shared memory when you completely fill the machine. Could be a couple of things, like running out of memory - but for now, try adding "-mca btl ^sm" to your cmd line so the shared-memory transport is excluded. Should work.
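For example, with the mdrun command from your mail (paths kept as the placeholders you used, and -np 12 since that is the case that crashes), it would look roughly like:

  path/mpirun -mca btl ^sm -hostfile path/hostfile -np 12 -d -display-map path/mdrun_mpi -s path/topol.tpr -o path/output.trr

With "^sm" the ranks on the same node should just fall back to TCP, so this is a diagnostic rather than a permanent fix - it will tell us whether the shared-memory BTL is really the piece that breaks above -np 10.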
On Apr 13, 2012, at 5:09 AM, Seyyed Mohtadin Hashemi wrote:

> Hi,
>
> Sorry that it took so long to answer; I didn't get any return mails and had to check the digest for the reply.
>
> Anyway, when I compiled from scratch I did use the tarballs from open-mpi.org. GROMACS is not the problem (or at least I don't think so); I just used it as a check to see if I could run parallel jobs - I am now using the OSU benchmarks because I can't be sure that the problem is not with GROMACS.
>
> On the new installation I have not installed (nor compiled) OMPI from the official tarballs, but rather installed the "openmpi-bin, openmpi-common, libopenmpi1.3, openmpi-checkpoint, and libopenmpi-dev" packages using apt-get.
>
> As for the simple examples (i.e. ring_c, hello_c, and connectivity_c extracted from the 1.4.2 official tarball), I get the exact same behavior as with GROMACS/OSU bench.
>
> I suspect you'll have to ask someone familiar with GROMACS about that specific package. As for testing OMPI, can you run the codes in the examples directory - e.g., "hello" and "ring"? I assume you are downloading and installing OMPI from our tarballs?
>
> On Apr 12, 2012, at 7:04 AM, Seyyed Mohtadin Hashemi wrote:
>
> > Hello,
> >
> > I have a very peculiar problem: I have a micro cluster with three nodes (18 cores total); the nodes are clones of each other and connected to a frontend via Ethernet, with Debian Squeeze as the OS on all nodes. When I run parallel jobs I can use up to "-np 10"; if I go further the job crashes. I have primarily done tests with GROMACS (because that is what I will be running) but have also used OSU Micro-Benchmarks 3.5.2.
> >
> > For a simple parallel job I use: "path/mpirun -hostfile path/hostfile -np XX -d -display-map path/mdrun_mpi -s path/topol.tpr -o path/output.trr"
> >
> > (path is global.) For "-np XX" smaller than or equal to 10 it works; however, as soon as I use 11 or larger the whole thing crashes. The terminal dump is attached to this mail: when_working.txt is for "-np 10", when_crash.txt is for "-np 12", and OpenMPI_info.txt is the output from "path/mpirun --bynode --hostfile path/hostfile --tag-output ompi_info -v ompi full --parsable".
> >
> > I have tried OpenMPI v1.4.2 all the way up to beta v1.5.5, and all yield the same result.
> >
> > The output files are from a new install I did today: I formatted all nodes, started from a fresh minimal install of Squeeze, used "apt-get install gromacs gromacs-openmpi", and installed all dependencies. Then I ran two jobs using the parameters described above; I also did one with the OSU bench (data is not included), and it also crashed with "-np" larger than 10.
> >
> > I hope somebody can help figure out what is wrong and how I can fix it.
> >
> > Best regards,
> > Mohtadin
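As an aside, if you want to re-check the apt-get installation with the bundled examples, something along these lines should be enough (assuming the examples sit on the globally mounted path you mentioned, and that mpicc and mpirun come from the same OMPI install):

  mpicc path/examples/ring_c.c -o path/ring_c
  path/mpirun -mca btl ^sm -hostfile path/hostfile -np 12 path/ring_c

If ring_c also dies above -np 10 even with "^sm", that points at the installation or the nodes rather than at GROMACS or the benchmarks.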