Steffen Brinkmann wrote:
Hi!
I have installed OpenMPI on a cluster consisting of ~30 nodes with 16 Xeon
cores each. NFS is set up and working. For testing I installed it locally with
./configure --prefix=/home_dir/openmpi-1.4.3_installation/; make all install
Everything went smoothly so far.
When I run a parallel program with
/home_dir/openmpi-1.4.3_installation/bin/mpirun -n 2 ./my_parprog
everything scales perfectly up to -n 16. When I go to -n 32, the execution time is the same as with -n 16.
/home_dir/openmpi-1.4.3_installation/bin/mpirun -n 32 hostname
prints the same node name 32 times.
The program is fine (it has been running for years on several machines) and another MPI
installation scales well, so the cluster should be OK as well.
What did I do wrong?
Thanks for any hint!
Steffen
--
Dr. Steffen Brinkmann
High Performance Computing Center Stuttgart (HLRS)
Nobelstraße 19
D - 70569 Stuttgart
Germany
Phone: ++49(0)711 / 685-64548
Fax: ++49(0)711 / 685-65832
Hi Steffen
See this FAQ:
http://www.open-mpi.org/faq/?category=running#mpirun-host
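Without a host list or a resource manager, mpirun starts every process on the node it was launched from, which matches the 32 identical hostnames above. A minimal sketch of the hostfile approach follows. The node names node01 and node02 and the file name my_hostfile are placeholders, not taken from the original post; the mpirun path is the one from the question.

```shell
# Hypothetical node names; replace them with the real hostnames of the cluster.
# "slots=16" tells Open MPI that each node offers 16 process slots (one per core).
cat > my_hostfile <<'EOF'
node01 slots=16
node02 slots=16
EOF

# Install path as given in the original post; adjust if the prefix differs.
MPIRUN=/home_dir/openmpi-1.4.3_installation/bin/mpirun

# Launch only if mpirun really exists at that path.
if [ -x "$MPIRUN" ]; then
    # Count how many processes land on each node.
    "$MPIRUN" --hostfile my_hostfile -n 32 hostname | sort | uniq -c
else
    echo "mpirun not found at $MPIRUN; hostfile written to ./my_hostfile"
fi
```

If the hostfile is picked up, the count should list more than one node name instead of a single one. For a quick test without a file, mpirun also accepts a comma-separated list via --host.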
If you have a resource manager, such as Torque or SGE,
you can build OpenMPI with support for it.
This will obviate the need to specify the nodes,
as the resource manager will take care of that for you:
http://www.open-mpi.org/faq/?category=building#build-rte-tm
http://www.open-mpi.org/faq/?category=building#build-rte-sge
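As a rough sketch of what those two FAQ entries describe: the rebuild is a matter of adding one configure flag, and ompi_info can then be used to check whether the corresponding components were built. The Torque location /opt/torque and the helper name has_rm_support are assumptions made for illustration; the install prefix is the one from the question.

```shell
# Rebuild with resource-manager support (run inside the Open MPI source tree).
# /opt/torque is a hypothetical Torque install location; use one flag or the other.
#   ./configure --prefix=/home_dir/openmpi-1.4.3_installation/ --with-tm=/opt/torque
#   ./configure --prefix=/home_dir/openmpi-1.4.3_installation/ --with-sge
#   make all install

# Hypothetical helper: filter ompi_info output (read from stdin) for
# Torque (tm) or SGE (gridengine) components.
has_rm_support() {
    grep -E 'MCA (ras|plm): (tm|gridengine)'
}

# Install path as given in the original post; adjust if the prefix differs.
OMPI_INFO=/home_dir/openmpi-1.4.3_installation/bin/ompi_info
if [ -x "$OMPI_INFO" ]; then
    "$OMPI_INFO" | has_rm_support || echo "no tm or gridengine components found"
else
    echo "ompi_info not found at $OMPI_INFO"
fi
```

If matching lines are printed, mpirun started inside a batch job should take its node list from the resource manager, so no hostfile is needed.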
BTW, the OpenMPI FAQ is the de facto (and good)
OpenMPI documentation:
http://www.open-mpi.org/faq/
Other sources are the README file and the mpiexec man page.
I hope this helps,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------