I have recently installed Open MPI 1.3r1212a over TCP and Gigabit Ethernet on a Solaris 10 x86/64 system.
Some test codes compile fine using the Open MPI versions of mpicc, mpif95 and mpic++: monte (a Monte Carlo estimate of pi), connectivity (which tests connectivity between processes and nodes) and prime (which calculates prime numbers). These test codes are examples bundled with Sun HPC.

Sometimes the jobs work fine, but most of the time they freeze, leaving zombies behind.

My run-time command is

    mpirun --hostfile my-hosts -mca pls_rsh_agent rsh --mca btl tcp,self -np 14 \
    monte

and I get as output

    oberon(209) > mpirun --hostfile my-hosts -mca pls_rsh_agent rsh --mca btl tcp,self -np 14 monte
    Monte-Carlo estimate of pi by 14 processes is 3.141503.

with the cursor hanging.

The process table shows

    oberon# ps -eaf | grep dph0elh
     dph0elh  9583  7445   7 17:45:01 pts/26      9:22 mpirun --hostfile my-hosts -mca pls_rsh_agent rsh --mca btl tcp,self -np 14 mon
     dph0elh  9595  9588   0        - ?           0:02 <defunct>
     dph0elh  9588     1   7 17:45:01 ??          9:03 orted --bootproxy 1 --name 0.0.1 --num_procs 5 --vpid_start 0 --nodename oberon
     dph0elh  7445  6924   0 17:01:38 pts/26      0:00 -tcsh
        root  9656  4151   0 18:01:31 pts/36      0:00 grep dph0elh
     dph0elh  9593  9588   0        - ?           0:02 <defunct>

One of the nodes offers 8 CPUs; the other nodes in the hostfile offer 2 each. There are a total of 14 CPUs available. As you can see from the command line, I use --mca btl tcp,self. There are no other interconnects.

I could not find any relevant entry in the FAQs, except for the advice on using --mca btl tcp,self.

------------------------------------------
Dr E L Heck

University of Durham
Institute for Computational Cosmology
Ogden Centre
Department of Physics
South Road

DURHAM, DH1 3LE
United Kingdom

e-mail: lydia.h...@durham.ac.uk

Tel.: + 44 191 - 334 3628
Fax.: + 44 191 - 334 3645
___________________________________________