I have what is probably a simple question (I hope). I have built openmpi-1.1.1 from source using gfortran on Mac OS X 10.4.7. I can run parallel jobs on my own machine using mpiexec -np. My machinefile contains the lines:

tachyon.a04.aist.go.jp
tachyon.a04.aist.go.jp
gehirn.local
gehirn.local
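
For reference, I believe the same two-slots-per-host layout can also be written with explicit slot counts rather than repeated host names. This is only a sketch that I have not tested here, and it assumes the slots= hostfile keyword is accepted by this Open MPI version:

# assumed equivalent form of the machinefile above: one line per host, two slots each
tachyon.a04.aist.go.jp slots=2
gehirn.local slots=2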

(The .local name uses zeroconf to find the address of gehirn; it works.) When I run a parallel job on my own machine (-np 2), everything is fine: the job runs in parallel, it is faster, and the output is correct. When I try running with -np 4 to use an additional dual-CPU G5 machine, my job hangs while consuming large amounts of CPU (runaway processes). This continues without output until I interrupt it with ^C, which terminates the processes on all machines. I am running the task via ssh using an ssh-agent.

Does anyone have any idea what could be wrong? I have attached my config.log and ompi_info files (bzip2'ed) to this mail as specified in the mailing list instructions. I am guessing this should be a simple thing, but it is taking too much time to figure out on my own (e.g. I couldn't find a FAQ entry or a user question/reply that answered this).

                                        Paul Fons

Attachment: config.log.bz2
Description: Binary data

Attachment: ompi_info.log.bz2
Description: Binary data



Script started on Tue Sep  5 16:01:18 2006
[tachyon:exafs/feff85/zno] paulfons% mpiexec -machinefile machinefile -np 2 hostname

tachyon.a04.aist.go.jp
tachyon.a04.aist.go.jp
[tachyon:exafs/feff85/zno] paulfons% mpiexec -machinefile machinefile -np 2 /opt/feff/feff85/rdinp

Number of processors =            2
Feff 8.40
  XANES:
name:     zincite ZnO
formula:  ZnO
sites:    Zn1,O1
refer1:   wyckoff, vol 1, ch III, p 111
refer2:
schoen:
notes1:
[tachyon:exafs/feff85/zno] paulfons% mpiexec -machinefile machinefile -np 4 hostname

tachyon.a04.aist.go.jp
tachyon.a04.aist.go.jp
dhcp054092.a04.aist.go.jp
dhcp054092.a04.aist.go.jp
[tachyon:exafs/feff85/zno] paulfons% mpiexec -machinefile machinefile -np 4 /opt/feff/feff85/rdinp

Number of processors =            4
Feff 8.40
  XANES:
name:     zincite ZnO
formula:  ZnO
sites:    Zn1,O1
refer1:   wyckoff, vol 1, ch III, p 111
refer2:
schoen:
notes1:


^Cmpiexec: killing job...




