Hi again,

Ok, I see what I missed in the FAQ; sorry about that. My understanding of the shell is a bit minimal, to say the least. I now have my .bashrc files configured as such on both computers:
export LD_LIBRARY_PATH=/opt/local/openmpi/lib:{$PATH}
export PATH=/opt/local/openmpi/bin:{$PATH}

However, I am now running into a new issue that is still cryptic to me:

quadcore:~ chrisjones$ /opt/local/openmpi/bin/mpirun -np 8 -hostfile hostfile ./ring_c
Process 0 sending 10 to 1, tag 201 (8 processes in ring)
[quadcore.mikrob.slu.se][[53435,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 127.0.0.1 failed: Connection refused (61)

This may be superfluous, but I can connect to localhost (ssh localhost) with no password prompt. Is there an ssh port I need to change somewhere? Again, thanks for your patience and help.

Chris

Have you set up your shell startup files such that they point to the new OMPI installation (/opt/local/openmpi/) even for non-interactive logins?

Hi,

Thanks for the quick response. I managed to compile 1.5.3 on both computers using gcc-4.2, with the proper flags set (this took a bit of playing with, but I did eventually get it to compile). Once that was done, I installed it to a different directory from 1.2.8 (/opt/local/openmpi/), specified the PATH and LD_LIBRARY_PATH for the new version on each computer, then managed to get the hello_world script to run again so it could call each process, like before.

However, I'm still in the same place: ring_c freezes up. I tried changing the hostname in the host file (just for poops and giggles; I see the response stating it doesn't matter), but to no avail. I made sure the firewall is off on both computers. I'm hoping I'm not doing something overly dumb here, but I'm still a bit stuck. I see in the FAQ that there were some issues with Nehalem processors; I have two Xeons in one box and a Nehalem in another. Could this make any difference?

Thanks again,
Chris

On Aug 9, 2011, at 6:50 PM, Jeff Squyres wrote:

No, Open MPI doesn't use the names in the hostfile to figure out which TCP/IP addresses to use (for example).
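A side note on the export lines quoted above: `{$PATH}` is not the usual Bourne-shell syntax (it expands to a literal `{`, the value of PATH, then `}`), and LD_LIBRARY_PATH is being extended with the contents of PATH rather than of LD_LIBRARY_PATH. A sketch of the conventional ~/.bashrc form, assuming the /opt/local/openmpi prefix from the thread:

```shell
# Conventional ~/.bashrc lines (sketch). Note ${PATH}, not {$PATH}:
# {$PATH} would insert literal braces into the path element. Also,
# LD_LIBRARY_PATH should be built from the old LD_LIBRARY_PATH, not PATH.
export PATH=/opt/local/openmpi/bin:${PATH}
export LD_LIBRARY_PATH=/opt/local/openmpi/lib:${LD_LIBRARY_PATH}
```

Because mpirun starts remote processes over non-interactive ssh sessions, these lines need to take effect for non-interactive logins as well, which is what Jeff's question below is getting at.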
Each process ends up publishing a list of IP addresses at which it can be connected, and OMPI does routability computations to figure out which is the "best" address to contact a given peer on.

If you're just starting with Open MPI, can you upgrade? 1.2.8 is pretty ancient. Open MPI 1.4.3 is the most recent stable release; 1.5.3 is our "feature" series, but it's also relatively stable (new releases are coming in both the 1.4.x and 1.5.x series soon, FWIW).

On Aug 9, 2011, at 12:14 PM, David Warren wrote:

I don't know if this is it, but if you use the name localhost, won't processes on both machines try to talk to 127.0.0.1? I believe you need to use the real hostname in your host file. I think that your two tests work because there is no interprocess communication, just stdout.

On 08/08/11 23:46, Christopher Jones wrote:

Hi again,

I changed the subject of my previous posting to reflect a new problem encountered when I changed my strategy to using SSH instead of Xgrid on two Mac Pros. I've set up login-less ssh communication between the two Macs (connected via direct ethernet, both running openmpi 1.2.8 on OSX 10.6.8) per the instructions in the FAQ. I can type 'ssh computer-name.local' on either computer and connect without a password prompt. From what I can see, the ssh-agent is up and running; the following is listed in my ENV:

SSH_AUTH_SOCK=/tmp/launch-5FoCc1/Listeners
SSH_AGENT_PID=61058

My host file simply lists 'localhost' and 'chrisjones2@allana-welshs-mac-pro.local'.
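As a concrete illustration of David's suggestion above, a hostfile that names both machines explicitly instead of using 'localhost' might look like the sketch below. The hostnames are taken from the thread; the slot counts are an assumption (one slot per core on each quad-core box), not something stated in the messages:

```text
# Open MPI hostfile sketch: real hostnames, so remote ranks don't
# try to reach a peer via 127.0.0.1. "slots" is the standard Open MPI
# hostfile keyword; the counts of 4 are assumed.
quadcore.mikrob.slu.se slots=4
allana-welshs-mac-pro.local slots=4
```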
When I run a simple hello_world test, I get what seems like a reasonable output:

chris-joness-mac-pro:~ chrisjones$ mpirun -np 8 -hostfile hostfile ./test_hello
Hello world from process 0 of 8
Hello world from process 1 of 8
Hello world from process 2 of 8
Hello world from process 3 of 8
Hello world from process 4 of 8
Hello world from process 7 of 8
Hello world from process 5 of 8
Hello world from process 6 of 8

I can also run hostname and get what seems to be an ok response (unless I'm wrong about this):

chris-joness-mac-pro:~ chrisjones$ mpirun -np 8 -hostfile hostfile hostname
allana-welshs-mac-pro.local
allana-welshs-mac-pro.local
allana-welshs-mac-pro.local
allana-welshs-mac-pro.local
quadcore.mikrob.slu.se
quadcore.mikrob.slu.se
quadcore.mikrob.slu.se
quadcore.mikrob.slu.se

However, when I run the ring_c test, it freezes:

chris-joness-mac-pro:~ chrisjones$ mpirun -np 8 -hostfile hostfile ./ring_c
Process 0 sending 10 to 1, tag 201 (8 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9

(I noted that processors on both computers are active.)

ring_c was compiled separately on each computer; however, both have the same version of openmpi and OSX. I've gone through the FAQ and searched the user forum, but I can't quite seem to get this problem unstuck.

Many thanks for your time,
Chris

On Aug 5, 2011, at 6:00 PM, users-requ...@open-mpi.org wrote:
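Stepping back from the quoted thread: the "connect() to 127.0.0.1 failed: Connection refused" error near the top, together with the hang in ring_c (the first test here that needs rank-to-rank TCP traffic), is consistent with the TCP BTL selecting or advertising the loopback interface. One hedged workaround is to tell Open MPI explicitly which interfaces its TCP transport may use, via the real `btl_tcp_if_include`/`btl_tcp_if_exclude` MCA parameters. These are command-line fragments, not a tested recipe, and "en0"/"lo0" are guesses at the Mac's wired and loopback interface names; check yours with ifconfig:

```shell
# Sketch: restrict Open MPI's TCP transport to the real ethernet
# interface so peers never try to reach each other over loopback.
# "en0" is an assumed interface name on these Macs.
mpirun --mca btl_tcp_if_include en0 -np 8 -hostfile hostfile ./ring_c

# Alternatively, exclude only the loopback interface ("lo0" on OSX):
mpirun --mca btl_tcp_if_exclude lo0 -np 8 -hostfile hostfile ./ring_c
```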