Hello everyone,

I am attempting to run a single program on 32 cores split across 4 computers (so each computer has 8 cores), and I am attempting to use mpich for this. For now I am just testing on 2 computers. The program is installed on both, and so is mpich. I have created a key and can log in to the other computer over ssh without a password.

I have come across 2 problems.

First, when I attempt to connect using

    mpirun -np 3 --host a hostname

(where a is the IP of the computer I am attempting to connect to), I receive the error

    unable to connect from "localhost.localdomain" to "localhost.localdomain"

This indicates that my computer, "localhost.localdomain", is attempting to connect to another "localhost.localdomain". How can I change this so that it connects from my IP to the other computer's IP? I have read that I need to edit /etc/hosts manually, using sudo, to enter the IP of the computer I am attempting to connect to.

Second, I attempted to use a host file instead, following the Hydra process manager wiki. I created a hosts file containing just the IP of the computer I am attempting to connect to. When I run

    mpiexec -f hosts -n 4 ./applic

I get this error:

    [mpiexec@localhost.localdomain] HYDU_parse_hostfile (./utils/args/args.c:323): unable to open host file: hosts

along with other errors about being unable to parse the hostfile, the match handler, etc. I assume these all come from mpiexec being unable to read the host file. Is there a specific place where I should save my hosts file? I have it saved directly on my Desktop. I have also tried giving the full path to the file, but I still get the same error.

Thank you in advance.

Sincerely,
Sam
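
For reference, the host file setup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the IP addresses are placeholders from the documentation range 192.0.2.0/24, the temporary working directory is hypothetical, and ./applic is the program name from the post. The script writes a Hydra-style host file, checks that it is readable at an absolute path, shows what the local hostname resolves to (a loopback address there may be related to the "localhost.localdomain" message), and prints the launch command rather than running it.

```shell
#!/bin/sh
# Sketch only: the addresses below are placeholders, not real machines.

# Work in a known directory so the host file path is unambiguous.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hydra host file: one host per line, optionally "host:number_of_processes".
cat > hosts <<'EOF'
192.0.2.10:8
192.0.2.11:8
EOF

# Confirm that the file to be passed to mpiexec exists and is readable.
hostfile="$workdir/hosts"
if [ -r "$hostfile" ]; then
    echo "host file readable: $hostfile"
else
    echo "host file NOT readable: $hostfile" >&2
fi

# Show what this machine's own name resolves to, if getent is available.
if command -v getent >/dev/null 2>&1; then
    getent hosts "$(uname -n)" || echo "local hostname does not resolve"
fi

# Print (do not run) the launch command from the post with an absolute path.
echo "mpiexec -f $hostfile -n 4 ./applic"
```

Passing an absolute path to -f avoids any dependence on the directory from which mpiexec is started, which is one possible reason a relative "hosts" path is not found.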
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users