On 6/7/12 5:32 PM, Jeff Squyres wrote:
Check to ensure that you have firewalls disabled between your two machines; 
that's a common cause of hanging (i.e., Open MPI is trying to open connections 
and/or send data between your two nodes, and the packets are getting 
black-holed at the other side).

Open MPI needs to be able to communicate on random TCP ports between all 
machines that will be used in MPI jobs.
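
For example (just a sketch for RHEL/Scientific Linux 6-style firewalls, run as root on each node; the 192.168.0.0/24 subnet is an assumption, substitute your own cluster network), you can either stop iptables for a quick test or keep the firewall and trust the cluster network:

# quick test only: turn the firewall off
service iptables stop
chkconfig iptables off

# or: accept all traffic from the other cluster nodes and keep the rest of the rules
iptables -I INPUT -s 192.168.0.0/24 -j ACCEPT
service iptables save

(Alternatively, Open MPI's TCP ports can be pinned to a fixed range via MCA parameters such as btl_tcp_port_min_v4 and btl_tcp_port_range_v4, and the firewall opened only for that range.)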

Thanks!!! After switching iptables off on all the machines, I got it working:

[mpiuser@fantomfs40a ~]$ mpirun -np 8 --machinefile /home/mpiuser/.mpi_hostfile ./test/mpihello
Hello world!  I am process number: 0 on host fantomfs40a
Hello world!  I am process number: 1 on host fantomfs40a
Hello world!  I am process number: 2 on host hp430a
Hello world!  I am process number: 3 on host hp430a
Hello world!  I am process number: 4 on host hp430a
Hello world!  I am process number: 5 on host hp430a
Hello world!  I am process number: 6 on host hp430b
Hello world!  I am process number: 7 on host hp430b

Thanks so much for all the answers/suggestions. I am excited now :).

D.



On Jun 7, 2012, at 3:06 AM, Duke wrote:

Hi again,

The verbose flag (-v) did not seem to do anything for me, so I tried --debug-daemons and 
got:

[mpiuser@fantomfs40a ~]$ mpirun --debug-daemons -np 3 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello
Daemon was launched on hp430a - beginning to initialize
Daemon [[34432,0],1] checking in as pid 3011 on host hp430a
<stuck here>

The run got stuck while the daemons were checking in on the hosts. The secure log on hp430a 
shows that mpiuser logged in just fine:

tail /var/log/secure
Jun  7 17:07:31 hp430a sshd[3007]: Accepted publickey for mpiuser from 
192.168.0.101 port 34037 ssh2
Jun  7 17:07:31 hp430a sshd[3007]: pam_unix(sshd:session): session opened for 
user mpiuser by (uid=0)
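
One further sanity check I can think of (just a sketch, using hostname as a stand-in non-MPI program) would be to launch a plain command through mpirun, since that exercises only the daemon launch path and no MPI traffic:

[mpiuser@fantomfs40a ~]$ mpirun -np 3 --machinefile /home/mpiuser/.mpi_hostfile hostname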

Any idea where/how/what to process/check?

Thanks,

D.

On 6/7/12 4:38 PM, Duke wrote:
Hi Jingcha,

On 6/7/12 4:28 PM, Jingcha Joba wrote:
Hello Duke,
Welcome to the forum.

By default, Open MPI schedules processes by filling all the slots on one host before 
moving on to the next host.

Check this link for some info:
http://www.open-mpi.org/faq/?category=running#mpirun-scheduling
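
For example (a rough sketch; nodeA and nodeB are placeholder hostnames, and --bynode/--byslot are the scheduling options of Open MPI 1.x), with a hostfile like

nodeA slots=2
nodeB slots=2

the default by-slot policy puts ranks 0-1 on nodeA and ranks 2-3 on nodeB, while --bynode round-robins the ranks across the two nodes:

mpirun -np 4 --machinefile myhosts --bynode ./test/mpihello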
Thanks for the quick answer. I checked the FAQ and tried with more than 
2 processes, but the run stalled:

[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello
^Cmpirun: killing job...

I tried the --host flag and it stalled as well:

[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --host hp430a,hp430b ./test/mpihello


My configuration must be wrong somewhere. Any idea how I can check the system?

Thanks,

D.



--
Jingcha
On Thu, Jun 7, 2012 at 2:11 AM, Duke <duke.li...@gmx.com> wrote:
Hi folks,

Please be gentle with the newest member of the Open MPI community; I am totally new to this 
field. I just built a test cluster of 3 boxes running Scientific Linux 6.2 and 
Open MPI 1.5.3, and I want to test how the cluster works, but I can't 
figure out what is happening. On my master node, I have the hostfile:

[mpiuser@fantomfs40a ~]$ cat .mpi_hostfile
# The Hostfile for Open MPI
fantomfs40a slots=2
hp430a slots=4 max-slots=4
hp430b slots=4 max-slots=4

To test, I used the following c code:

[mpiuser@fantomfs40a ~]$ cat test/mpihello.c
/* program hello */
/* Adapted from mpihello.f by drs */

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>   /* for gethostname() */

int main(int argc, char **argv)
{
  int rank;
  char hostname[256];

  MPI_Init(&argc, &argv);                 /* start up the MPI environment */
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* get this process's rank */
  gethostname(hostname, 255);             /* name of the host we are running on */
  printf("Hello world!  I am process number: %d on host %s\n", rank, hostname);
  MPI_Finalize();                         /* shut down MPI */
  return 0;
}

and then compiled and ran:

[mpiuser@fantomfs40a ~]$ mpicc -o test/mpihello test/mpihello.c
[mpiuser@fantomfs40a ~]$ mpirun -np 2 --machinefile /home/mpiuser/.mpi_hostfile 
./test/mpihello
Hello world!  I am process number: 0 on host fantomfs40a
Hello world!  I am process number: 1 on host fantomfs40a

Unfortunately the result is not what I wanted. I expected to see 
something like:

Hello world!  I am process number: 0 on host hp430a
Hello world!  I am process number: 1 on host hp430b

Does anybody have an idea what I am doing wrong?

Thank you in advance,

D.





_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



_______________________________________________
users mailing list

us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


_______________________________________________
users mailing list

us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Reply via email to