On Jul 6, 2010, at 5:41 PM, Robert Walters wrote:

> Thanks for your expeditious responses, Ralph.
>
> Just to confirm with you, I should change openmpi-mca-params.conf to include:
>
> oob_tcp_port_min_v4 = (My minimum port in the range)
> oob_tcp_port_range_v4 = (My port range)
> btl_tcp_port_min_v4 = (My minimum port in the range)
> btl_tcp_port_range_v4 = (My port range)
>
> correct?
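(As a concrete sketch of what those entries could look like: assuming, purely hypothetically, that the admin opens 1024 ports starting at 10000, openmpi-mca-params.conf on every node would contain something like:

    # OOB (run-time daemon) traffic: first port and number of ports in the range
    oob_tcp_port_min_v4 = 10000
    oob_tcp_port_range_v4 = 1024
    # BTL (MPI point-to-point) traffic: same hypothetical range here
    btl_tcp_port_min_v4 = 10000
    btl_tcp_port_range_v4 = 1024

The *_range_v4 values are the width of the range, i.e. a count of ports above the minimum, not the top port number.)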
That should do ya. Use the same values on all nodes.

You should be able to confirm that OMPI's run-time system is working if you are able to mpirun a non-MPI program like "hostname" or somesuch. If that works, then the daemons are launching, talking to each other, launching the app, shuttling the I/O around, noticing that the app is dying, tidying everything up, and telling mpirun that everything is done. In short: lots of things are happening right if you're able to mpirun "hostname" across multiple hosts.

> Also, for a cluster of around 32-64 processes (8 processors per node), how
> wide of a range will I require? I've noticed some entries in the mailing list
> suggesting you need a few to get started and then it opens as necessary. Will
> I be safe with 20 or should I go for 100?

If you have 64 hosts, each with 8 processors, meaning that the largest MPI job you would run would be 64 * 8 = 512 MPI processes, then I'd ask for at least 1024 -- 2048 would be better (you have a zillion ports; better to ask for more than you need). We recently found a bug in the TCP BTL where it *may* use 2 sockets for each peerwise connection in some cases.

Additionally, your sysadmin *might* be more amenable to opening up ports *only between the cluster nodes* (vs. opening up the ports to anything). If that's the case, you might as well go for the gold and ask them if they can open up *all* the ports between all your nodes (while still rejecting everything from non-cluster nodes).

-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
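(A quick way to run the "hostname" sanity check described above, with node01 and node02 standing in for two of your actual node names:

    mpirun -np 2 --host node01,node02 hostname

If each process prints its node's hostname, the daemons, the OOB wire-up, and the I/O forwarding are all working, and any remaining trouble is likely on the MPI/BTL side.)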
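(If the sysadmin does take the "open everything between cluster nodes, reject everything else" route, one hypothetical sketch with iptables, assuming the nodes all sit on a private 10.1.0.0/24 subnet -- your addressing and firewall tooling may well differ:

    # accept all TCP traffic originating from other cluster nodes;
    # inserted at the top of the chain so it matches before any reject rules
    iptables -I INPUT -p tcp -s 10.1.0.0/24 -j ACCEPT

Traffic from outside that subnet then continues on to whatever restrictive rules are already in place.)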