Hi Daniel -- PSM should determine your node setup and enable shared contexts accordingly, but it looks like something isn't working right. You can apply the patch I've attached to this e-mail and things should work again. However, it would be useful to identify what's going wrong. Can you compile a hello-world program and run it with the machinefile you're trying to use? Send me the output from:
mpirun -machinefile .... env PSM_TRACEMASK=0x101 ./hello_world

I understand your failure mode only if somehow the 8-core node is detected
to be a 4-core node. The output should tell us this.

cheers,
  . . christian

On Wed, 06 Feb 2008, Daniël Mantione wrote:

> Hello,
>
> I am trying to use Open MPI on a cluster with InfiniPath and 8-core nodes.
> I get these errors when using more than 4 processes:
>
> node017.13311ipath_userinit: assign_port command failed: Device or resource busy
> [node017:13311] Open MPI failed to open a PSM endpoint: No free InfiniPath contexts available on /dev/ipath
> [node017:13311] Error in psm_ep_open (error No free ports could be obtained)
> node017.13315ipath_userinit: assign_port command failed: Device or resource busy
> [node017:13315] Open MPI failed to open a PSM endpoint: No free InfiniPath contexts available on /dev/ipath
> [node017:13315] Error in psm_ep_open (error No free ports could be obtained)
> node017.13314ipath_userinit: assign_port command failed: Device or resource busy
> node017.13313ipath_userinit: assign_port command failed: Device or resource busy
> [node017:13313] Open MPI failed to open a PSM endpoint: No free InfiniPath contexts available on /dev/ipath
> [node017:13313] Error in psm_ep_open (error No free ports could be obtained)
> [node017:13314] Open MPI failed to open a PSM endpoint: No free InfiniPath contexts available on /dev/ipath
> [node017:13314] Error in psm_ep_open (error No free ports could be obtained)
>
> The InfiniPath User Guide says:
>
> "Context Sharing Enabled: The MPI library provides PSM the local process
> layout so that InfiniPath contexts available on each node can be shared if
> necessary; for example, when running more node programs than contexts. By
> default, the QLE7140 and QHT7140 have a maximum of four and eight sharable
> InfiniPath contexts, respectively. Up to 4 node programs (from the same MPI
> job) can share an InfiniPath context, for a total of 16 node programs per
> node for each QLE7140 and 32 node programs per node for each QHT7140.
> The error message when this limit is exceeded is:
>
> No free InfiniPath contexts available on /dev/ipath"
>
> It looks like Open MPI is running into the context limit, apparently 4 in
> this case. Can I do the context sharing mentioned with Open MPI?
>
> Best regards,
>
> Daniël Mantione
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
christian.b...@qlogic.com
(QLogic Host Solutions Group, formerly Pathscale)
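For reference, a minimal MPI hello-world along these lines is enough for the
test above (a sketch, not taken from the original thread; any program that
prints its rank and host name per process works just as well):

/* hello_world.c -- build with: mpicc -o hello_world hello_world.c */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, namelen;
    char procname[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(procname, &namelen);

    /* One line per rank; shows how many processes land on each node. */
    printf("Hello from rank %d of %d on %s\n", rank, size, procname);

    MPI_Finalize();
    return 0;
}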
Index: ompi/mca/mtl/psm/mtl_psm.c
===================================================================
--- ompi/mca/mtl/psm/mtl_psm.c  (revision 17385)
+++ ompi/mca/mtl/psm/mtl_psm.c  (working copy)
@@ -102,6 +102,18 @@
     }
 
+    /*
+     * Figure out how many procs are running on this host to handle context
+     * sharing corner cases.
+     */
+    if (orte_process_info.num_local_procs > 0) {
+        char buf[16];
+        snprintf(buf, sizeof buf - 1, "%d", orte_process_info.num_local_procs);
+        setenv("MPI_LOCALNRANKS", buf, 0);
+        snprintf(buf, sizeof buf - 1, "%d", orte_process_info.local_rank);
+        setenv("MPI_LOCALRANKID", buf, 0);
+    }
+
     /* Handle our own errors for opening endpoints */
     psm_error_register_handler(ompi_mtl_psm.ep, ompi_mtl_psm_errhandler);
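The patch hands the local process layout to PSM through the MPI_LOCALNRANKS
and MPI_LOCALRANKID environment variables, so that PSM can enable context
sharing when there are more local ranks than free contexts. If you want to
confirm the values actually reach each process, a quick standalone check
(a sketch only, not part of the patch) is:

/* check_psm_env.c -- print the context-sharing hints the patched
 * mtl_psm.c exports to PSM.  Debugging aid only.
 */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    const char *nranks, *rankid;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Set via setenv() in the patched mtl_psm.c during MPI_Init;
     * "(unset)" suggests the patch is not in effect in this build. */
    nranks = getenv("MPI_LOCALNRANKS");
    rankid = getenv("MPI_LOCALRANKID");

    printf("rank %d: MPI_LOCALNRANKS=%s MPI_LOCALRANKID=%s\n",
           rank, nranks ? nranks : "(unset)", rankid ? rankid : "(unset)");

    MPI_Finalize();
    return 0;
}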