Hi Daniel --
PSM should work out your per-node process layout and enable shared
contexts accordingly, but it looks like that detection isn't happening
in your setup. You can apply the patch I've attached to this e-mail and
things should work again.
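In case it helps, applying it should be along these lines (I'm assuming
you save the attachment as mtl_psm_localranks.patch -- any name works --
and run this from the top of your Open MPI source tree, then rebuild):

  patch -p0 < mtl_psm_localranks.patch
  make all install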
However, it would be useful to identify what's going wrong. Can you
compile a hello world program and run it with the machinefile you're
trying to use? Send me the output from:
mpirun -machinefile .... env PSM_TRACEMASK=0x101 ./hello_world
The only way I can explain your failure mode is if the 8-core node is
somehow being detected as a 4-core node; the trace output should tell us
whether that's what is happening.
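If you don't have a hello world handy, something minimal like this is
enough (it just needs to call MPI_Init so the PSM MTL gets brought up;
printing the host name per rank doesn't hurt):

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, size, len;
      char name[MPI_MAX_PROCESSOR_NAME];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      MPI_Get_processor_name(name, &len);
      printf("Hello from rank %d of %d on %s\n", rank, size, name);
      MPI_Finalize();
      return 0;
  }

Compile it with "mpicc hello_world.c -o hello_world" and run it with the
mpirun line above.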
cheers,
. . christian
On Wed, 06 Feb 2008, Daniël Mantione wrote:
> Hello,
>
> I am trying to use OpenMPI on a cluster with Infinipath and 8 core nodes.
> I get these errors when using more than 4 processes:
>
> node017.13311ipath_userinit: assign_port command failed: Device or
> resource busy
> [node017:13311] Open MPI failed to open a PSM endpoint: No free InfiniPath
> contexts available on /dev/ipath
> [node017:13311] Error in psm_ep_open (error No free ports could be
> obtained)
> node017.13315ipath_userinit: assign_port command failed: Device or
> resource busy
> [node017:13315] Open MPI failed to open a PSM endpoint: No free InfiniPath
> contexts available on /dev/ipath
> [node017:13315] Error in psm_ep_open (error No free ports could be
> obtained)
> node017.13314ipath_userinit: assign_port command failed: Device or
> resource busy
> node017.13313ipath_userinit: assign_port command failed: Device or
> resource busy
> [node017:13313] Open MPI failed to open a PSM endpoint: No free InfiniPath
> contexts available on /dev/ipath
> [node017:13313] Error in psm_ep_open (error No free ports could be
> obtained)
> [node017:13314] Open MPI failed to open a PSM endpoint: No free InfiniPath
> contexts available on /dev/ipath
> [node017:13314] Error in psm_ep_open (error No free ports could be
> obtained)
>
> The Infinipath User Guide writes this:
>
> "Context Sharing Enabled: The MPI library provides PSM the local process
> layout
> so that InfiniPath contexts available on each node can be shared if
> necessary; for
> example, when running more node programs than contexts. By default, the
> QLE7140 and QHT7140 have a maximum of four and eight sharable InfiniPath
> contexts, respectively. Up to 4 node programs (from the same MPI job) can
> share
> an InfiniPath context, for a total of 16 node programs per node for each
> QLE7140
> and 32 node programs per node for each QHT7140.
> The error message when this limit is exceeded is:
>
> No free InfiniPath contexts available on /dev/ipath
> "
>
> It looks like Open MPI is running into the context limit, apparently 4 in
> this case. Can I enable the context sharing mentioned above with Open MPI?
>
> Best regards,
>
> Daniël Mantione
> _______________________________________________
> users mailing list
> [email protected]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
--
[email protected]
(QLogic Host Solutions Group, formerly Pathscale)
Index: ompi/mca/mtl/psm/mtl_psm.c
===================================================================
--- ompi/mca/mtl/psm/mtl_psm.c (revision 17385)
+++ ompi/mca/mtl/psm/mtl_psm.c (working copy)
@@ -102,6 +102,18 @@
     }
+    /*
+     * Figure out how many procs are running on this host to handle context
+     * sharing corner cases.
+     */
+    if (orte_process_info.num_local_procs > 0) {
+        char buf[16];
+        snprintf(buf, sizeof buf - 1, "%d", orte_process_info.num_local_procs);
+        setenv("MPI_LOCALNRANKS", buf, 0);
+        snprintf(buf, sizeof buf - 1, "%d", orte_process_info.local_rank);
+        setenv("MPI_LOCALRANKID", buf, 0);
+    }
+
     /* Handle our own errors for opening endpoints */
     psm_error_register_handler(ompi_mtl_psm.ep, ompi_mtl_psm_errhandler);
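
P.S. In case the intent of the patch isn't obvious: the two variables just
tell PSM explicitly how many ranks share this node and which local rank
this process is, so it can fall back to context sharing instead of failing
when ranks outnumber hardware contexts. Conceptually the consuming side
looks something like the sketch below -- an illustration only, not the
actual PSM source; only the variable names match what the patch exports.

  #include <stdio.h>
  #include <stdlib.h>

  /* Illustration: how many ranks would have to share one InfiniPath
   * context, given the local rank count exported by the MPI layer and
   * the number of hardware contexts on the adapter (4 on a QLE7140). */
  static int ranks_per_context(int num_hw_contexts)
  {
      const char *s = getenv("MPI_LOCALNRANKS");
      int local_ranks = s ? atoi(s) : 1;

      /* Round up: 8 local ranks on 4 contexts -> 2 ranks per context. */
      return (local_ranks + num_hw_contexts - 1) / num_hw_contexts;
  }

  int main(void)
  {
      printf("ranks per context: %d\n", ranks_per_context(4));
      return 0;
  }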