Hi Daniel --

  PSM should determine your node setup and enable shared contexts
  accordingly, but it looks like something isn't working right.  If you
  apply the patch I've attached to this e-mail, context sharing should
  work again.
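
  If you want a quick way to confirm the patch is doing its job, one
  option (just a sketch, not part of the patch) is to have each rank
  print the two variables the patch exports, MPI_LOCALNRANKS and
  MPI_LOCALRANKID; they are set with setenv() while MPI_Init runs, so
  they remain visible in the process environment afterwards:

  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
      const char *nranks, *rankid;

      MPI_Init(&argc, &argv);

      /* Set by the patched PSM MTL during MPI_Init; these read as
         (unset) if the patch isn't in the library this binary is
         linked against, or if the PSM MTL wasn't selected. */
      nranks = getenv("MPI_LOCALNRANKS");
      rankid = getenv("MPI_LOCALRANKID");
      printf("MPI_LOCALNRANKS=%s MPI_LOCALRANKID=%s\n",
             nranks ? nranks : "(unset)", rankid ? rankid : "(unset)");

      MPI_Finalize();
      return 0;
  }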
  
  However, it would be useful to identify what's going wrong.  Can
  you compile a hello world program and run it with the machinefile
  you're trying to use?  Send me the output from:

  mpirun -machinefile .... env PSM_TRACEMASK=0x101 ./hello_world

  Your failure mode only makes sense to me if the 8-core node is
  somehow being detected as a 4-core node.  The output should tell us
  whether that's the case.
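
  In case it saves you a step, a minimal hello world like the one below
  should be enough for the test; it's plain MPI C with nothing
  PSM-specific, compiled with the usual mpicc wrapper:

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, size;

      /* With the PSM MTL, the endpoint is opened inside MPI_Init, so
         any context problem (and the PSM_TRACEMASK output) shows up
         right here. */
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      printf("hello from rank %d of %d\n", rank, size);

      MPI_Finalize();
      return 0;
  }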

  cheers,

    . . christian
  


On Wed, 06 Feb 2008, Daniël Mantione wrote:

> Hello,
> 
> I am trying to use Open MPI on a cluster with InfiniPath and 8-core nodes.
> I get these errors when using more than 4 processes:
> 
> node017.13311ipath_userinit: assign_port command failed: Device or resource busy
> [node017:13311] Open MPI failed to open a PSM endpoint: No free InfiniPath contexts available on /dev/ipath
> [node017:13311] Error in psm_ep_open (error No free ports could be obtained)
> node017.13315ipath_userinit: assign_port command failed: Device or resource busy
> [node017:13315] Open MPI failed to open a PSM endpoint: No free InfiniPath contexts available on /dev/ipath
> [node017:13315] Error in psm_ep_open (error No free ports could be obtained)
> node017.13314ipath_userinit: assign_port command failed: Device or resource busy
> node017.13313ipath_userinit: assign_port command failed: Device or resource busy
> [node017:13313] Open MPI failed to open a PSM endpoint: No free InfiniPath contexts available on /dev/ipath
> [node017:13313] Error in psm_ep_open (error No free ports could be obtained)
> [node017:13314] Open MPI failed to open a PSM endpoint: No free InfiniPath contexts available on /dev/ipath
> [node017:13314] Error in psm_ep_open (error No free ports could be obtained)
> 
> The InfiniPath User Guide says:
> 
> "Context Sharing Enabled: The MPI library provides PSM the local process
> layout so that InfiniPath contexts available on each node can be shared
> if necessary; for example, when running more node programs than contexts.
> By default, the QLE7140 and QHT7140 have a maximum of four and eight
> sharable InfiniPath contexts, respectively. Up to 4 node programs (from
> the same MPI job) can share an InfiniPath context, for a total of 16
> node programs per node for each QLE7140 and 32 node programs per node
> for each QHT7140.
> The error message when this limit is exceeded is:
> 
> No free InfiniPath contexts available on /dev/ipath"
> 
> It looks like Open MPI is running into the context limit, apparently 4
> in this case. Can I enable the context sharing mentioned above with
> Open MPI?
> 
> Best regards,
> 
> Daniël Mantione

-- 
christian.b...@qlogic.com
(QLogic Host Solutions Group, formerly PathScale)
Index: ompi/mca/mtl/psm/mtl_psm.c
===================================================================
--- ompi/mca/mtl/psm/mtl_psm.c  (revision 17385)
+++ ompi/mca/mtl/psm/mtl_psm.c  (working copy)
@@ -102,6 +102,18 @@

     }

+    /* 
+     * Figure out how many procs are running on this host to handle context
+     * sharing corner cases.
+     */
+    if (orte_process_info.num_local_procs > 0) {
+       char buf[16];
+       snprintf(buf, sizeof buf - 1, "%d", orte_process_info.num_local_procs);
+       setenv("MPI_LOCALNRANKS", buf, 0);
+       snprintf(buf, sizeof buf - 1, "%d", orte_process_info.local_rank);
+       setenv("MPI_LOCALRANKID", buf, 0);
+    }
+
     /* Handle our own errors for opening endpoints */
     psm_error_register_handler(ompi_mtl_psm.ep, ompi_mtl_psm_errhandler);
