Try configuring --without-psm

That should solve the problem. The build is probably picking up the PSM 
libraries installed on the machine, even though it looks like you aren't 
actually running a PSM (InfiniPath) network.
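
If you're rebuilding from source, something along these lines should do it 
(just a sketch - the prefix here mirrors your current install path, so adjust 
to taste):

  ./configure --prefix=/usr/mpi/intel/openmpi-1.4.3 --without-psm
  make all install

Afterwards, "ompi_info | grep psm" should come back empty.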

And yes - PSM should gracefully disable itself when the hardware isn't 
available. You might check the 1.6 series to see if it behaves better; if 
not, we should fix it.
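
In the meantime, you shouldn't need a rebuild just to get running. PSM is an 
MTL (used by the "cm" PML), not a BTL, which is why your "--mca btl self,sm" 
setting doesn't keep it from being opened. Forcing the ob1 PML, which only 
uses BTLs, should sidestep PSM entirely - something like:

  mpirun --mca pml ob1 --mca btl self,sm -n 1 /release/cfd/simgrid/P_OPT.LINUX64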


On Nov 2, 2012, at 8:49 AM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:

> I am running into a problem where something called "PSM" fails to start, 
> which in turn prevents my job from running.  Command and output are below.  
> I would like to understand what's going on.  Apparently this version of 
> Open MPI decided to build itself with support for PSM, but if PSM isn't 
> available, why fail when other transports are available?  Also, in my 
> command I think I've told Open MPI not to use anything but self and sm, so 
> why would it try to use PSM?
> 
> Thanks in advance for any help...
> 
> user@machinename:~> /usr/mpi/intel/openmpi-1.4.3/bin/ompi_info -all | grep psm
>                 MCA mtl: psm (MCA v2.0, API v2.0, Component v1.4.3)
>                 MCA mtl: parameter "mtl_psm_connect_timeout" (current value: "180", data source: default value)
>                 MCA mtl: parameter "mtl_psm_debug" (current value: "1", data source: default value)
>                 MCA mtl: parameter "mtl_psm_ib_unit" (current value: "-1", data source: default value)
>                 MCA mtl: parameter "mtl_psm_ib_port" (current value: "0", data source: default value)
>                 MCA mtl: parameter "mtl_psm_ib_service_level" (current value: "0", data source: default value)
>                 MCA mtl: parameter "mtl_psm_ib_pkey" (current value: "32767", data source: default value)
>                 MCA mtl: parameter "mtl_psm_priority" (current value: "0", data source: default value)
> 
> Here is my command:
> 
> /usr/mpi/intel/openmpi-1.4.3/bin/mpirun -n 1 --mca btl_base_verbose 30 --mca btl self,sm /release/cfd/simgrid/P_OPT.LINUX64
> 
> and here is the output:
> 
> [machinename:01124] mca: base: components_open: Looking for btl components
> [machinename:01124] mca: base: components_open: opening btl components
> [machinename:01124] mca: base: components_open: found loaded component self
> [machinename:01124] mca: base: components_open: component self has no register function
> [machinename:01124] mca: base: components_open: component self open function successful
> [machinename:01124] mca: base: components_open: found loaded component sm
> [machinename:01124] mca: base: components_open: component sm has no register function
> [machinename:01124] mca: base: components_open: component sm open function successful
> machinename.1124ipath_userinit: assign_context command failed: Network is down
> machinename.1124can't open /dev/ipath, network down (err=26)
> --------------------------------------------------------------------------
> PSM was unable to open an endpoint. Please make sure that the network link is
> active on the node and the hardware is functioning.
> 
>  Error: Could not detect network connectivity
> --------------------------------------------------------------------------
> [machinename:01124] mca: base: close: component self closed
> [machinename:01124] mca: base: close: unloading component self
> [machinename:01124] mca: base: close: component sm closed
> [machinename:01124] mca: base: close: unloading component sm
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
> 
>  PML add procs failed
>  --> Returned "Error" (-1) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> 

