On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote:

> Was this ever committed to the OMPI src as something not having to be
> run outside of OpenMPI, but as part of the PSM setup that OpenMPI
> does?

Not that I know of - I don't think the PSM developers ever looked at it.

> 
> I'm having some trouble getting Slurm/OpenMPI to play nice with the
> setup of this key.  Namely, with Slurm you cannot export variables
> from the --prolog of an srun, only from a --task-prolog.
> Unfortunately, if you use a task-prolog, each rank gets a different
> key, which doesn't work.
> 
> I'm also guessing that each unique mpirun needs its own PSM key, not
> one for the whole system, so I can't just make it a permanent
> parameter somewhere else.
> 
> Also, I recall reading somewhere that the --resv-ports parameter that
> OMPI uses from Slurm to choose a list of ports for TCP comms tries
> to lock a port from the pool three times before giving up.

Had to look back at the code - I think you misread this. I can find no evidence 
in the code that we try to bind that port more than once.

> 
> Can someone tell me where that parameter is set?  I'd like to set it to
> a higher value.  We're seeing issues where running a large number of
> short sruns sequentially is causing some of the mpiruns in the
> stream to be killed because they could not lock the ports.
> 
> I suspect that because the lag between when the port is actually closed
> in Linux and when OMPI re-opens a new port is very short, we're trying
> three times and giving up.  I have more than enough ports in the
> resv-ports list, 30k, but I suspect there is some random re-use being
> done and it's failing.
> 
> thanks
> 
> 
> On Mon, Jan 3, 2011 at 10:00 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
>> Yo Ralph --
>> 
>> I see this was committed: https://svn.open-mpi.org/trac/ompi/changeset/24197
>> Do you want to add a blurb in README about it, and/or have this executable
>> compiled as part of the PSM MTL and then installed into $bindir (maybe named
>> ompi-psm-keygen)?
>> 
>> Right now, it's only compiled as part of "make check" and not installed, 
>> right?
>> 
>> On Dec 30, 2010, at 5:07 PM, Ralph Castain wrote:
>> 
>>> Run the program only once - it can be in the prolog of the job if you like. 
>>> The output value needs to be in the env of every rank.
>>> 
>>> You can reuse the value as many times as you like - it doesn't have to be 
>>> unique for each job. There is nothing magic about the value itself.
>>> 
>>> On Dec 30, 2010, at 2:11 PM, Michael Di Domenico wrote:
>>> 
>>>> How early does this need to run? Can I run it as part of a task
>>>> prolog, or does it need to be the shell env for each rank?  And does
>>>> it need to run on one node or all the nodes in the job?
>>>> 
>>>> On Thu, Dec 30, 2010 at 8:54 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> Well, I couldn't do it as a patch - proved too complicated as the psm 
>>>>> system looks for the value early in the boot procedure.
>>>>> 
>>>>> What I can do is give you the attached key generator program. It outputs 
>>>>> the envar required to run your program. So if you run the attached 
>>>>> program and then export the output into your environment, you should be 
>>>>> okay. Looks like this:
>>>>> 
>>>>> $ ./psm_keygen
>>>>> OMPI_MCA_orte_precondition_transports=0099b3eaa2c1547e-afb287789133a954
>>>>> $
>>>>> 
>>>>> You compile the program with the usual mpicc.
>>>>> 
>>>>> Let me know if this solves the problem (or not).
> 
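
For anyone who does not have the attached generator to hand, the following is a minimal sketch of a stand-in, not the attached psm_keygen program. It assumes the value only needs to have the shape of the sample output quoted above (two 16-digit hex fields joined by a hyphen), which fits the remark that there is nothing magic about the value itself. The function names make_transport_key and export_line are hypothetical; only the envar name comes from this thread.

```python
import secrets

# Envar name as shown in the sample psm_keygen output in this thread.
ENVAR = "OMPI_MCA_orte_precondition_transports"


def make_transport_key() -> str:
    """Return a key shaped like the sample: 16 hex digits, '-', 16 hex digits.

    Assumption: two random 64-bit values are sufficient; the real
    generator may derive its value differently.
    """
    first = secrets.randbits(64)
    second = secrets.randbits(64)
    # Zero-pad each field to 16 lowercase hex digits.
    return f"{first:016x}-{second:016x}"


def export_line() -> str:
    """Return a NAME=value line suitable for exporting into the environment."""
    return f"{ENVAR}={make_transport_key()}"


if __name__ == "__main__":
    print(export_line())
```

The point from the thread still applies: run it once, in the shell or batch script that launches the job, and export the result there so every rank sees the same value. Generating it in a per-task prolog gives each rank a different key, which is the failure described above. This assumes the launcher passes the caller's environment through to the ranks.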

