On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote:

> Was this ever committed to the OMPI src as something not having to be
> run outside of OpenMPI, but as part of the PSM setup that OpenMPI
> does?
Not that I know of - I don't think the PSM developers ever looked at it.

> I'm having some trouble getting Slurm/OpenMPI to play nice with the
> setup of this key. Namely, with Slurm you cannot export variables
> from the --prolog of an srun, only from a --task-prolog;
> unfortunately, if you use a task prolog, each rank gets a different
> key, which doesn't work.
>
> I'm also guessing that each unique mpirun needs its own PSM key, not
> one for the whole system, so I can't just make it a permanent
> parameter somewhere else.
>
> Also, I recall reading somewhere that the --resv-ports parameter that
> OMPI uses from Slurm to choose a list of ports for TCP communication
> tries to lock a port from the pool three times before giving up.

Had to look back at the code - I think you misread this. I can find no evidence in the code that we try to bind that port more than once.

> Can someone tell me where that parameter is set? I'd like to set it to
> a higher value. We're seeing issues where running a large number of
> short sruns sequentially is causing some of the mpiruns in the
> stream to be killed because they could not lock the ports.
>
> I suspect that because the lag between when the port is actually closed
> in Linux and when OMPI re-opens a new port is very short, we're trying
> three times and giving up. I have more than enough ports in the
> resv-ports list - 30k - but I suspect there is some random re-use being
> done and it's failing.
>
> thanks
>
>
> On Mon, Jan 3, 2011 at 10:00 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
>> Yo Ralph --
>>
>> I see this was committed: https://svn.open-mpi.org/trac/ompi/changeset/24197.
>> Do you want to add a blurb in README about it, and/or have this executable
>> compiled as part of the PSM MTL and then installed into $bindir (maybe named
>> ompi-psm-keygen)?
>>
>> Right now, it's only compiled as part of "make check" and not installed,
>> right?
>>
>> On Dec 30, 2010, at 5:07 PM, Ralph Castain wrote:
>>
>>> Run the program only once - it can be in the prolog of the job if you like.
>>> The output value needs to be in the env of every rank.
>>>
>>> You can reuse the value as many times as you like - it doesn't have to be
>>> unique for each job. There is nothing magic about the value itself.
>>>
>>> On Dec 30, 2010, at 2:11 PM, Michael Di Domenico wrote:
>>>
>>>> How early does this need to run? Can I run it as part of a task
>>>> prolog, or does it need to be in the shell env for each rank? And does
>>>> it need to run on one node or on all the nodes in the job?
>>>>
>>>> On Thu, Dec 30, 2010 at 8:54 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> Well, I couldn't do it as a patch - it proved too complicated, as the psm
>>>>> system looks for the value early in the boot procedure.
>>>>>
>>>>> What I can do is give you the attached key generator program. It outputs
>>>>> the envar required to run your program. So if you run the attached
>>>>> program and then export the output into your environment, you should be
>>>>> okay. It looks like this:
>>>>>
>>>>> $ ./psm_keygen
>>>>> OMPI_MCA_orte_precondition_transports=0099b3eaa2c1547e-afb287789133a954
>>>>> $
>>>>>
>>>>> You compile the program with the usual mpicc.
>>>>>
>>>>> Let me know if this solves the problem (or not).
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users