Thanks a lot! Now, for the envar "OMPI_MCA_orte_nodes", what do I put exactly? Our nodes each have a short and a long name (it's RHEL 5.x, so the hostname command returns the long name) and at least 2 IP addresses.
p.

On Tue, Jul 27, 2010 at 12:06 AM, Ralph Castain <r...@open-mpi.org> wrote:
> Okay, fixed in r23499. Thanks again...
>
>
> On Jul 26, 2010, at 9:47 PM, Ralph Castain wrote:
>
>> Doh - yes it should! I'll fix it right now.
>>
>> Thanks!
>>
>> On Jul 26, 2010, at 9:28 PM, Philippe wrote:
>>
>>> Ralph,
>>>
>>> I was able to test the generic module and it seems to be working.
>>>
>>> One question, though: the function orte_ess_generic_component_query in
>>> "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the
>>> argument "OMPI_MCA_env", which seems to cause the module to fail to
>>> load. Shouldn't it be "OMPI_MCA_ess"?
>>>
>>> .....
>>>
>>> /* only pick us if directed to do so */
>>> if (NULL != (pick = getenv("OMPI_MCA_env")) &&
>>>     0 == strcmp(pick, "generic")) {
>>>     *priority = 1000;
>>>     *module = (mca_base_module_t *)&orte_ess_generic_module;
>>>
>>> ...
>>>
>>> p.
>>>
>>> On Thu, Jul 22, 2010 at 5:53 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> Dev trunk looks okay right now - I think you'll be fine using it. My new
>>>> component -might- work with 1.5, but probably not with 1.4. I haven't
>>>> checked either of them.
>>>>
>>>> Anything at r23478 or above will have the new module. Let me know how it
>>>> works for you. I haven't tested it myself, but am pretty sure it should
>>>> work.
>>>>
>>>>
>>>> On Jul 22, 2010, at 3:22 PM, Philippe wrote:
>>>>
>>>>> Ralph,
>>>>>
>>>>> Thank you so much!!
>>>>>
>>>>> I'll give it a try and let you know.
>>>>>
>>>>> I know it's a tough question, but how stable is the dev trunk? Can I
>>>>> just grab the latest and run, or am I better off taking your changes
>>>>> and copying them back into a stable release? (If so, which one? 1.4?
>>>>> 1.5?)
>>>>>
>>>>> p.
>>>>>
>>>>> On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>> It was easier for me to just construct this module than to explain how
>>>>>> to do so :-)
>>>>>>
>>>>>> I will commit it this evening (a couple of hours from now), as that is
>>>>>> our standard practice. You'll need to use the developer's trunk,
>>>>>> though, to use it.
>>>>>>
>>>>>> Here are the envars you'll need to provide.
>>>>>>
>>>>>> Each process needs to get the same values for the following:
>>>>>>
>>>>>> * OMPI_MCA_ess=generic
>>>>>> * OMPI_MCA_orte_num_procs=<number of MPI procs>
>>>>>> * OMPI_MCA_orte_nodes=<a comma-separated list of nodenames where MPI
>>>>>>   procs reside>
>>>>>> * OMPI_MCA_orte_ppn=<number of procs/node>
>>>>>>
>>>>>> Note that I have assumed this last value is a constant for simplicity.
>>>>>> If that isn't the case, let me know - you could instead provide it as
>>>>>> a comma-separated list of values with an entry for each node.
>>>>>>
>>>>>> In addition, you need to provide the following value that will be
>>>>>> unique to each process:
>>>>>>
>>>>>> * OMPI_MCA_orte_rank=<MPI rank>
>>>>>>
>>>>>> Finally, you have to provide a range of static TCP ports for use by
>>>>>> the processes. Pick any range that you know will be available across
>>>>>> all the nodes. You then need to ensure that each process sees the
>>>>>> following envar:
>>>>>>
>>>>>> * OMPI_MCA_oob_tcp_static_ports=6000-6010 <== obviously, replace this
>>>>>>   with your range
>>>>>>
>>>>>> You will need a port range at least equal to the ppn for the job (each
>>>>>> proc on a node will take one of the provided ports).
>>>>>>
>>>>>> That should do it. I compute everything else I need from those values.
>>>>>>
>>>>>> Does that work for you?
>>>>>> Ralph
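
To make Ralph's recipe above concrete, here is a minimal sketch of a
per-process launcher in C that sets the envars via setenv() and then
exec's the real MPI binary. The node names ("node01,node02"), process
counts, and port range are placeholders invented for illustration, not
values from this thread; substitute your own:

    /*
     * Hedged sketch of a per-process launcher for the generic ess
     * module, following the recipe quoted above. All concrete values
     * (node names, counts, ports) are placeholders.
     *
     * Usage: ./launcher <rank> <mpi-binary> [args...]
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc < 3) {
            fprintf(stderr, "usage: %s <rank> <mpi-binary> [args...]\n",
                    argv[0]);
            return 1;
        }

        /* identical on every process */
        setenv("OMPI_MCA_ess", "generic", 1);
        setenv("OMPI_MCA_orte_num_procs", "4", 1);         /* total MPI procs */
        setenv("OMPI_MCA_orte_nodes", "node01,node02", 1); /* placeholder names */
        setenv("OMPI_MCA_orte_ppn", "2", 1);               /* procs per node */
        setenv("OMPI_MCA_oob_tcp_static_ports", "6000-6010", 1); /* >= ppn ports */

        /* unique per process: the MPI rank, supplied by whatever starts us */
        setenv("OMPI_MCA_orte_rank", argv[1], 1);

        /* hand off to the real MPI program */
        execvp(argv[2], &argv[2]);
        perror("execvp");
        return 1;
    }

With the placeholder values above, you would start ranks 0 and 1 on
node01 (e.g. "./launcher 0 ./a.out" and "./launcher 1 ./a.out") and
ranks 2 and 3 on node02.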
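And for anyone puzzled by the OMPI_MCA_env/OMPI_MCA_ess issue earlier
in the thread, here is a standalone mock (not the actual ORTE source)
of the selection check: the generic module only volunteers itself when
OMPI_MCA_ess=generic is present in the environment, which is why the
typo kept it from ever being picked.

    /*
     * Standalone mock of the component-selection check discussed
     * above. Compile and run with OMPI_MCA_ess=generic set (or not)
     * to see the effect of the envar the query function looks for.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        const char *pick = getenv("OMPI_MCA_ess");

        if (NULL != pick && 0 == strcmp(pick, "generic")) {
            printf("generic ess module selected, priority 1000\n");
        } else {
            printf("generic ess module not selected\n");
        }
        return 0;
    }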