On Oct 16, 2014, at 9:43 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> Hi Ralph
>
> Yes, I know the process placement features are powerful.
> They were already very good in 1.6, even in 1.4,
> and I just tried the new 1.8
> "-map-by l2cache" (works nicely on Opteron 6300).
>
> Unfortunately I couldn't keep track, test, and use the 1.7 series.
> I did that in the previous "odd/new feature" series (1.3, 1.5).
> However, my normal workload requires that
> I focus my attention on the "even/stable" series
> (less fun, more production).
> Hence I hopped directly from 1.6 to 1.8,
> although I read a number of mailing list postings about the new
> style of process placement.
>
> Pestering you again about documentation (last time for now):

Only for now?? :-)

> The mpiexec man page also seems to need an update.
> That is probably the first place people look for information
> about runtime features.
> For instance, the process placement examples still
> use deprecated parameters and mpiexec options:
> -bind-to-core, rmaps_base_schedule_policy, orte_process_binding, etc.

On my to-do list

>
> Thank you,
> Gus Correa
>
> On 10/15/2014 11:10 PM, Ralph Castain wrote:
>>
>> On Oct 15, 2014, at 11:46 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>
>>> Thank you Ralph and Jeff for the help!
>>>
>>> Glad to hear the segmentation fault is reproducible and will be fixed.
>>>
>>> In any case, one can just avoid the old parameter name
>>> (rmaps_base_schedule_policy),
>>> and use instead the new parameter name
>>> (rmaps_base_mapping_policy)
>>> without any problem in OMPI 1.8.3.
>>>
>>
>> Fix is in the nightly 1.8 tarball - I'll release a 1.8.4 soon to cover
>> the problem.
>>
>>> **
>>>
>>> Thanks Ralph for sending the new (OMPI 1.8)
>>> parameter names for process binding.
>>>
>>> My recollection is that some time ago somebody (Jeff perhaps?)
>>> posted here a link to a presentation (PDF or PPT) explaining the
>>> new style of process binding, but I couldn't find it in the
>>> list archives.
>>> Maybe the link could be part of the FAQ (if not already there)?
>>
>> I don't think it is, but I'll try to add it over the next day or so.
>>
>>>
>>> **
>>>
>>> The Open MPI runtime environment is really great.
>>> However, to take advantage of it one often has to do parameter guessing,
>>> and to do time-consuming tests by trial and error,
>>> because the main sources of documentation are
>>> the terse output of ompi_info, and several sparse
>>> references in the FAQ.
>>> (Some of them outdated?)
>>>
>>> In addition, the runtime environment has evolved over time,
>>> which is certainly a good thing.
>>> However, along with this evolution, several runtime parameters
>>> changed both name and functionality, new ones were introduced,
>>> old ones were deprecated, which can be somewhat confusing,
>>> and can lead to an ineffective use of the runtime environment.
>>> (In 1.8.3 I was using several deprecated parameters from 1.6.5
>>> that seem to be silently ignored at runtime.
>>> I only noticed the problem because that segmentation fault happened.)
>>>
>>> I know asking for thorough documentation is foolish,
>>
>> Not really - it is something we need to get better about :-(
>>
>>> but I guess a simple table of runtime parameter names and valid values
>>> in the FAQ, maybe sorted by their purpose/function, along with a few
>>> examples of use, could help a lot.
>>> Some of this material is now spread across several FAQs, but not so
>>> easy to find/compare.
>>> That doesn't need to be a comprehensive table, but commonly used
>>> items like selecting the btl, selecting interfaces,
>>> dealing with process binding,
>>> modifying/enriching the stdout/stderr output
>>> (tagging output, increasing verbosity, etc),
>>> probably have their place there.
>>
>> Yeah, we fell down on this one. The changes were announced with each
>> step in the 1.7 series, but if you step from 1.6 directly to 1.8, you'll
>> get caught flat-footed.
>> We honestly didn't think of that case, and so we
>> mentally assumed that "of course people have been following the series -
>> they know what happened".
>>
>> You know what they say about those who "assume" :-/
>>
>> I'll try to get something into the FAQ about the entire new mapping,
>> ranking, and binding system. It is actually VERY powerful, allowing you
>> to specify pretty much any placement pattern you can imagine and bind it
>> to whatever level you desire. It was developed in response to requests
>> from researchers who wanted to explore application performance versus
>> placement strategies - but we provided some simplified options to
>> support more common usage patterns.
>>
>>>
>>> Many thanks,
>>> Gus Correa
>>>
>>>
>>> On 10/15/2014 11:12 AM, Jeff Squyres (jsquyres) wrote:
>>>> We talked off-list -- fixed this on master and just filed
>>>> https://github.com/open-mpi/ompi-release/pull/33 to get this into the
>>>> v1.8 branch.
>>>>
>>>>
>>>> On Oct 14, 2014, at 7:39 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>>>
>>>>> On Oct 14, 2014, at 5:32 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>>>>
>>>>>> Dear Open MPI fans and experts
>>>>>>
>>>>>> This is just a note in case other people run into the same problem.
>>>>>>
>>>>>> I just built Open MPI 1.8.3.
>>>>>> As usual I put my old settings on openmpi-mca-params.conf,
>>>>>> with no further thinking.
>>>>>> Then I compiled the connectivity_c.c program and tried
>>>>>> to run it with mpiexec.
>>>>>> That is a routine that never failed before.
>>>>>>
>>>>>> Bummer!
>>>>>> I've got a segmentation fault right away.
>>>>>
>>>>> Strange - it works fine from the cmd line:
>>>>>
>>>>> 07:27:04 (v1.8) /home/common/openmpi/ompi-release$ mpirun -n 1 -mca rmaps_base_schedule_policy core hostname
>>>>> --------------------------------------------------------------------------
>>>>> A deprecated MCA variable value was specified in the environment or
>>>>> on the command line. Deprecated MCA variables should be avoided;
>>>>> they may disappear in future releases.
>>>>>
>>>>> Deprecated variable: rmaps_base_schedule_policy
>>>>> New variable: rmaps_base_mapping_policy
>>>>> --------------------------------------------------------------------------
>>>>> bend001
>>>>>
>>>>> HOWEVER, I can replicate that behavior when it is in the default
>>>>> params file! I don't see the immediate cause of the difference, but
>>>>> will investigate.
>>>>>
>>>>>>
>>>>>> After some head scratching, checking my environment, etc,
>>>>>> I thought I might have configured OMPI incorrectly.
>>>>>> Hence, I tried to get information from ompi_info.
>>>>>> Oh well, ompi_info also segfaulted!
>>>>>>
>>>>>> It took me a while to realize that the runtime parameter
>>>>>> configuration file was the culprit.
>>>>>>
>>>>>> When I inserted the runtime parameter settings one by one,
>>>>>> the segfault came with this one:
>>>>>>
>>>>>> rmaps_base_schedule_policy = core
>>>>>>
>>>>>> Ompi_info (when I got it to work) told me that the parameter above
>>>>>> is now a deprecated synonym of:
>>>>>>
>>>>>> rmaps_base_mapping_policy = core
>>>>>>
>>>>>> In any case, the old synonym doesn't work and makes ompi_info and
>>>>>> mpiexec segfault (and I'd guess anything else that requires the
>>>>>> OMPI runtime components).
>>>>>> Only the new parameter name works.
>>>>>
>>>>> That's because the segfault is happening in the printing of the
>>>>> deprecation warning.
>>>>>
>>>>>>
>>>>>> ***
>>>>>>
>>>>>> I am also missing in the ompi_info output the following
>>>>>> (OMPI 1.6.5) parameters (not reported by ompi_info --all --all):
>>>>>>
>>>>>
>>>>> 1) orte_process_binding ===> hwloc_base_binding_policy
>>>>>
>>>>> 2) orte_report_bindings ===> hwloc_base_report_bindings
>>>>>
>>>>> 3) opal_paffinity_alone ===> gone, use
>>>>> hwloc_base_binding_policy=none if you don't want any binding
>>>>>
>>>>>>
>>>>>> Are they gone forever?
>>>>>>
>>>>>> Are there replacements for them, with approximately the same
>>>>>> functionality?
>>>>>>
>>>>>> Is there a list comparing the new (1.8) vs. old (1.6)
>>>>>> OMPI runtime parameters, and/or any additional documentation
>>>>>> about the new style of OMPI 1.8 runtime parameters?
>>>>>
>>>>> Will try to add this to the web site
>>>>>
>>>>>>
>>>>>> Since there seems to have been a major revamping of the OMPI
>>>>>> runtime parameters, that would be a great help.
>>>>>>
>>>>>> Thank you,
>>>>>> Gus Correa
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> Link to this post:
>>>>>> http://www.open-mpi.org/community/lists/users/2014/10/25497.php
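
For anyone making the same jump from 1.6 to 1.8, here is a rough sketch of a checker that scans the text of an openmpi-mca-params.conf file for the old parameter names discussed in this thread. It is only an illustration: the function name check_mca_params and the tables RENAMED and REMOVED are made up for this example, the rename table holds just the four parameters mentioned above (it is not a complete 1.6-to-1.8 list), and the parsing assumes simple "name = value" lines with "#" comments.

```python
# Sketch only: these renames are the ones mentioned in this thread
# (Open MPI 1.6 -> 1.8). The table is NOT a complete list.
RENAMED = {
    "rmaps_base_schedule_policy": "rmaps_base_mapping_policy",
    "orte_process_binding": "hwloc_base_binding_policy",
    "orte_report_bindings": "hwloc_base_report_bindings",
}

# Parameters reported as gone, with the replacement suggested in the thread.
REMOVED = {
    "opal_paffinity_alone": "hwloc_base_binding_policy = none (if no binding is wanted)",
}


def check_mca_params(text):
    """Return (line_number, old_name, advice) for each stale setting in text.

    Assumes "name = value" lines; anything after "#" is treated as a comment.
    """
    findings = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue  # blank line, comment, or something we do not understand
        name = line.split("=", 1)[0].strip()
        if name in RENAMED:
            findings.append((lineno, name, "use " + RENAMED[name]))
        elif name in REMOVED:
            findings.append((lineno, name, "removed; consider " + REMOVED[name]))
    return findings


# Hypothetical file contents, in the spirit of the settings quoted above.
sample = """\
# old 1.6.5 settings
rmaps_base_schedule_policy = core
orte_report_bindings = 1
"""

for lineno, name, advice in check_mca_params(sample):
    print(f"line {lineno}: {name}: {advice}")
```

The check is done on the file text rather than by calling ompi_info, since in 1.8.3 ompi_info itself segfaulted while the deprecated name was still in the params file.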