On Fri, Jan 6, 2017 at 4:35 PM, Thomas Herve <the...@redhat.com> wrote: > On Fri, Jan 6, 2017 at 6:12 PM, Zane Bitter <zbit...@redhat.com> wrote: >> tl;dr everything looks great, and memory usage has dropped by about 64% >> since the initial Newton release of Heat. >> >> I re-ran my analysis of Heat memory usage in the tripleo-heat-templates >> gate. (This is based on the gate-tripleo-ci-centos-7-ovb-nonha job.) Here's >> a pretty picture: >> >> https://fedorapeople.org/~zaneb/tripleo-memory/20170105/heat_memused.png >> >> There is one major caveat here: for the period marked in grey where it says >> "Only 2 engine workers", the job was configured to use only 2 heat-enginer >> worker processes instead of 4, so this is not an apples-to-apples >> comparison. The inital drop at the beginning and the subsequent bounce at >> the end are artifacts of this change. Note that the stable/newton branch is >> _still_ using only 2 engine workers. >> >> The rapidly increasing usage on the left is due to increases in the >> complexity of the templates during the Newton cycle. It's clear that if >> there has been any similar complexity growth during Ocata, it has had a tiny >> effect on memory consumption in comparison. > > Thanks a lot for the analysis. It's great that things haven't gotten off > track. > >> I tracked down most of the step changes to identifiable patches: >> >> 2016-10-07: 2.44GiB -> 1.64GiB >> - https://review.openstack.org/382068/ merged, making ResourceInfo classes >> more memory-efficient. Judging by the stable branch (where this and the >> following patch were merged at different times), this was responsible for >> dropping the memory usage from 2.44GiB -> 1.83GiB. (Which seems like a >> disproportionately large change?) > > Without wanting to get the credit, I believe > https://review.openstack.org/377061/ is more likely the reason here. > >> - https://review.openstack.org/#/c/382377/ merged, so we no longer create >> multiple yaql contexts. (This was responsible for the drop from 1.83GiB -> >> 1.64GiB.) >> >> 2016-10-17: 1.62GiB -> 0.93GiB >> - https://review.openstack.org/#/c/386696/ merged, reducing the number of >> engine workers on the undercloud to 2. >> >> 2016-10-19: 0.93GiB -> 0.73GiB (variance also seemed to drop after this) >> - https://review.openstack.org/#/c/386247/ merged (on 2016-10-16), avoiding >> loading all nested stacks in a single process simultaneously much of the >> time. >> - https://review.openstack.org/#/c/383839/ merged (on 2016-10-16), >> switching output calculations to RPC to avoid almost all simultaneous >> loading of all nested stacks. >> >> 2016-11-08: 0.76GiB -> 0.70GiB >> - This one is a bit of a mystery??? > > Possibly https://review.openstack.org/390064/ ? Reducing the > environment size could have an effect. > >> 2016-11-22: 0.69GiB -> 0.50GiB >> - https://review.openstack.org/#/c/398476/ merged, improving the efficiency >> of resource listing? >> >> 2016-12-01: 0.49GiB -> 0.88GiB >> - https://review.openstack.org/#/c/399619/ merged, returning the number of >> engine workers on the undercloud to 4. >> >> It's not an exact science because IIUC there's a delay between a patch >> merging in Heat and it being used in subsequent t-h-t gate jobs. e.g. the >> change to getting outputs over RPC landed the day before the >> instack-undercloud patch that cut the number of engine workers, but the >> effects don't show up until 2 days after. I'd love to figure out what >> happened on the 8th of November, but I can't correlate it to anything >> obvious. The attribution of the change on the 22nd also seems dubious, but >> the timing adds up (including on stable/newton). >> >> It's fair to say that none of the other patches we merged in an attempt to >> reduce memory usage had any discernible effect :D >> >> It's worth reiterating that TripleO still disables convergence in the >> undercloud, so these are all tests of the legacy code path. It would be >> great if we could set up a non-voting job on t-h-t with convergence enabled >> and start tracking memory use over time there too. As a first step, maybe we >> could at least add an experimental job on Heat to give us a baseline? > > +1. We haven't made any huge changes into that direction, but having > some info would be great.
+1 too. I volunteer to do it. Quick question: to enable it, is it just a matter of setting convergence_engine to true in heat.conf (on the undercloud)? If not, what else if needed? > -- > Thomas > > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Emilien Macchi __________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev