On Wed, 2021-06-09 at 14:40 +1000, David Gibson wrote:
> On Tue, Jun 08, 2021 at 09:52:10PM -0300, Leonardo Brás wrote:
> > On Mon, 2021-06-07 at 15:02 +1000, David Gibson wrote:
> > > On Fri, Apr 30, 2021 at 11:36:06AM -0300, Leonardo Bras wrote:
> > > > Because hypervisors may need to create HPTs without knowing the
> > > > guest page size, the smallest used page-size (4k) may be chosen,
> > > > resulting in a HPT that is possibly bigger than needed.
> > > >
> > > > On a guest with bigger page-sizes, the number of entries for the
> > > > HPT may be too high, causing the guest to ask for a HPT
> > > > resize-down on the first hotplug.
> > > >
> > > > This becomes a problem when HPT resize-down fails, and causes
> > > > the HPT resize to be performed on every LMB added, until the HPT
> > > > size is compatible with the guest memory size, causing a major
> > > > slowdown.
> > > >
> > > > So, avoiding HPT resizing-down on hot-add significantly improves
> > > > memory hotplug times.
> > > >
> > > > As an example, hotplugging 256GB on a 129GB guest took 710s
> > > > without this patch, and 21s after it was applied.
> > > >
> > > > Signed-off-by: Leonardo Bras <leobra...@gmail.com>
> > >
> > > Sorry it's taken me so long to look at these.
> > >
> > > I don't love the extra statefulness that the 'shrinking' parameter
> > > adds, but I can't see an elegant way to avoid it, so:
> > >
> > > Reviewed-by: David Gibson <da...@gibson.dropbear.id.au>
> >
> > np, thanks for reviewing!
>
> Actually... I take that back. With the subsequent patches my
> discomfort with the complexity of implementing the batching grew.
>
> I think I can see a simpler way, although it wasn't as clear as I
> thought it might be without some deep history on this feature.
>
> What's going on here is pretty hard to follow, because it starts in
> arch-specific code (arch/powerpc/platforms/pseries/hotplug-memory.c)
> where it processes the add/remove requests, then goes into generic
> code via __add_memory(), which eventually emerges back in
> arch-specific code (hash__create_section_mapping()).
>
> The HPT resizing calls are in the "inner" arch-specific section,
> whereas it's only the outer arch section that has the information to
> batch properly. The mutex and 'shrinking' parameter in Leonardo's
> code are all about conveying information from the outer to the inner
> section.
>
> Now, I think the reason I had the resize calls in the inner section
> was to accommodate the notion that a) pHyp might support resizing in
> the future, and it could come in through a different path with its
> drmgr thingy, and/or b) bare metal hash architectures might want to
> implement hash resizing, and this would make at least part of the
> path common.
>
> Given the decreasing relevance of hash MMUs, I think we can now
> safely say that neither of these is ever going to happen.
>
> Therefore, we can simplify things by moving the HPT resize calls into
> the pseries LMB code, instead of create/remove_section_mapping. Then,
> to do batching without extra complications, we just need this logic
> for all resizes (both add and remove):
>
>     let new_hpt_order = expected HPT size for new mem size;
>
>     if (new_hpt_order > current_hpt_order)
>         resize to new_hpt_order
>
>     add/remove memory
>
>     if (new_hpt_order < current_hpt_order - 1)
>         resize to new_hpt_order
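To make sure I follow, here is a rough C sketch of how I read that
flow, as it could sit in arch/powerpc/platforms/pseries/hotplug-memory.c.
The helper names and placement are my guesses; htab_shift_for_mem_size(),
ppc64_pft_size and mmu_hash_ops.resize_hpt() are the existing interfaces:

/*
 * Sketch only, not tested: a grow-before / shrink-after pair called
 * from the pseries LMB code, bracketing a whole add/remove batch.
 * The two helpers below are hypothetical names.
 */
static void hpt_grow_for_hotplug(unsigned long new_mem_size)
{
	int new_shift = htab_shift_for_mem_size(new_mem_size);

	/* Grow up front so the memory about to be added can be mapped. */
	if (mmu_hash_ops.resize_hpt && new_shift > ppc64_pft_size)
		mmu_hash_ops.resize_hpt(new_shift);
}

static void hpt_shrink_after_hotplug(unsigned long new_mem_size)
{
	int new_shift = htab_shift_for_mem_size(new_mem_size);

	/*
	 * Shrink only after the change, and only when the target is more
	 * than one order smaller, so we don't flip-flop at a boundary.
	 */
	if (mmu_hash_ops.resize_hpt && new_shift < ppc64_pft_size - 1)
		mmu_hash_ops.resize_hpt(new_shift);
}

So each dlpar_memory_*() batch entry point would do something like:

	hpt_grow_for_hotplug(new_mem_size);
	/* ... add or remove all requested LMBs ... */
	hpt_shrink_after_hotplug(new_mem_size);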
Ok, assuming the sketch above captures it, that really does simplify
the batching a lot.

Question: by LMB code, do you mean dlpar_memory_{add,remove}_by_*?
(Dealing only with dlpar_{add,remove}_lmb() would not be enough to
handle the batching.)

In my reply to 3/3 I sent you some other examples of functions that
currently end up calling resize_hpt_for_hotplug() without coming from
hotplug-memory.c. Is it ok that they would no longer call it?

Best regards,
Leonardo Bras