On Wed, 11 Dec 2019 09:44:11 +0530 Shivaprasad G Bhat <sb...@linux.ibm.com> wrote:
> On 12/06/2019 07:22 AM, David Gibson wrote:
> > On Wed, Nov 27, 2019 at 09:50:54AM +0530, Bharata B Rao wrote:
> >> On Fri, Nov 22, 2019 at 10:42 AM David Gibson
> >> <da...@gibson.dropbear.id.au> wrote:
> >>> Ok. A number of queries about this.
> >>>
> >>> 1) The PAPR spec for ibm,dynamic-memory-v2 says that the first word
> >>> in each entry is the number of LMBs, but for NVDIMMs you use the
> >>> not-necessarily-equal scm_block_size instead. Does the NVDIMM
> >>> amendment for PAPR really specify to use different block sizes for
> >>> these cases? (In which case that's a really stupid spec decision,
> >>> but that wouldn't surprise me at this point).
> >> SCM block sizes can be different from LMB sizes, but here we enforce
> >> that the SCM device size (excluding metadata) is a multiple of the
> >> LMB size, so that we don't end up with a memory range that is not
> >> aligned to the LMB size.
> > Right, but it still doesn't make sense to use scm_block_size when you
> > create the dynamic-memory-v2 property.
>
> Right, I should use the LMB size here, as I will be creating holes to
> disallow DIMMs from claiming those LMBs by marking them INVALID, as
> Bharata suggested before.
>
> > As far as the thing interpreting that goes, it *must* be LMB size,
> > not SCM block size. If those are required to be the same at this
> > point, you should use an assert().
>
> The SCM block size should be a multiple of the LMB size; it need not be
> equal. I'll add an assert for that. There is no benefit I see as of now
> in having a higher SCM block size, as the bind/unbind are already done
> before the bind hcall.
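
For what it's worth, the kind of check being discussed could look
roughly like the sketch below. This is illustration only, written
against QEMU's usual error/glib helpers: the helper name and the
SPAPR_MINIMUM_SCM_BLOCK_SIZE / label_size parameters are assumptions on
my side, not taken from the patch.

/* Sketch: validate that the NVDIMM can be described in whole LMBs. */
static void spapr_nvdimm_validate_size(uint64_t nvdimm_size,
                                       uint64_t label_size,
                                       uint64_t lmb_size,
                                       Error **errp)
{
    /*
     * The SCM block size is a multiple of the LMB size, not
     * necessarily equal to it.
     */
    g_assert(SPAPR_MINIMUM_SCM_BLOCK_SIZE % lmb_size == 0);

    /* The usable area (excluding the label/metadata) must be LMB-aligned. */
    if ((nvdimm_size - label_size) % lmb_size) {
        error_setg(errp, "NVDIMM size (excluding the label area) must be"
                   " a multiple of the LMB size 0x%" PRIx64, lmb_size);
    }
}

Calling something like that from the pre-plug path would reject a
misaligned backend before any DRCs or device tree entries are created.
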
> >>> 2) Similarly, the ibm,dynamic-memory-v2 description says that the
> >>> memory block described by the entry has a whole batch of contiguous
> >>> DRCs starting at the DRC index given and continuing for #LMBs DRCs.
> >>> For NVDIMMs it appears that you just have one DRC for the whole
> >>> NVDIMM. Is that right?
> >> One NVDIMM has one DRC. In our case, we need to mark the LMBs
> >> corresponding to that address range in ibm,dynamic-memory-v2 as
> >> reserved and invalid.
> > Ok, that fits very weirdly with the DRC allocation for the rest of
> > pluggable memory, but I suppose that's PAPR for you.
> >
> > Having these in together is very inscrutable though, and relies on a
> > heap of non-obvious constraints about placement of DIMMs and NVDIMMs
> > relative to each other. I really wonder if it would be better to have
> > a completely different address range for the NVDIMMs.
>
> The backend object for both DIMM and NVDIMM is memory-backend-*, and
> they use addresses from the same space. Separating them would mean
> using/introducing a different backend object. I don't think we have a
> choice here.

What address-space(s) are we talking about here exactly?
From my point of view, memory-backend-* provides a RAM block at some
HVA, which shouldn't have anything to do with how the NVDIMM is
partitioned and mapped to GPA.

> >>> 3) You're not setting *any* extra flags on the entry. How is the
> >>> guest supposed to know which are NVDIMM entries and which are
> >>> regular DIMM entries? AFAICT in this version the NVDIMM slots are
> >>> indistinguishable from the unassigned hotplug memory (which makes
> >>> the difference in LMB and DRC numbering even more troubling).
> >> For the NVDIMM case, this patch should populate the LMB set in
> >> ibm,dynamic-memory-v2 something like below:
> >>     elem = spapr_get_drconf_cell(size / lmb_size, addr,
> >>                                  0, -1,
> >>                                  SPAPR_LMB_FLAGS_RESERVED |
> >>                                  SPAPR_LMB_FLAGS_DRC_INVALID);
> >>
> >> This will ensure that the NVDIMM range will never be considered a
> >> valid memory range for memory hotplug.
> > Hrm. Ok, so we already have code that does that for any gaps between
> > DIMMs. I don't think there's actually anything that that code will do
> > differently than the code you have for NVDIMMs, so you could just
> > skip over the NVDIMMs here and it should do the right thing.
> >
> > The *interpretation* of those entries will become different: for
> > space into which a regular DIMM is later inserted, we'll assume the
> > DRC index given is a base and there are more DRCs following it, but
> > for NVDIMMs we'll assume the same DRC throughout. This is nuts, but
> > IIUC that's what PAPR says and we can't do much about it.
>
> My current patch is buggy, as Bharata pointed out. The NVDIMM DRCs are
> not to be populated here; instead, the LMBs are marked RESERVED and
> INVALID so that no malicious attempt to online the LMBs in those NVDIMM
> address ranges can succeed.
>
> >>> 4) AFAICT these are _present_ NVDIMMs, so why is
> >>> SPAPR_LMB_FLAGS_ASSIGNED not set for them? (and why is the node
> >>> forced to -1, regardless of di->node).
> >>>
> >>>>          QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
> >>>>          nr_entries++;
> >>>>          cur_addr = addr + size;
> >>>> @@ -1261,6 +1273,85 @@ static void spapr_dt_hypervisor(SpaprMachineState *spapr, void *fdt)
> >>>>      }
> >>>>  }
> >>>>
> >>>> +static void spapr_create_nvdimm_dr_connectors(SpaprMachineState *spapr)
> >>>> +{
> >>>> +    MachineState *machine = MACHINE(spapr);
> >>>> +    int i;
> >>>> +
> >>>> +    for (i = 0; i < machine->ram_slots; i++) {
> >>>> +        spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_PMEM, i);
> >>> What happens if you try to plug an NVDIMM to one of these slots,
> >>> but a regular DIMM has already taken it?
> >> NVDIMM hotplug won't get that occupied slot.
> > Ok.
> >
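
Coming back to 3) and the reserved/invalid entries: for illustration,
the per-device branch of the spapr_dt_dynamic_memory_v2() walk could
end up looking roughly like the sketch below. The di->is_nvdimm flag is
an assumption for the sake of the example; the else branch just mirrors
the existing DIMM path quoted above.

        if (di->is_nvdimm) {    /* assumed flag, illustration only */
            /*
             * NVDIMM-backed range: reserve its LMBs with an invalid
             * DRC index so the guest never onlines them as hotplug
             * memory.
             */
            elem = spapr_get_drconf_cell(size / lmb_size, addr, 0, -1,
                                         SPAPR_LMB_FLAGS_RESERVED |
                                         SPAPR_LMB_FLAGS_DRC_INVALID);
        } else {
            /* Existing DIMM path: per-LMB DRCs, marked assigned. */
            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
            g_assert(drc);
            elem = spapr_get_drconf_cell(size / lmb_size, addr,
                                         spapr_drc_index(drc), di->node,
                                         SPAPR_LMB_FLAGS_ASSIGNED);
        }
        QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
        nr_entries++;
        cur_addr = addr + size;

That way the per-NVDIMM TYPE_SPAPR_DRC_PMEM connector remains the only
DRC the guest uses for the device itself, while the LMB range it covers
stays permanently reserved in ibm,dynamic-memory-v2.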