Hi, On 8/3/22 15:02, Gavin Shan wrote: > Hi Marc, > > On 8/3/22 5:01 PM, Marc Zyngier wrote: >> On Wed, 03 Aug 2022 04:01:04 +0100, >> Gavin Shan <gs...@redhat.com> wrote: >>> On 8/2/22 7:41 PM, Eric Auger wrote: >>>> On 8/2/22 08:45, Gavin Shan wrote: >>>>> There are 3 highmem IO regions as below. They can be disabled in >>>>> two situations: (a) The specific region is disabled by user. (b) >>>>> The specific region doesn't fit in the PA space. However, the base >>>>> address and highest_gpa are still updated no matter if the region >>>>> is enabled or disabled. It's incorrectly incurring waste in the PA >>>>> space. >>>> If I am not wrong highmem_redists and highmem_mmio are not user >>>> selectable >>>> >>>> Only highmem ecam depends on machine type & ACPI setup. But I would >>>> say >>>> that in server use case it is always set. So is that optimization >>>> really >>>> needed? >>> >>> There are two other cases you missed. >>> >>> - highmem_ecam is enabled after virt-2.12, meaning it stays disabled >>> before that. >> >> I don't get this. The current behaviour is to disable highmem_ecam if >> it doesn't fit in the PA space. I can't see anything that enables it >> if it was disabled the first place. >> > > There are several places or conditions where vms->highmem_ecam can be > disabled: > > - virt_instance_init() where vms->highmem_ecam is inherited from > !vmc->no_highmem_ecam. The option is set to true after virt-2.12 > in virt_machine_2_12_options(). > > - machvirt_init() where vms->highmem_ecam can be disable if we have > 32-bits vCPUs and failure on loading firmware. > > - Another place is where we're talking about. It's address assignment > to fit the PA space. > >>> >>> - The high memory region can be disabled if user is asking large >>> (normal) memory space through 'maxmem=' option. When the requested >>> memory by 'maxmem=' is large enough, the high memory regions are >>> disabled. It means the normal memory has higher priority than those >>> high memory regions. This is the case I provided in (b) of the >>> commit log. >> >> Why is that a problem? It matches the expected behaviour, as the >> highmem IO region is floating and is pushed up by the memory region. >> > > Eric thought that VIRT_HIGH_GIC_REDIST2 and VIRT_HIGH_PCIE_MMIO regions > aren't user selectable. I tended to explain why it's not true. 'maxmem=' > can affect the outcome. When 'maxmem=' value is big enough, there will be > no free area in the PA space to hold those two regions. > >>> >>> In the commit log, I was supposed to say something like below for >>> (a): >>> >>> - The specific high memory region can be disabled through changing >>> the code by user or developer. For example, 'vms->highmem_mmio' >>> is changed from true to false in virt_instance_init(). >> >> Huh. By this principle, the user can change anything. Why is it >> important? >> > > Still like above. I was explaining the possible cases where those > 3 switches can be turned on/off by users or developers. Our code > needs to be consistent and comprehensive. > > vms->highmem_redists > vms->highmem_ecam > vms->mmio > >>> >>>>> >>>>> Improve address assignment for highmem IO regions to avoid the waste >>>>> in the PA space by putting the logic into virt_memmap_fits(). >> >> I guess that this is what I understand the least. What do you mean by >> "wasted PA space"? Either the regions fit in the PA space, and >> computing their addresses in relevant, or they fall outside of it and >> what we stick in memap[index].base is completely irrelevant. >> > > It's possible that we run into the following combination. we should > have enough PA space to enable VIRT_HIGH_PCIE_MMIO region. However, > the region is disabled in the original implementation because > VIRT_HIGH_{GIC_REDIST2, PCIE_ECAM} regions consumed 1GB, which is > unecessary and waste in the PA space. each region's base is aligned on its size. > > static MemMapEntry extended_memmap[] = { > [VIRT_HIGH_GIC_REDIST2] = { 0x0, 64 * MiB }, > [VIRT_HIGH_PCIE_ECAM] = { 0x0, 256 * MiB }, > [VIRT_HIGH_PCIE_MMIO] = { 0x0, 512 * GiB }, so anyway MMIO is at least at 512GB. Having a 1TB IPA space does not imply any amount of RAM. This depends on the address space. I }; > > IPA_LIMIT = (1UL << 40) > '-maxmem' = 511GB /* Memory starts from 1GB */ > '-slots' = 0 > vms->highmem_rdist2 = false How can this happen? the only reason for highmem_redists to be reset is if it does not fit into map_ipa. So if mmio fits, highmem_redists does too. What do I miss? > vms->highmem_ecam = false vms->highmem_mmio = true I am still skeptic about the relevance of such optim. Isn't it sensible to expect the host to support large IPA if you want to use such amount of memory. I think we should rather better document the memory map from a user point of view. Besides if you change the memory map, you need to introduce yet another option to avoid breaking migration, no?
Eric > >>>>> >>>>> Signed-off-by: Gavin Shan <gs...@redhat.com> >>>>> --- >>>>> hw/arm/virt.c | 54 >>>>> +++++++++++++++++++++++++++++---------------------- >>>>> 1 file changed, 31 insertions(+), 23 deletions(-) >>>>> >>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c >>>>> index 9633f822f3..bc0cd218f9 100644 >>>>> --- a/hw/arm/virt.c >>>>> +++ b/hw/arm/virt.c >>>>> @@ -1688,6 +1688,34 @@ static uint64_t >>>>> virt_cpu_mp_affinity(VirtMachineState *vms, int idx) >>>>> return arm_cpu_mp_affinity(idx, clustersz); >>>>> } >>>>> +static void virt_memmap_fits(VirtMachineState *vms, int index, >>>>> + bool *enabled, hwaddr *base, int >>>>> pa_bits) >>>>> +{ >>>>> + hwaddr size = extended_memmap[index].size; >>>>> + >>>>> + /* The region will be disabled if its size isn't given */ >>>>> + if (!*enabled || !size) { >>>> In which case do you have !size? >>> >>> Yeah, we don't have !size and the condition should be removed. >>> >>>>> + *enabled = false; >>>>> + vms->memmap[index].base = 0; >>>>> + vms->memmap[index].size = 0; >>>> It looks dangerous to me to reset the region's base and size like >>>> that. >>>> for instance fdt_add_gic_node() will add dummy data in the dt. >>> >>> I would guess you missed that the high memory regions won't be exported >>> through device-tree if they have been disabled. We have a check there, >>> which is "if (nb_redist_regions == 1)" >>> >>>>> + return; >>>>> + } >>>>> + >>>>> + /* >>>>> + * Check if the memory region fits in the PA space. The >>>>> memory map >>>>> + * and highest_gpa are updated if it fits. Otherwise, it's >>>>> disabled. >>>>> + */ >>>>> + *enabled = (ROUND_UP(*base, size) + size <= BIT_ULL(pa_bits)); >>>> using a 'fits' local variable would make the code more obvious I think >>> >>> Lets confirm if you're suggesting something like below? >>> >>> bool fits; >>> >>> fits = (ROUND_UP(*base, size) + size <= BIT_ULL(pa_bits)); >>> >>> if (fits) { >>> /* update *base, memory mapping, highest_gpa */ >>> } else { >>> *enabled = false; >>> } >>> >>> I guess we can simply do >>> >>> if (ROUND_UP(*base, size) + size <= BIT_ULL(pa_bits)) { >>> /* update *base, memory mapping, highest_gpa */ >>> } else { >>> *enabled = false; >>> } >>> >>> Please let me know which one looks best to you. >> >> Why should the 'enabled' flag be updated by this function, instead of >> returning the value and keeping it as an assignment in the caller >> function? It is purely stylistic though. >> > > The idea to put the logic, address assignment for those 3 high memory > regions or updating the flags (vms->high_redist2/ecam/mmio), to the > newly introduced function, to make virt_set_memmap() a bit simplified. > Eric tends to agree the changes with minor adjustments. So I'm going > to keep it as of being, which doesn't mean the stylistic thought is > a bad one :) > > Lastly, I need to rewrite the comit log completely because it seems > causing confusions to Eric and you. > > Thanks, > Gavin >