On Fri, 30 Aug 2019 18:19:29 +0200 Christian Borntraeger <borntrae...@de.ibm.com> wrote:
> On 30.08.19 11:41, Igor Mammedov wrote:
> > On Thu, 29 Aug 2019 14:41:13 +0200
> > Christian Borntraeger <borntrae...@de.ibm.com> wrote:
> > 
> >> On 29.08.19 14:31, Igor Mammedov wrote:
> >>> On Thu, 29 Aug 2019 14:07:44 +0200
> >>> Christian Borntraeger <borntrae...@de.ibm.com> wrote:
> >>> 
> >>>> On 29.08.19 14:04, Igor Mammedov wrote:
> >>>>> On Thu, 29 Aug 2019 08:47:49 +0200
> >>>>> Christian Borntraeger <borntrae...@de.ibm.com> wrote:
> >>>>> 
> >>>>>> On 27.08.19 14:56, Igor Mammedov wrote:
> >>>>>>> On Tue, 20 Aug 2019 18:07:27 +0200
> >>>>>>> Cornelia Huck <coh...@redhat.com> wrote:
> >>>>>>> 
> >>>>>>>> On Wed, 7 Aug 2019 11:32:41 -0400
> >>>>>>>> Igor Mammedov <imamm...@redhat.com> wrote:
> >>>>>>>> 
> >>>>>>>>> The max memslot size supported by KVM on s390 is 8 TB; move the logic
> >>>>>>>>> for splitting RAM into chunks of up to 8 TB into the KVM code.
> >>>>>>>>> 
> >>>>>>>>> This way KVM-specific restrictions stay hidden in KVM code and do not
> >>>>>>>>> affect board-level design decisions, which would allow us to avoid
> >>>>>>>>> misusing the memory_region_allocate_system_memory() API and
> >>>>>>>>> eventually use a single hostmem backend for guest RAM.
> >>>>>>>>> 
> >>>>>>>>> Signed-off-by: Igor Mammedov <imamm...@redhat.com>
> >>>>>>>>> ---
> >>>>>>>>> v5:
> >>>>>>>>>   * move the computation 'size -= slot_size' inside the loop body
> >>>>>>>>>     (David Hildenbrand <da...@redhat.com>)
> >>>>>>>>> v4:
> >>>>>>>>>   * fix a compilation issue
> >>>>>>>>>     (Christian Borntraeger <borntrae...@de.ibm.com>)
> >>>>>>>>>   * advance the HVA along with the GPA in kvm_set_phys_mem()
> >>>>>>>>>     (Christian Borntraeger <borntrae...@de.ibm.com>)
> >>>>>>>>> 
> >>>>>>>>> This patch prepares only the KVM side for switching to a single RAM
> >>>>>>>>> memory region; another patch will take care of dropping the manual
> >>>>>>>>> RAM partitioning in the s390 code.
> >>>>>>>> 
> >>>>>>>> I may have lost track a bit -- what is the status of this patch (and
> >>>>>>>> the series)?
> >>>>>>> 
> >>>>>>> Christian,
> >>>>>>> 
> >>>>>>> could you test it on a host that has a sufficient amount of RAM?
> >>>>>> 
> >>>>>> This version looks good. I was able to start a 9 TB guest.
> >>>>>> [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=0, flags=0, guest_phys_addr=0, memory_size=8796091973632, userspace_addr=0x3ffee700000}) = 0
> >>>>>> [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=1, flags=0, guest_phys_addr=0x7fffff00000, memory_size=1099512676352, userspace_addr=0xbffee600000}) = 0
> >>>>>> 
> >>>>>> The only question is whether we want to fix the weird alignment
> >>>>>> (0x7fffff00000) when we already add a migration barrier for uber-large
> >>>>>> guests. Maybe we could split at 4 TB to avoid future problems with
> >>>>>> larger page sizes?
> >>>>> That should probably be a separate patch on top.
> >>>> 
> >>>> Right. The split in KVM code is transparent to migration and other parts
> >>>> of QEMU, correct?
> >>> 
> >>> It should not affect other QEMU parts or migration (to my limited
> >>> understanding of it); we are passing memory slots of up to
> >>> KVM_SLOT_MAX_BYTES to KVM, as we were doing before by creating several
> >>> memory regions instead of one, as described in the [2/2] commit message.
> >>> 
> >>> Could you also test migration of a 9+ TB guest, to check that nothing was
> >>> broken by accident in the QEMU migration code?
> >> 
> >> I only have one server that is large enough :-/
> > Could you test offline migration on it (to a file and restore from it)?
> 
> I tested migration with a hacked QEMU (basically splitting in KVM code at 1 GB
> instead of 8 TB), and the restore from file failed with data corruption in the
> guest. The current code does work when I use small memslots. No idea yet what
> is wrong.

I've tested a 2 GB guest (the maximum I can test), also with a hacked-up version,
and it worked for me. How do you test it and detect the corruption, so that I could
try to reproduce it locally? (Given that it worked before, there is not much hope,
but I could try.)
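
For context, the splitting discussed in the quoted thread boils down to registering
one contiguous RAM block with KVM as several slots, each capped at a maximum size;
the "hacked" 1 GB split is just a smaller cap. Below is a minimal standalone sketch
of that idea. MAX_SLOT_BYTES, register_ram() and the printf standing in for the
KVM_SET_USER_MEMORY_REGION ioctl are illustrative stand-ins, not the actual patch,
and the real s390 cap appears to sit slightly below a full 8 TB, which is where the
odd 0x7fffff00000 boundary in the strace output above comes from.

#include <stdint.h>
#include <stdio.h>

/* Illustrative cap only; the discussion above uses roughly 8 TB on s390. */
#define MAX_SLOT_BYTES (8ULL << 40)

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Register one contiguous RAM block as several memory slots, each no larger
 * than MAX_SLOT_BYTES.  Guest physical address and host virtual address
 * advance together for every chunk, and the remaining size is reduced
 * inside the loop body.
 */
static void register_ram(uint64_t gpa, uint64_t hva, uint64_t size)
{
    int slot = 0;

    while (size) {
        uint64_t slot_size = MIN(MAX_SLOT_BYTES, size);

        /* Stand-in for the KVM_SET_USER_MEMORY_REGION ioctl. */
        printf("slot=%d guest_phys_addr=0x%llx memory_size=%llu userspace_addr=0x%llx\n",
               slot++, (unsigned long long)gpa,
               (unsigned long long)slot_size, (unsigned long long)hva);

        gpa  += slot_size;
        hva  += slot_size;
        size -= slot_size;
    }
}

int main(void)
{
    /* A 9 TB guest splits into one ~8 TB slot plus the remainder, matching
     * the two-slot layout seen in the strace output quoted above. */
    register_ram(0, 0x3ffee700000ULL, 9ULL << 40);
    return 0;
}

Splitting at 4 TB, as suggested above, would only change the cap; the loop itself
would stay the same.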
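And in case it helps with comparing test setups: one simple way to detect this kind
of corruption from inside the guest is a deterministic pattern check over a large
allocation, saving and restoring the VM while the program waits. This is only a
hypothetical sketch, not necessarily the test that produced the failure above; the
buffer size and pattern are arbitrary.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Run inside the guest: fill a buffer with a deterministic pattern, wait
 * while the VM is saved to a file and restored, then verify the pattern.
 * Any mismatch points at guest memory corruption across the save/restore.
 */
int main(int argc, char **argv)
{
    size_t len = (argc > 1) ? strtoull(argv[1], NULL, 0) : (1ULL << 30); /* 1 GB default */
    size_t n = len / sizeof(uint64_t);
    uint64_t *buf = malloc(n * sizeof(uint64_t));

    if (!buf) {
        perror("malloc");
        return 1;
    }

    for (size_t i = 0; i < n; i++) {
        buf[i] = i * 0x9e3779b97f4a7c15ULL;   /* deterministic pattern */
    }

    puts("pattern written; save and restore the guest now, then press Enter");
    getchar();

    for (size_t i = 0; i < n; i++) {
        if (buf[i] != i * 0x9e3779b97f4a7c15ULL) {
            printf("corruption at byte offset %zu\n", i * sizeof(uint64_t));
            return 1;
        }
    }
    puts("no corruption detected");
    return 0;
}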