So we tricked used_memslots into being smaller than it actually has to be,
because we're ignoring the memslots filtered out by the vhost-user device.
Now, I think this is all far from relevant in practice as of now, and would
usually indicate a user error already (memory that's not shared with
vhost-user?).
Well, vhost-user device_add should fail if it can't get its hands on all RAM
(if it doesn't, we have a bug somewhere else).
There are some corner cases already. For example, R/O NVDIMMs are completely
ignored -- I assume because we wouldn't be able to use them for virtio
queues either way, so it kind of makes sense, I guess.
I have not yet figured out *why* 988a27754bbb ("vhost: allow backends to
filter memory sections") was included at all. Why should we have
memory-backend-ram mapped into guest address space that vhost-user
cannot handle?
Take my NVDIMM example: if we used that NVDIMM memory in the guest as
ordinary RAM, it could eventually be used for virtio queues ... and we
don't even warn the user.
So I agree that hotplugging that device should fail. But it could also
break some existing setups (we could work around that using compat
machines, I guess).
But we would also have to fail hotplug of such a vhost-user device ... and I
am not sure where to even put such checks.
It might get more relevant when virtio-mem dynamically adds/removes memslots
and relies on precise tracking of used vs. free memslots.
But maybe I should just ignore that case and live a happy life instead; it's
certainly hard to even trigger right now :)
Further, it will be helpful in the memory device context in the near future
to know that a RAM memory region section will consume a memslot and be
accounted for in the used vs. free memslot tracking, such that we can
implement reservation of memslots for memory devices properly. Whether a
device filters this out and would theoretically still have a free memslot is
then hidden internally, making overall vhost memslot accounting easier.
Let's filter the memslots when creating the vhost memory array,
accounting all RAM && !ROM memory regions as "used_memslots" even if
vhost_user isn't interested in anonymous RAM regions, because it needs
an fd.
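To illustrate the idea in code -- just a rough sketch, not the actual patch;
vhost_section_counts_as_memslot() is a made-up helper, the other names are the
existing QEMU ones:

static bool vhost_section_counts_as_memslot(MemoryRegionSection *section)
{
    /* Any RAM && !ROM memory region section consumes a vhost memslot. */
    return memory_region_is_ram(section->mr) &&
           !memory_region_is_rom(section->mr);
}

static void vhost_region_add_section(struct vhost_dev *dev,
                                     MemoryRegionSection *section)
{
    if (!vhost_section_counts_as_memslot(section)) {
        return;
    }

    /* Always account the section as used ... */
    used_memslots++;

    if (dev->vhost_ops->vhost_backend_mem_section_filter &&
        !dev->vhost_ops->vhost_backend_mem_section_filter(dev, section)) {
        /* ... even if the backend (e.g., vhost-user without an fd) filters it. */
        return;
    }

    /* Otherwise, add the section to the vhost memory array here. */
}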
That would regress existing setups where it was possible to start with N
DIMMs; after this patch, the same VM could fail to start if N was close to
vhost's limit in an otherwise perfectly working configuration. So this
approach doesn't seem right.
As already discussed with MST, this was the state before that change. So
I strongly doubt that we would break anything, because using
memory-backend-ram in such a setup would already be suspicious.
Again, I did not figure out *why* that change was even included and
which setups even care about that.
Maybe Tiwei can comment.
Perhaps redoing vhost's used_memslots accounting would be
a better approach, right down to introducing reservations you'd
like to have eventually.
The question of what to do with memory-backend-ram for vhost-user still
remains. It's independent of the "used_memslots" tracking, because used
vs. reserved would depend on the vhost-backend's filtering demands ... and I
really wanted to avoid that with this commit. It just makes tracking
much harder.
Something like:
1: s/vhost_has_free_slot/vhost_memory_region_limit/
   (and maybe the same for kvm_has_free_slot),
then rewrite memory_device_check_addable(), moving all
used_memslots accounting into the memory_device core.
I do something similar already, however, by querying the free memslots
from kvm and vhost, not the limits (limits are not very expressive).
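Roughly like this (a minimal sketch of that variant; it assumes helpers along
the lines of kvm_get_free_memslots()/vhost_get_free_memslots() that report how
many memslots are still free, not the QEMU code as-is):

static void memory_device_check_memslots(Error **errp)
{
    /* One memory device consumes (at least) one memslot in kvm and vhost. */
    unsigned int free_slots = MIN(kvm_get_free_memslots(),
                                  vhost_get_free_memslots());

    if (free_slots < 1) {
        error_setg(errp, "no free memslot available for the memory device");
    }
}

The free-slot numbers already have boot memory etc. subtracted, which a raw
per-backend limit wouldn't tell us.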
Thanks!
--
Thanks,
David / dhildenb