On Thu, Oct 26, 2023 at 10:24:26AM +0000, Chen, Jiqian wrote:
> 
> On 2023/10/25 11:51, Parav Pandit wrote:
> > 
> > 
> >> From: Chen, Jiqian <[email protected]>
> >> Sent: Tuesday, October 24, 2023 5:44 PM
> >>
> >> On 2023/10/24 18:51, Parav Pandit wrote:
> >>>
> >>>> From: Chen, Jiqian <[email protected]>
> >>>> Sent: Tuesday, October 24, 2023 4:06 PM
> >>>>
> >>>> On 2023/10/23 21:35, Parav Pandit wrote:
> >>>>>
> >>>>>> From: Chen, Jiqian <[email protected]>
> >>>>>> Sent: Monday, October 23, 2023 4:09 PM
> >>>>>
> >>>>>>> I think this can be done without introducing the new register.
> >>>>>>> Can you please check whether the PM register itself can serve the
> >>>>>>> purpose instead of a new virtio-level register?
> >>>>>> Do you mean the system PM register?
> >>>>> No, the device's PM register at transport level.
> >>>> I tried to find this register (PCI level, virtio-pci level, or virtio
> >>>> driver level), but I didn't find it in the Linux kernel or QEMU code.
> >>>> May I know which register you are referring to specifically? Or which
> >>>> PM state bit you mentioned below?
> >>>>
> >>> PCI spec's PCI Power Management Capability Structure in section 7.5.2.
> >> Yes, what you point to is the PM capability for PCIe devices.
> >> But the problem remains that the QEMU code checks the condition
> >> (pci_bus_is_express or pci_is_express) for all virtio-pci devices in
> >> virtio_pci_realize(); if a virtio device isn't a PCIe device, QEMU will
> >> not add the PM capability for it.
> > The PCI PM capability is a must for PCIe devices, so maybe the QEMU code
> > has put it only under the is_pcie check.
> > But it can be done outside of that check as well, because this capability
> > has existed on plain PCI for a long time too and is backward compatible.
> Do you suggest that I implement the PM capability for virtio devices in
> QEMU first, and then try whether the PM capability can work for this
> scenario?

virtio devices in qemu already have a PM capability.


> If so, we will complicate a simple problem. Since there is no other need
> to add a PM capability for virtio devices right now, adding it just for
> preserving resources seems unnecessary and unreasonable. And we are not
> sure whether there are other scenarios, outside of a PM state change, that
> also need to preserve resources; if there are, the PM register can't cover
> them, but a preserve_resources register can.

One of the selling points of virtio is precisely reusing existing
platform capabilities as opposed to coming up with our own.
See abstract.tex


> Can I add a note like "If the PM capability is implemented for virtio
> devices, it may cover this scenario, and if there are no other scenarios
> that PM can't cover, then we can remove this register" to the commit
> message or spec description, and continue to add the preserve_resources
> register?

We can't remove registers.

> > 
> >> And another problem: what about MMIO transport devices? Preserving
> >> resources is needed for devices on all transport types.
> >>
> > MMIO lacks such rich PM definitions. If in the future MMIO wants to
> > support this, it will be extended to match other transports like PCI.
> > 
> >>>>>
> >>>>>> I think it is unreasonable to have a virtio device listen to the PM
> >>>>>> state of the guest system.
> >>>>> The guest driver performs any work on the guest system's PM callback
> >>>>> events in the virtio driver.
> >>>> I didn't find any PM state callback in the virtio driver.
> >>>>
> >>> There are virtio_suspend and virtio_resume in case of Linux.
> >> I think the virtio_suspend/resume you mentioned are the freeze/restore
> >> callbacks from "struct virtio_driver" or the suspend/resume callbacks
> >> from "static const struct dev_pm_ops virtio_pci_pm_ops".
> >> And yes, I agree: if virtio devices have a PM capability, maybe we can
> >> set the PM state in those callback functions.
> >>
> >>>
> >>>>>
> >>>>>> It's more suitable for each device to get a notification from the
> >>>>>> driver and then perform the resource-preserving operation.
> >>>>> I agree that each device gets the notification from the driver.
> >>>>> The question is: should it be the virtio driver, or is the existing
> >>>>> PCI driver, which transitions the state from d0->d3 and d3->d0,
> >>>>> enough?
> >>>> It seems there isn't an existing PCI driver that transitions the d0 or
> >>>> d3 state. Could you please tell me which one it is specifically? I am
> >>>> very willing to give it a try.
> >>>>
> >>> The modern virtio-pci driver of Linux should be able to.
> >> Yes, I know, it is VIRTIO_PCI_FLAG_INIT_PM_BIT. But the two problems I
> >> mentioned above still remain.
> >>
> > Both can be resolved without switching to PCIe.
> >  
> >>>
> >>>>> Can you please check that?
> >>>>>
> >>>>>>>> --- a/transport-pci.tex
> >>>>>>>> +++ b/transport-pci.tex
> >>>>>>>> @@ -325,6 +325,7 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> >>>>>>>>          /* About the administration virtqueue. */
> >>>>>>>>          le16 admin_queue_index;         /* read-only for driver */
> >>>>>>>>          le16 admin_queue_num;         /* read-only for driver */
> >>>>>>>> +        le16 preserve_resources;        /* read-write */
> >>>>>>> Preserving these resources in the device implementation takes a
> >>>>>>> finite amount of time, possibly more than 40 ns (the time of a PCIe
> >>>>>>> write TLP).
> >>>>>>> Hence this register must be a polling register that indicates
> >>>>>>> preservation_done.
> >>>>>>> This will tell the guest when preservation is done and when
> >>>>>>> restoration is done, so that it can resume the upper layers.
> >>>>>>>
> >>>>>>> Please refer to the queue_reset definition to learn more about such
> >>>>>>> a register definition.
> >>>>>> Thanks, I will refer to "queue_reset". So I need three values: the
> >>>>>> driver writes 1 to have the device preserve resources, the driver
> >>>>>> writes 2 to have the device restore resources, and the device writes
> >>>>>> 0 to tell the driver that preserving or restoring is done. Am I
> >>>>>> right?
> >>>>>>
> >>>>> Right.
> >>>>>
> >>>>> And if the existing PCIe PM state bits can do it, we can leverage
> >>>>> that. If they cannot be used, let's add that reasoning to the commit
> >>>>> log to describe this register.
> >>>>>
> >>>>>>>
> >>>>>>> Let's please make sure whether the PCIe PM level registers are
> >>>>>>> sufficient or not before deciding on the addition of this register.
> >>>>>> But if the device is not a PCIe device, it doesn't have the PM
> >>>>>> capability, so this will not work. Actually, in my local
> >>>>>> environment, pci_is_express() returns false in QEMU; they are not
> >>>>>> PCIe devices.
> >>>>> It is reasonable to require plugging in as a PCIe device in 2023 to
> >>>>> get new functionality, especially for a GPU device, as you mentioned.
> >>>>> 😊 That device does not have a very long history of backward
> >>>>> compatibility to preserve.
> >>>> Do you suggest that I add the PM capability for virtio-gpu, or change
> >>>> virtio-gpu to a PCIe device?
> >>>>
> >>> The PCI Power Management Capability Structure does not seem to be
> >>> limited to PCIe.
> >> I am not sure, but in the current QEMU code, I can see the
> >> "pci_is_express" check for all virtio-pci devices. If we want to add the
> >> PM capability for virtio-pci devices, I think we need to change them to
> >> PCIe devices.
> >>
> > That is one option.
> > The second option is to extend the PCI PM cap to non-PCIe devices,
> > because it is supported there.
> > 
> >>>
> >>
> >> --
> >> Best regards,
> >> Jiqian Chen.
> 
> -- 
> Best regards,
> Jiqian Chen.

