On Thu, Mar 18, 2021 at 07:52:52PM +0530, Amey Narkhede wrote: > On 21/03/18 11:09AM, Leon Romanovsky wrote: > > On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote: > > > On Wed, 17 Mar 2021 15:58:40 +0200 > > > Leon Romanovsky <l...@kernel.org> wrote: > > > > > > > On Wed, Mar 17, 2021 at 06:47:18PM +0530, Amey Narkhede wrote: > > > > > On 21/03/17 01:47PM, Leon Romanovsky wrote: > > > > > > On Wed, Mar 17, 2021 at 04:53:09PM +0530, Amey Narkhede wrote: > > > > > > > On 21/03/17 01:02PM, Leon Romanovsky wrote: > > > > > > > > On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote: > > > > > > > > > On 21/03/17 06:20AM, Leon Romanovsky wrote: > > > > > > > > > > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz > > > > > > > > > > wrote: > > > > > > > > > > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson > > > > > > > > > > > wrote: > > > > > > > > > > > > On Mon, 15 Mar 2021 21:03:41 +0530 > > > > > > > > > > > > Amey Narkhede <ameynarkhed...@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote: > > > > > > > > > > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex > > > > > > > > > > > > > > Williamson wrote: > > > > > > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100 > > > > > > > > > > > > > > > Pali Rohár <p...@kernel.org> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and > > > > > > > > > > > > > > > > > secondary bus > > > > > > > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are > > > > > > > > > > > > > > > > > hot reset and > > > > > > > > > > > > > > > > > warm reset respectively. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. > > > > > > > > > > > > > > > > Slot reset is just another > > > > > > > > > > > > > > > > type of reset, which is currently implemented > > > > > > > > > > > > > > > > only for PCIe hot plug > > > > > > > > > > > > > > > > bridges and for PowerPC PowerNV platform and it > > > > > > > > > > > > > > > > just call PCI secondary > > > > > > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset > > > > > > > > > > > > > > > > does not have API in > > > > > > > > > > > > > > > > kernel and therefore drivers do not export this > > > > > > > > > > > > > > > > type of reset via any > > > > > > > > > > > > > > > > kernel function (yet). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Warm reset is beyond the scope of this series, > > > > > > > > > > > > > > > but could be implemented > > > > > > > > > > > > > > > in a compatible way to fit within the > > > > > > > > > > > > > > > pci_reset_fn_methods[] array > > > > > > > > > > > > > > > defined here. Note that with this series the > > > > > > > > > > > > > > > resets available through > > > > > > > > > > > > > > > pci_reset_function() and the per device reset > > > > > > > > > > > > > > > attribute is sysfs remain > > > > > > > > > > > > > > > exactly the same as they are currently. The bus > > > > > > > > > > > > > > > and slot reset > > > > > > > > > > > > > > > methods used here are limited to devices where > > > > > > > > > > > > > > > only a single function is > > > > > > > > > > > > > > > affected by the reset, therefore it is not like > > > > > > > > > > > > > > > the patch you proposed > > > > > > > > > > > > > > > which performed a reset irrespective of the > > > > > > > > > > > > > > > downstream devices. This > > > > > > > > > > > > > > > series only enables selection of the existing > > > > > > > > > > > > > > > methods. Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > Alex, > > > > > > > > > > > > > > > > > > > > > > > > > > > > I asked the patch author here [1], but didn't get > > > > > > > > > > > > > > any response, maybe > > > > > > > > > > > > > > you can answer me. What is the use case scenario > > > > > > > > > > > > > > for this functionality? > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/ > > > > > > > > > > > > > > > > > > > > > > > > > > > Sorry for not responding immediately. There were some > > > > > > > > > > > > > buggy wifi cards > > > > > > > > > > > > > which needed FLR explicitly not sure if that behavior > > > > > > > > > > > > > is fixed in > > > > > > > > > > > > > drivers. Also there is use a case at Nutanix but the > > > > > > > > > > > > > engineer who > > > > > > > > > > > > > is involved is on PTO that is why I did not respond > > > > > > > > > > > > > immediately as > > > > > > > > > > > > > I don't know the details yet. > > > > > > > > > > > > > > > > > > > > > > > > And more generally, devices continue to have reset > > > > > > > > > > > > issues and we > > > > > > > > > > > > impose a fixed priority in our ordering. We can and > > > > > > > > > > > > probably should > > > > > > > > > > > > continue to quirk devices when we find broken resets so > > > > > > > > > > > > that we have > > > > > > > > > > > > the best default behavior, but it's currently not easy > > > > > > > > > > > > for an end user > > > > > > > > > > > > to experiment, ie. this reset works, that one doesn't. > > > > > > > > > > > > We might also > > > > > > > > > > > > have platform issues where a given reset works better > > > > > > > > > > > > on a certain > > > > > > > > > > > > platform. Exposing a way to test these things might > > > > > > > > > > > > lead to better > > > > > > > > > > > > quirks. In the case I think Pali was looking for, they > > > > > > > > > > > > wanted a > > > > > > > > > > > > mechanism to force a bus reset, if this was in > > > > > > > > > > > > reference to a single > > > > > > > > > > > > function device, this could be accomplished by setting > > > > > > > > > > > > a priority for > > > > > > > > > > > > that mechanism, which would translate to not only the > > > > > > > > > > > > sysfs reset > > > > > > > > > > > > attribute, but also the reset mechanism used by > > > > > > > > > > > > vfio-pci. Thanks, > > > > > > > > > > > > > > > > > > > > > > > > Alex > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > To confirm from our end - we have seen many such > > > > > > > > > > > instances where default > > > > > > > > > > > reset methods have not worked well on our platform. > > > > > > > > > > > Debugging these > > > > > > > > > > > issues is painful in practice, and this interface would > > > > > > > > > > > make it far > > > > > > > > > > > easier. > > > > > > > > > > > > > > > > > > > > > > Having an interface like this would also help us better > > > > > > > > > > > communicate the > > > > > > > > > > > issues we find with upstream. Allowing others to more > > > > > > > > > > > easily test our > > > > > > > > > > > (or other entities') findings should give better > > > > > > > > > > > visibility into > > > > > > > > > > > which issues apply to the device in general and which are > > > > > > > > > > > platform > > > > > > > > > > > specific. In disambiguating the former from the latter, > > > > > > > > > > > we should be > > > > > > > > > > > able to better quirk devices for everyone, and in the > > > > > > > > > > > latter cases, this > > > > > > > > > > > interface allows for a safer and more elegant solution > > > > > > > > > > > than any of the > > > > > > > > > > > current alternatives. > > > > > > > > > > > > > > > > > > > > So to summarize, we are talking about test and debug > > > > > > > > > > interface to > > > > > > > > > > overcome HW bugs, am I right? > > > > > > > > > > > > > > > > > > > > My personal experience shows that once the easy workaround > > > > > > > > > > exists > > > > > > > > > > (and write to generally available sysfs is very simple), > > > > > > > > > > the vendors > > > > > > > > > > and users desire for proper fix decreases drastically. > > > > > > > > > > IMHO, we will > > > > > > > > > > see increase of copy/paste in SO and blog posts, but reduce > > > > > > > > > > in quirks. > > > > > > > > > > > > > > > > > > > > My 2-cents. > > > > > > > > > > > > > > > > > > > I agree with your point but at least it gives the userspace > > > > > > > > > ability > > > > > > > > > to use broken device until bug is fixed in upstream. > > > > > > > > > > > > > > > > As I said, I don't expect many fixes once "userspace" will be > > > > > > > > able to > > > > > > > > use cheap workaround. There is no incentive to fix it. > > > > > > We can increase the annoyance factor of using a modified set of reset > > > methods, but ultimately we can only control what goes into our kernel, > > > other kernels might take v1 of this series and incorporate it > > > regardless of what happens here. > > > > > > > > > > > > This is also applicable for obscure devices without upstream > > > > > > > > > drivers for example custom FPGA based devices. > > > > > > > > > > > > > > > > This is not relevant to upstream kernel. Those vendors ship > > > > > > > > everything > > > > > > > > custom, they don't need upstream, we don't need them :) > > > > > > > > > > > > > > > By custom I meant hobbyists who could tinker with their custom > > > > > > > FPGA. > > > > > > > > > > > > I invite such hobbyists to send patches and include their FPGA in > > > > > > upstream kernel. > > > > > > This is potentially another good use case, how receptive are we going > > > to be to an FPGA design that botches a reset. Do they have a valid > > > device ID for us to base a quirk on, are they just squatting on one, or > > > using the default from a library. Maybe the next bitstream will > > > resolve it, maybe without any external indication. IOW, what would the > > > quality level be for that quirk versus using this as a workaround, > > > where the user probably wouldn't mind a kernel nag? > > > > It is worth to solve it when the need arises. > > > > > > > > > > > > > > Another main application which I forgot to mention is > > > > > > > > > virtualization > > > > > > > > > where vmm wants to reset the device when the guest is reset, > > > > > > > > > to emulate machine reboot as closely as possible. > > > > > > > > > > > > > > > > It can work in very narrow case, because reset will cause to > > > > > > > > device > > > > > > > > reprobe and most likely the driver will be different from the > > > > > > > > one that > > > > > > > > started reset. I can imagine that net devices will lose their > > > > > > > > state and > > > > > > > > config after such reset too. > > > > > > > > > > > > > > > Not sure if I got that 100% right. The pci_reset_function() > > > > > > > function > > > > > > > saves and restores device state over the reset. > > > > > > > > > > > > I'm talking about netdev state, but whatever given the existence of > > > > > > sysfs reset knob. > > > > > > > > > > > > > > > > > > > > > IMHO, it will be saner for everyone if virtualization don't try > > > > > > > > such resets. > > > > > > That would cause a massive regression in device assignment support. As > > > with other sysfs attributes, triggering them alongside a running driver > > > is probably not going to end well. However, pci_reset_function() is > > > extremely useful for stopping devices and returning them to a default > > > state, when either rebooting a VM or returning the device to the host. > > > The device is not removed and re-probed when this occurs, vfio-pci is > > > able to hold onto the device across these actions. Sure, don't reset a > > > netdev device when it's in use, that's not what these are used for. > > > > > > > > > > The exists reset sysfs attribute was added for exactly this case > > > > > > > though. > > > > > > > > > > > > I didn't know the rationale behind that file till you said and I > > > > > > googled libvirt discussion, so ok. Do you propose that libvirt > > > > > > will manage database of devices and their working reset types? > > > > > > > > > > > I don't have much idea about internals of libvirt but why would > > > > > it need to manage database of working reset types? It could just > > > > > read new reset_methods attribute to get the list of supported reset > > > > > methods. > > > > > > > > Because the idea of this patch is to read all supported reset types and > > > > allow to the user to chose the working one. The user will do it with > > > > help from StackOverflow, but libvirt will need to have some sort of > > > > database, otherwise it won't be different from simple "echo 1 > reset" > > > > which will iterate over all supported resets anyway. > > > > > > AFAIK, libvirt no longer attempts to do resets itself, or is at least > > > moving in that direction. vfio-pci will reset as device when they're > > > opened by a user (when available) or triggered via the API. > > > > <...> > > > > > > The difference here is that this is a workaround to solve bugs that > > > > should be fixed in the kernel. > > > > > > If we want to discourage using this as a primary means to resolve reset > > > issues on a device then we can create log warnings any time it's used. > > > Downstreams that really want this functionality are going to take this > > > patch from the list whether we accept it or not. As above, it seems > > > there are valid use cases. Even with mainstream vfio in QEMU, I go > > > through some hoops trying to determine if I can do a secondary bus > > > reset rather than a PM reset because it's not specified anywhere what a > > > "soft reset" means for any given device. This sort of interface could > > > make it easier to apply a system policy that a pci_reset_function() > > > should always perform a secondary bus reset if the only other option is > > > a PM reset. Maybe that policy mostly makes sense for a VM use case, so > > > we'd want one policy by default and another when the device is used for > > > this functionality. How could we accomplish that with a quirk? Thanks, > > > > I'm lost here, does vfio-pci use sysfs interface or internal to the kernel > > API? > > > > If it is latter then we don't really need sysfs, if not, we still need > > some sort of DB to create second policy, because "supported != working". > > What am I missing? > > > > Thanks > > > Can you explain bit more about why supported != working?
It is written in the commit message of this patch. https://lore.kernel.org/lkml/20210312173452.3855-1-ameynarkhed...@gmail.com/ "This feature aims to allow greater control of a device for use cases as device assignment, where specific device or platform issues may interact poorly with a given reset method, and for which device specific quirks have not been developed." You wrote it and also repeated it a couple of times during the discussion. If device can understand that specific reset doesn't work, it won't perform it in first place. Thanks