On Fri, May 30, 2025 at 05:57:44PM -0400, Lyude Paul wrote: > On Wed, 2025-05-21 at 15:45 +0900, Alexandre Courbot wrote: > > Reserve a page of system memory so sysmembar can perform a read on it if > > a system write occurred since the last flush. Do this early as it can be > > required to e.g. reset the GPU falcons. > > > > Signed-off-by: Alexandre Courbot <acour...@nvidia.com> > > --- > > drivers/gpu/nova-core/gpu.rs | 45 > > +++++++++++++++++++++++++++++++++++++++++-- > > drivers/gpu/nova-core/regs.rs | 10 ++++++++++ > > 2 files changed, 53 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs > > index > > 50417f608dc7b445958ae43444a13c7593204fcf..a4e2cf1b529cc25fc168f68f9eaa6f4a7a9748eb > > 100644 > > --- a/drivers/gpu/nova-core/gpu.rs > > +++ b/drivers/gpu/nova-core/gpu.rs > > @@ -2,6 +2,7 @@ > > > > use kernel::{device, devres::Devres, error::code::*, pci, prelude::*}; > > > > +use crate::dma::DmaObject; > > use crate::driver::Bar0; > > use crate::firmware::{Firmware, FIRMWARE_VERSION}; > > use crate::gfw; > > @@ -158,12 +159,32 @@ fn new(bar: &Bar0) -> Result<Spec> { > > } > > > > /// Structure holding the resources required to operate the GPU. > > -#[pin_data] > > +#[pin_data(PinnedDrop)] > > pub(crate) struct Gpu { > > spec: Spec, > > /// MMIO mapping of PCI BAR 0 > > bar: Devres<Bar0>, > > fw: Firmware, > > + /// System memory page required for flushing all pending GPU-side > > memory writes done through > > + /// PCIE into system memory. > > + sysmem_flush: DmaObject, > > +} > > + > > +#[pinned_drop] > > +impl PinnedDrop for Gpu { > > + fn drop(self: Pin<&mut Self>) { > > + // Unregister the sysmem flush page before we release it. > > + let _ = self.bar.try_access_with(|b| { > > + regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR::default() > > + .set_adr_39_08(0) > > + .write(b); > > + if self.spec.chipset >= Chipset::GA102 { > > + regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR_HI::default() > > + .set_adr_63_40(0) > > + .write(b); > > + } > > + }); > > + }
Sorry that I haven't noticed this before -- I think this should be self contained in a new type (e.g. SysmemFlush). We should also move this kind of cleanup into the Driver::remove() callback, where we still have a bound device, to avoid try_access_with(). I already have this on my list to implement for quite a while, because I wasn't quite sure yet what's the best way to approach this, but I think the simple remove() callback to perform tear down operations on device resources is fine. I'll prepare the corresponding patches and subsequently rework those bits accordingly. > > } > > > > impl Gpu { > > @@ -187,10 +208,30 @@ pub(crate) fn new( > > gfw::wait_gfw_boot_completion(bar) > > .inspect_err(|_| dev_err!(pdev.as_ref(), "GFW boot did not > > complete"))?; > > > > + // System memory page required for sysmembar to properly flush > > into system memory. > > + let sysmem_flush = { > > + let page = DmaObject::new(pdev.as_ref(), > > kernel::bindings::PAGE_SIZE)?; > > + > > + // Register the sysmem flush page. > > + let handle = page.dma_handle(); > > + > > + regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR::default() > > + .set_adr_39_08((handle >> 8) as u32) > > + .write(bar); > > + if spec.chipset >= Chipset::GA102 { > > + regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR_HI::default() > > + .set_adr_63_40((handle >> 40) as u32) > > + .write(bar); > > + } > > + > > Small nit - would it make sense for us to just add a function for initiating a > sysmem memory flush that you could pass the bar to? Seems like it might be a > bit less error prone if we end up having to do this elsewhere Agreed -- but let's solve this with a new type and make it a method instead.