[Public]

> -----Original Message-----
> From: amd-gfx <[email protected]> On Behalf Of Alex Deucher
> Sent: Tuesday, March 4, 2025 11:50 AM
> To: [email protected]
> Cc: Deucher, Alexander <[email protected]>
> Subject: [PATCH] drm/amdgpu: add initial documentation for debugfs files
>
> Describes what debugfs files are available and what
> they are used for.
>
> v2: fix some typos (Mark Glines)
>
> Signed-off-by: Alex Deucher <[email protected]>
> ---
>  Documentation/gpu/amdgpu/debugfs.rst | 202 +++++++++++++++++++++++++++
>  Documentation/gpu/amdgpu/index.rst   |   1 +
>  2 files changed, 203 insertions(+)
>  create mode 100644 Documentation/gpu/amdgpu/debugfs.rst
>
> diff --git a/Documentation/gpu/amdgpu/debugfs.rst b/Documentation/gpu/amdgpu/debugfs.rst
> new file mode 100644
> index 0000000000000..18bccb57c89fb
> --- /dev/null
> +++ b/Documentation/gpu/amdgpu/debugfs.rst
> @@ -0,0 +1,202 @@
> +==============
> +AMDGPU DebugFS
> +==============
> +
> +The amdgpu driver provides a number of debugfs files to aid in debugging
> +issues in the driver. These are usually found in
> +/sys/kernel/debug/dri/<num>.
> +
> +DebugFS Files
> +=============
> +
> +amdgpu_benchmark
> +----------------
> +
> +Run benchmarks using the DMA engine the driver uses for GPU memory paging.
> +Write a number to the file to run the test. The results are written to the
> +kernel log. The following tests are available:
> +
> +- 1: simple test, VRAM to GTT and GTT to VRAM
> +- 2: simple test, VRAM to VRAM
> +- 3: GTT to VRAM, buffer size sweep, powers of 2
> +- 4: VRAM to GTT, buffer size sweep, powers of 2
> +- 5: VRAM to VRAM, buffer size sweep, powers of 2
> +- 6: GTT to VRAM, buffer size sweep, common modes
> +- 7: VRAM to GTT, buffer size sweep, common modes
> +- 8: VRAM to VRAM, buffer size sweep, common modes
> +
> +amdgpu_test_ib
> +--------------
> +
> +Read this file to run simple IB (Indirect Buffer) tests on all kernel managed
> +rings. IBs are command buffers usually generated by userspace applications
> +which are submitted to the kernel for execution on a particular GPU engine.
> +This just runs the simple IB tests included in the kernel.
> +
> +amdgpu_discovery
> +----------------
> +
> +Provides raw access to the IP discovery binary provided by the GPU. Read this
> +file to access the raw binary.
> +
> +amdgpu_vbios
> +------------
> +
> +Provides raw access to the ROM binary image from the GPU. Read this file to
> +access the raw binary.
> +
> +amdgpu_evict_gtt
> +----------------
> +
> +Evict all buffers from the GTT memory pool. Read this file to evict all
> +buffers from this pool.
> +
> +amdgpu_evict_vram
> +-----------------
> +
> +Evict all buffers from the VRAM memory pool. Read this file to evict all
> +buffers from this pool.
> +
> +amdgpu_gpu_recover
> +------------------
> +

If we're going for consistency, then you could add "Trigger a full GPU reset" or something like that beforehand. The other entries above follow the pattern "Do a thing. Read this file to do the thing", so this one doesn't match. But it's honestly so nit-picky that it's not a big deal.

> +Read this file to trigger a full GPU reset. All work currently running
> +on the GPU will be lost.
> +
> +amdgpu_ring_<name>
> +------------------
> +
> +Provides read access to the kernel managed ring buffers for each ring <name>.
> +These are useful for debugging problems on a particular ring. The ring buffer
> +is how the CPU sends commands to the GPU. The CPU writes commands into the
> +buffer and then asks the GPU engine to process it.
> +
> +amdgpu_mqd_<name>
> +-----------------
> +
> +Provides read access to the kernel managed MQD (Memory Queue Descriptor) for
> +ring <name> managed by the kernel driver. MQDs define the features of the ring
> +and are used to store the ring's state when it is not connected to hardware.
> +The driver writes the requested ring features and metadata (GPU addresses of
> +the ring itself and associated buffers) to the MQD and the firmware uses the MQD
> +to populate the hardware when the ring is mapped to a hardware slot. Only
> +available on engines which use MQDs.
> +
> +amdgpu_error_<name>
> +-------------------
> +
> +Provides an interface to set an error on fences associated with ring <name>.
> +The error code specified is propagated to all fences associated with the
> +ring.
> +
> +amdgpu_pm_info
> +--------------
> +
> +Provides human readable information about the power management features
> +and state of the GPU. This includes current GFX clock, Memory clock,
> +voltages, average SoC power, temperature, GFX load, Memory load, SMU
> +feature mask, VCN power state, clock and power gating features.
> +
> +amdgpu_firmware_info
> +--------------------
> +
> +Lists the firmware versions for all firmwares used by the GPU. Only
> +entries with a non-0 version are valid. If the version is 0, the firmware
> +is not valid for the GPU.
> +
> +amdgpu_fence_info
> +-----------------
> +
> +Shows the last signalled and emitted fence sequence numbers for each
> +kernel driver managed ring. Fences are associated with submissions
> +to the engine. Emitted fences have been submitted to the ring
> +and signalled fences have been signalled by the GPU. Rings whose emitted
> +fence value is larger than the signalled value have outstanding work that
> +is still being processed by the engine that owns that ring. When the
> +emitted and signalled fence values are equal, the ring is idle.
> +
> +amdgpu_gem_info
> +---------------
> +
> +Lists all of the PIDs using the GPU and the GPU buffers that they have
> +allocated. This lists the buffer size, pool (VRAM, GTT, etc.), and buffer
> +attributes (CPU access required, CPU cache attributes, etc.).
> +
> +amdgpu_vm_info
> +--------------
> +
> +Lists all of the PIDs using the GPU and the GPU buffers that they have
> +allocated as well as the status of those buffers relative to that process'
> +GPU virtual address space (e.g., evicted, idle, invalidated, etc.).
> +
> +amdgpu_sa_info
> +--------------
> +
> +Prints out all of the suballocations by the suballocation manager in the
> +kernel driver. Prints the GPU address, size, and fence info associated
> +with each suballocation. The suballocations are used internally within
> +the kernel driver for various things.
> +
> +amdgpu_<pool>_mm
> +----------------
> +
> +Prints TTM information about the memory pool <pool>.
> +
> +amdgpu_vram
> +-----------
> +
> +Provides direct access to VRAM. Used by tools like UMR to inspect
> +objects in VRAM.
> +
> +amdgpu_iomem
> +------------
> +
> +Provides direct access to GTT memory. Used by tools like UMR to inspect
> +GTT memory.
> +
> +amdgpu_regs_*
> +-------------
> +
> +Provides direct access to various register apertures on the GPU. Used
> +by tools like UMR to access GPU registers.
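
For amdgpu_fence_info, a short example of turning the two sequence numbers into an outstanding-work count may help readers. This is a minimal C sketch, not driver or UMR code: parse_fence_value(), fences_outstanding() and ring_is_idle() are hypothetical helpers, the "Last signaled fence 0x..." / "Last emitted 0x..." line layout is an assumption to check against the real file, and the sequence numbers are assumed to be 32-bit values that can wrap.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extracts the hex value from one amdgpu_fence_info
 * line that starts with `label`. Assumed layout (verify on your kernel):
 *     "Last signaled fence          0x00000012"
 *     "Last emitted                 0x00000015"
 * Returns 1 and stores the value on success, 0 otherwise. */
int parse_fence_value(const char *line, const char *label, uint32_t *out)
{
    size_t n = strlen(label);
    unsigned int v;

    if (strncmp(line, label, n) != 0)
        return 0;                       /* not the line we are looking for */
    if (sscanf(line + n, " %x", &v) != 1)
        return 0;                       /* label matched but no hex value */
    *out = (uint32_t)v;
    return 1;
}

/* Submissions emitted on a ring but not yet signalled by the GPU.
 * Unsigned 32-bit subtraction stays correct when the sequence number
 * wraps, assuming far fewer than 2^32 fences are in flight at once. */
uint32_t fences_outstanding(uint32_t last_signalled, uint32_t last_emitted)
{
    return last_emitted - last_signalled;
}

/* Equal emitted and signalled values mean the ring is idle. */
int ring_is_idle(uint32_t last_signalled, uint32_t last_emitted)
{
    return fences_outstanding(last_signalled, last_emitted) == 0;
}
```

A caller would read the file line by line (for example with fgets()), pass each line to parse_fence_value() for both labels, and report rings where fences_outstanding() stays non-zero across several reads, which may point at an engine that is not making progress.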
> +
> +amdgpu_regs2
> +------------
> +
> +Provides an IOCTL interface used by UMR for interacting with GPU registers.
> +
> +
> +amdgpu_sensors
> +--------------
> +
> +Provides an interface to query GPU power metrics (temperature, average
> +power, etc.). Used by tools like UMR to query GPU power metrics.
> +
> +
> +amdgpu_gca_config
> +-----------------
> +
> +Provides an interface to query GPU details (GFX config, PCI config,
> +GPU family, etc.). Used by tools like UMR to query GPU details.
> +
> +amdgpu_wave
> +-----------
> +
> +Used to query GFX/compute wave information from the hardware. Used by tools
> +like UMR to query GFX/compute wave information.
> +
> +amdgpu_gpr
> +----------
> +
> +Used to query GFX/compute GPR (General Purpose Register) information

Weird extra spaces here

Kent

> from the
> +hardware. Used by tools like UMR to query GPRs when debugging shaders.
> +
> +amdgpu_gprwave
> +--------------
> +
> +Provides an IOCTL interface used by UMR for interacting with shader waves.
> +
> +amdgpu_fw_attestation
> +---------------------
> +
> +Provides an interface for reading back firmware attestation records.
> diff --git a/Documentation/gpu/amdgpu/index.rst b/Documentation/gpu/amdgpu/index.rst
> index 302d039928ee8..5254f3a162f84 100644
> --- a/Documentation/gpu/amdgpu/index.rst
> +++ b/Documentation/gpu/amdgpu/index.rst
> @@ -17,4 +17,5 @@ Next (GCN), Radeon DNA (RDNA), and Compute DNA (CDNA) architectures.
>     driver-misc
>     debugging
>     process-isolation
> +   debugfs
>     amdgpu-glossary
> --
> 2.48.1
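
For amdgpu_benchmark, a minimal C sketch of triggering one of the listed tests, roughly equivalent to `echo 1 > /sys/kernel/debug/dri/<num>/amdgpu_benchmark`. run_amdgpu_benchmark() is a hypothetical helper, not part of the driver or UMR; it assumes debugfs is mounted at /sys/kernel/debug, that the caller has the privileges debugfs normally requires, and that the file accepts the test number as decimal text followed by a newline.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: writes a benchmark test number (1-8, per the
 * amdgpu_benchmark list in the patch) to `path`. Returns 0 on success
 * or a negative errno value. Results go to the kernel log, not back
 * into this file. */
int run_amdgpu_benchmark(const char *path, int test)
{
    char buf[16];
    int len, fd;
    ssize_t written;

    if (test < 1 || test > 8)
        return -EINVAL;                 /* only tests 1-8 are documented */

    len = snprintf(buf, sizeof(buf), "%d\n", test);

    fd = open(path, O_WRONLY);          /* no O_CREAT: the node must exist */
    if (fd < 0)
        return -errno;

    written = write(fd, buf, (size_t)len);
    if (written < 0) {
        int err = -errno;               /* save errno before close() */
        close(fd);
        return err;
    }
    close(fd);
    return written == len ? 0 : -EIO;  /* treat a short write as an error */
}
```

After a successful call, the benchmark output would be read from the kernel log (for example with dmesg) rather than from the debugfs file.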
