Ping?
On Tue, Mar 18, 2025 at 9:16 AM Alex Deucher <alexdeuc...@gmail.com> wrote:
>
> Ping?
>
> On Thu, Mar 6, 2025 at 10:54 AM Alex Deucher <alexander.deuc...@amd.com> wrote:
> >
> > Describes what debugfs files are available and what
> > they are used for.
> >
> > v2: fix some typos (Mark Glines)
> > v3: Address comments from Siqueira and Kent
> >
> > Signed-off-by: Alex Deucher <alexander.deuc...@amd.com>
> > ---
> >  Documentation/gpu/amdgpu/debugfs.rst   | 210 +++++++++++++++++++++++++
> >  Documentation/gpu/amdgpu/debugging.rst |   7 +
> >  Documentation/gpu/amdgpu/index.rst     |   1 +
> >  3 files changed, 218 insertions(+)
> >  create mode 100644 Documentation/gpu/amdgpu/debugfs.rst
> >
> > diff --git a/Documentation/gpu/amdgpu/debugfs.rst b/Documentation/gpu/amdgpu/debugfs.rst
> > new file mode 100644
> > index 0000000000000..fdfc1a8773c72
> > --- /dev/null
> > +++ b/Documentation/gpu/amdgpu/debugfs.rst
> > @@ -0,0 +1,210 @@
> > +==============
> > +AMDGPU DebugFS
> > +==============
> > +
> > +The amdgpu driver provides a number of debugfs files to aid in debugging
> > +issues in the driver. These are usually found in
> > +/sys/kernel/debug/dri/<num>.
> > +
> > +DebugFS Files
> > +=============
> > +
> > +amdgpu_benchmark
> > +----------------
> > +
> > +Run benchmarks using the DMA engine the driver uses for GPU memory paging.
> > +Write a number to the file to run the test. The results are written to the
> > +kernel log. VRAM is on device memory (dGPUs) or carve out (APUs) and GTT
> > +(Graphics Translation Tables) is system memory that is accessible by the GPU.
> > +The following tests are available:
> > +
> > +- 1: simple test, VRAM to GTT and GTT to VRAM
> > +- 2: simple test, VRAM to VRAM
> > +- 3: GTT to VRAM, buffer size sweep, powers of 2
> > +- 4: VRAM to GTT, buffer size sweep, powers of 2
> > +- 5: VRAM to VRAM, buffer size sweep, powers of 2
> > +- 6: GTT to VRAM, buffer size sweep, common display sizes
> > +- 7: VRAM to GTT, buffer size sweep, common display sizes
> > +- 8: VRAM to VRAM, buffer size sweep, common display sizes
> > +
> > +amdgpu_test_ib
> > +--------------
> > +
> > +Read this file to run simple IB (Indirect Buffer) tests on all kernel managed
> > +rings. IBs are command buffers usually generated by userspace applications
> > +which are submitted to the kernel for execution on a particular GPU engine.
> > +This just runs the simple IB tests included in the kernel. These tests
> > +are engine specific and verify that IB submission works.
> > +
> > +amdgpu_discovery
> > +----------------
> > +
> > +Provides raw access to the IP discovery binary provided by the GPU. Read this
> > +file to access the raw binary. This is useful for verifying the contents of
> > +the IP discovery table. It is chip specific.
> > +
> > +amdgpu_vbios
> > +------------
> > +
> > +Provides raw access to the ROM binary image from the GPU. Read this file to
> > +access the raw binary. This is useful for verifying the contents of the
> > +video BIOS ROM. It is board specific.
> > +
> > +amdgpu_evict_gtt
> > +----------------
> > +
> > +Evict all buffers from the GTT memory pool. Read this file to evict all
> > +buffers from this pool.
> > +
> > +amdgpu_evict_vram
> > +-----------------
> > +
> > +Evict all buffers from the VRAM memory pool. Read this file to evict all
> > +buffers from this pool.
> > +
> > +amdgpu_gpu_recover
> > +------------------
> > +
> > +Trigger a GPU reset. Read this file to reset the entire GPU.
> > +All work currently running on the GPU will be lost.
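
For anyone trying the amdgpu_benchmark entry quoted above, here is a minimal Python sketch of the "write a number to the file" step. The helper name run_benchmark and the default directory /sys/kernel/debug/dri/0 are assumptions for illustration (the patch only says /sys/kernel/debug/dri/<num>), and writing to debugfs normally needs root. The test numbers and descriptions are copied from the list in the patch.

```python
from pathlib import Path

# Test numbers and descriptions as listed for amdgpu_benchmark in the patch.
BENCHMARK_TESTS = {
    1: "simple test, VRAM to GTT and GTT to VRAM",
    2: "simple test, VRAM to VRAM",
    3: "GTT to VRAM, buffer size sweep, powers of 2",
    4: "VRAM to GTT, buffer size sweep, powers of 2",
    5: "VRAM to VRAM, buffer size sweep, powers of 2",
    6: "GTT to VRAM, buffer size sweep, common display sizes",
    7: "VRAM to GTT, buffer size sweep, common display sizes",
    8: "VRAM to VRAM, buffer size sweep, common display sizes",
}


def run_benchmark(test_id: int, dri_dir: str = "/sys/kernel/debug/dri/0") -> str:
    """Write test_id to amdgpu_benchmark and return that test's description.

    dri_dir is an assumed default; pick the <num> that matches your GPU.
    """
    if test_id not in BENCHMARK_TESTS:
        raise ValueError(f"unknown benchmark test {test_id}, expected 1-8")
    # Writing the number starts the test; results appear in the kernel log.
    (Path(dri_dir) / "amdgpu_benchmark").write_text(f"{test_id}\n")
    return BENCHMARK_TESTS[test_id]
```

The results are not returned by the write itself, so check the kernel log (for example with dmesg) after the call completes.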
> > +
> > +amdgpu_ring_<name>
> > +------------------
> > +
> > +Provides read access to the kernel managed ring buffers for each ring <name>.
> > +These are useful for debugging problems on a particular ring. The ring buffer
> > +is how the CPU sends commands to the GPU. The CPU writes commands into the
> > +buffer and then asks the GPU engine to process it. This is the raw binary
> > +contents of the ring buffer. Use a tool like UMR to decode the rings into human
> > +readable form.
> > +
> > +amdgpu_mqd_<name>
> > +-----------------
> > +
> > +Provides read access to the kernel managed MQD (Memory Queue Descriptor) for
> > +ring <name> managed by the kernel driver. MQDs define the features of the ring
> > +and are used to store the ring's state when it is not connected to hardware.
> > +The driver writes the requested ring features and metadata (GPU addresses of
> > +the ring itself and associated buffers) to the MQD and the firmware uses the MQD
> > +to populate the hardware when the ring is mapped to a hardware slot. Only
> > +available on engines which use MQDs. This provides access to the raw MQD
> > +binary.
> > +
> > +amdgpu_error_<name>
> > +-------------------
> > +
> > +Provides an interface to set an error code on the dma fences associated with
> > +ring <name>. The error code specified is propagated to all fences associated
> > +with the ring. Use this to inject a fence error into a ring.
> > +
> > +amdgpu_pm_info
> > +--------------
> > +
> > +Provides human readable information about the power management features
> > +and state of the GPU. This includes current GFX clock, Memory clock,
> > +voltages, average SoC power, temperature, GFX load, Memory load, SMU
> > +feature mask, VCN power state, clock and power gating features.
> > +
> > +amdgpu_firmware_info
> > +--------------------
> > +
> > +Lists the firmware versions for all firmwares used by the GPU. Only
> > +entries with a non-0 version are valid.
> > +If the version is 0, the firmware
> > +is not valid for the GPU.
> > +
> > +amdgpu_fence_info
> > +-----------------
> > +
> > +Shows the last signalled and emitted fence sequence numbers for each
> > +kernel driver managed ring. Fences are associated with submissions
> > +to the engine. Emitted fences have been submitted to the ring
> > +and signalled fences have been signalled by the GPU. Rings with a
> > +larger emitted fence value have outstanding work that is still being
> > +processed by the engine that owns that ring. When the emitted and
> > +signalled fence values are equal, the ring is idle.
> > +
> > +amdgpu_gem_info
> > +---------------
> > +
> > +Lists all of the PIDs using the GPU and the GPU buffers that they have
> > +allocated. This lists the buffer size, pool (VRAM, GTT, etc.), and buffer
> > +attributes (CPU access required, CPU cache attributes, etc.).
> > +
> > +amdgpu_vm_info
> > +--------------
> > +
> > +Lists all of the PIDs using the GPU and the GPU buffers that they have
> > +allocated as well as the status of those buffers relative to that process'
> > +GPU virtual address space (e.g., evicted, idle, invalidated, etc.).
> > +
> > +amdgpu_sa_info
> > +--------------
> > +
> > +Prints out all of the suballocations (sa) by the suballocation manager in the
> > +kernel driver. Prints the GPU address, size, and fence info associated
> > +with each suballocation. The suballocations are used internally within
> > +the kernel driver for various things.
> > +
> > +amdgpu_<pool>_mm
> > +----------------
> > +
> > +Prints TTM information about the memory pool <pool>.
> > +
> > +amdgpu_vram
> > +-----------
> > +
> > +Provides direct access to VRAM. Used by tools like UMR to inspect
> > +objects in VRAM.
> > +
> > +amdgpu_iomem
> > +------------
> > +
> > +Provides direct access to GTT memory. Used by tools like UMR to inspect
> > +GTT memory.
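
The emitted/signalled rule in the amdgpu_fence_info entry quoted above can be sketched as a small Python check. This is only an illustration: the function names are hypothetical, the two sequence numbers are assumed to have been read by hand from the file, and the 32-bit counter width (used to tolerate wraparound) is an assumption that the patch text does not state.

```python
def outstanding_fences(emitted: int, signalled: int, bits: int = 32) -> int:
    """Return how many fences were emitted but not yet signalled.

    The modulo keeps the result correct if the counter has wrapped;
    the 32-bit default width is an assumption, not taken from the patch.
    """
    return (emitted - signalled) % (1 << bits)


def ring_is_idle(emitted: int, signalled: int) -> bool:
    """A ring is idle when its emitted and signalled values are equal."""
    return outstanding_fences(emitted, signalled) == 0
```

For example, ring_is_idle(0x2a, 0x2a) is true, while outstanding_fences(0x10, 0x0e) reports two submissions still being processed by the engine.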
> > +
> > +amdgpu_regs_*
> > +-------------
> > +
> > +Provides direct access to various register apertures on the GPU. Used
> > +by tools like UMR to access GPU registers.
> > +
> > +amdgpu_regs2
> > +------------
> > +
> > +Provides an IOCTL interface used by UMR for interacting with GPU registers.
> > +
> > +
> > +amdgpu_sensors
> > +--------------
> > +
> > +Provides an interface to query GPU power metrics (temperature, average
> > +power, etc.). Used by tools like UMR to query GPU power metrics.
> > +
> > +
> > +amdgpu_gca_config
> > +-----------------
> > +
> > +Provides an interface to query GPU details (Graphics/Compute Array config,
> > +PCI config, GPU family, etc.). Used by tools like UMR to query GPU details.
> > +
> > +amdgpu_wave
> > +-----------
> > +
> > +Used to query GFX/compute wave information from the hardware. Used by tools
> > +like UMR to query GFX/compute wave information.
> > +
> > +amdgpu_gpr
> > +----------
> > +
> > +Used to query GFX/compute GPR (General Purpose Register) information from the
> > +hardware. Used by tools like UMR to query GPRs when debugging shaders.
> > +
> > +amdgpu_gprwave
> > +--------------
> > +
> > +Provides an IOCTL interface used by UMR for interacting with shader waves.
> > +
> > +amdgpu_fw_attestation
> > +---------------------
> > +
> > +Provides an interface for reading back firmware attestation records.
> > diff --git a/Documentation/gpu/amdgpu/debugging.rst b/Documentation/gpu/amdgpu/debugging.rst
> > index e75f97d0e4eaf..7cbfea0606e15 100644
> > --- a/Documentation/gpu/amdgpu/debugging.rst
> > +++ b/Documentation/gpu/amdgpu/debugging.rst
> > @@ -2,6 +2,13 @@
> >  GPU Debugging
> >  ===============
> >
> > +General Debugging Options
> > +=========================
> > +
> > +The DebugFS section provides documentation on a number of files to aid in debugging
> > +issues on the GPU.
> > +
> > +
> >  GPUVM Debugging
> >  ===============
> >
> > diff --git a/Documentation/gpu/amdgpu/index.rst b/Documentation/gpu/amdgpu/index.rst
> > index 302d039928ee8..4c75567854cb2 100644
> > --- a/Documentation/gpu/amdgpu/index.rst
> > +++ b/Documentation/gpu/amdgpu/index.rst
> > @@ -16,5 +16,6 @@ Next (GCN), Radeon DNA (RDNA), and Compute DNA (CDNA) architectures.
> >     thermal
> >     driver-misc
> >     debugging
> > +   debugfs
> >     process-isolation
> >     amdgpu-glossary
> > --
> > 2.48.1
> >