[PATCH] drm/amd/pm: Use correct macros for smu caps

2025-01-17 Thread Lijo Lazar
Fix the initialization and usage of capability values and mask. SMU_CAPS_MASK indicates the mask value, and SMU_CAPS represents the capability value. Signed-off-by: Lijo Lazar Fixes: 9bb53d2ce109 ("drm/amd/pm: Add capability flags for SMU v13.0.6") --- .../drm/amd/pm/swsmu/smu13/smu_v13
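The value/mask distinction described in this patch can be illustrated with a minimal sketch. Every name below (`demo_caps`, `DEMO_CAPS`, `DEMO_CAPS_MASK`, `demo_ctx`) is hypothetical and chosen for illustration; the driver's actual macros and capability list are not shown in the truncated snippet. The sketch assumes the capability value is an enum index and the mask is the single bit at that index.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability indices; illustrative names only. */
enum demo_caps {
	DEMO_CAPS_DPM,
	DEMO_CAPS_PCIE_METRICS,
	DEMO_CAPS_CTF_LIMIT,
	DEMO_CAPS_COUNT,
};

/* Capability value: the enum index of a capability. */
#define DEMO_CAPS(x)      (DEMO_CAPS_##x)
/* Capability mask: the single bit that corresponds to that index. */
#define DEMO_CAPS_MASK(x) (UINT64_C(1) << DEMO_CAPS(x))

struct demo_ctx {
	uint64_t caps; /* bitmask of supported capabilities */
};

/* Takes a capability value (index), not a mask. */
static inline void demo_caps_set(struct demo_ctx *ctx, enum demo_caps cap)
{
	ctx->caps |= UINT64_C(1) << cap;
}

/* Takes a capability value (index), not a mask. */
static inline bool demo_caps_supported(const struct demo_ctx *ctx,
				       enum demo_caps cap)
{
	return (ctx->caps & (UINT64_C(1) << cap)) != 0;
}
```

Passing a mask where an index is expected (or the reverse) shifts by the wrong amount and silently sets or tests the wrong bit, which is the kind of mix-up this sketch is meant to make visible.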

[PATCH] drm/amd/pm: Add capability flags for SMU v13.0.6

2025-01-16 Thread Lijo Lazar
Add capability flags for SMU v13.0.6 variants. Initialize the flags based on firmware support. As there are multiple IP versions maintained, it is more manageable with one time initialization caps flags based on IP version and firmware feature support. Signed-off-by: Lijo Lazar --- drivers/gpu

[PATCH 2/3] drm/amdgpu: Check RRMT status for VCN v4.0.3

2025-01-10 Thread Lijo Lazar
RRMT could get dynamically enabled/disabled by PSP firmware. Read the status from register for reading RRMT status. For VFs, this is not accessible, hence assume that it's always disabled for now. Signed-off-by: Lijo Lazar Reviewed-by: Sathishkumar S --- drivers/gpu/drm/amd/amdgpu/amdgpu_

[PATCH 3/3] drm/amdgpu: Check RRMT status for JPEG v4.0.3

2025-01-10 Thread Lijo Lazar
RRMT could get dynamically enabled/disabled by PSP firmware. Read the status from register for reading RRMT status. For VFs, this is not accessible, hence assume that it's always disabled for now. Signed-off-by: Lijo Lazar Reviewed-by: Sathishkumar S --- drivers/gpu/drm/amd/a

[PATCH 1/3] drm/amdgpu: Add VCN v4.0.3 RRMT register offset

2025-01-10 Thread Lijo Lazar
Add RRMT control register offset for VCN v4.0.3 Signed-off-by: Lijo Lazar Reviewed-by: Sathishkumar S --- drivers/gpu/drm/amd/include/asic_reg/vcn/vcn_4_0_3_offset.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/include/asic_reg/vcn

[PATCH] drm/amdgpu: Add handler for SDMA context empty

2025-01-01 Thread Lijo Lazar
Context empty interrupt is enabled for SDMA 4.4.2. Add a handler for context empty interrupt so that it is disposed of fast, and not propagated to KFD layer. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h | 1 + drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 22

[PATCH] drm/amdgpu: Refine ip detection log message

2024-12-16 Thread Lijo Lazar
'add ip block' causes confusion if the blocks are disabled later with ip_block_mask. Instead change to 'detected' and also add device context. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deleti

[PATCH] drm/amdgpu: Use dbg level for VBIOS check messages

2024-12-11 Thread Lijo Lazar
Driver has different ways to fetch VBIOS. If one of the methods doesn't find an authentic one, it will show misleading info messages even though a subsequent method finds a valid VBIOS. Keep the message level at debug and add device context. Signed-off-by: Lijo Lazar --- drivers/gpu/dr

[PATCH] drm/amdgpu: Avoid VF for RAS recovery source check

2024-12-09 Thread Lijo Lazar
VF device sets the RAS flag when mailbox data can't be read properly. There is no conclusive way to tell if the real source is RAS error. Therefore VF schedules a KFD based reset which doesn't set RAS source. Skip checking RAS source for any VF scheduled recovery. Signed-off-by:

[PATCH] drm/amd/pm: Revert state if force level fails

2024-12-06 Thread Lijo Lazar
-off-by: Lijo Lazar --- drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 58 + 1 file changed, 35 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c index 4d90e3f0bd17..6a9e26905edf 100644 --- a/drivers/gpu/drm/amd

[PATCH] drm/amdgpu: Increase FRU File Id buffer size

2024-12-03 Thread Lijo Lazar
Some boards use longer File Ids. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.h index bc58dca18035

[PATCH] drm/amdgpu: Simplify cleanup check for FRU sysfs

2024-11-28 Thread Lijo Lazar
FRU info is expected to be non-NULL if FRU sys files are created. Simplify the check. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c b/drivers/gpu

[PATCH] drm/amdgpu: Remove gfxoff usage

2024-11-26 Thread Lijo Lazar
GFXOFF is not valid for these IP versions. Also, SDMA v4.4.2 is not in GFX domain. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 4 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 2 -- 2 files changed, 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amd/pm: Remove arcturus min power limit

2024-11-19 Thread Lijo Lazar
As per power team, there is no need to impose a lower bound on arcturus power limit. Any unreasonable limit set will result in frequent throttling. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff

[PATCH] drm/amdkfd: Use the correct wptr size

2024-11-18 Thread Lijo Lazar
Write pointer could be 32-bit or 64-bit. Use the correct size during initialization. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm
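A minimal sketch of the idea in this patch, assuming the initialization clears the write pointer in place. The helper names `demo_wptr_size` and `demo_wptr_init` are hypothetical, and the real change in kfd_kernel_queue.c is not visible in the truncated snippet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: byte width of the write pointer for a queue type. */
static inline size_t demo_wptr_size(bool wptr_is_64bit)
{
	return wptr_is_64bit ? sizeof(uint64_t) : sizeof(uint32_t);
}

/* Clear exactly as many bytes as the write pointer occupies. */
static inline void demo_wptr_init(void *wptr, bool wptr_is_64bit)
{
	memset(wptr, 0, demo_wptr_size(wptr_is_64bit));
}
```

Clearing a fixed 4 bytes would leave the upper half of a 64-bit write pointer uninitialized, while clearing a fixed 8 bytes could overwrite whatever follows a 32-bit one.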

[PATCH] drm/amdgpu: Prefer RAS recovery for scheduler hang

2024-11-17 Thread Lijo Lazar
ed to look for a fatal error. Skip fatal error checking in such cases. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/aldebaran.c| 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 15 - drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 55 ++- drivers/gp

[PATCH 1/2] drm/amdgpu: Add init level for post reset reinit

2024-11-15 Thread Lijo Lazar
o identify post reset reinitialization phase. This only provides a device level identification, IP/features may choose to track their state independently also. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/aldebaran.c | 4 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + d

[PATCH 2/2] drm/amdgpu: Check whether in reset recovery state

2024-11-15 Thread Lijo Lazar
Some in_reset checks are in fact checking whether the state is reinitialization after reset. Replace with reset_in_recovery calls to identify that it's really checking for recovery stage after reset. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- driver

[PATCH v2 2/2] drm/amdgpu: Avoid kcq disable during reset

2024-11-04 Thread Lijo Lazar
Reset sequence indicates that hardware already ran into a bad state. Avoid sending unmap queue request to reset KCQ. This will also cover RAS error scenarios which need a reset to recover, hence remove the check. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 10

[PATCH v2 1/2] drm/amdgpu: Fix map/unmap queue logic

2024-11-04 Thread Lijo Lazar
newer code. Signed-off-by: Lijo Lazar Fixes: 6c10b5cc4eaa ("drm/amdgpu: Remove duplicate code in gfx_v8_0.c") --- v2: Add same changes to map queue also (Le Ma) drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 13 - drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c| 63 +++--

[PATCH 1/2] drm/amdgpu: Fix unmap queue logic

2024-11-04 Thread Lijo Lazar
code. Signed-off-by: Lijo Lazar Fixes: 6c10b5cc4eaa ("drm/amdgpu: Remove duplicate code in gfx_v8_0.c") --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 13 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c| 47 ++ drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c |

[PATCH 2/2] drm/amdgpu: Avoid kcq disable during reset

2024-11-04 Thread Lijo Lazar
Reset sequence indicates that hardware already ran into a bad state. Avoid sending unmap queue request to reset KCQ. This will also cover RAS error scenarios which need a reset to recover, hence remove the check. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 10

[PATCH] drm/amdgpu: Fix DPX valid mode check on GC 9.4.3

2024-11-03 Thread Lijo Lazar
For DPX mode, the number of memory partitions supported should be less than or equal to 2. Signed-off-by: Lijo Lazar Fixes: 1589c82a1085 ("drm/amdgpu: Check memory ranges for valid xcp mode") --- drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c | 2 +- 1 file changed, 1 insertion(+),

[PATCH] drm/amdgpu: Skip IP coredump for RAS errors

2024-11-03 Thread Lijo Lazar
For RAS errors, source of error is known. Skip the core dump of IP states. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index

[PATCH] drm/amdgpu: Add compatible NPS mode info

2024-10-30 Thread Lijo Lazar
Populate the compatible NPS modes also for providing partition configuration details through sysfs. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h| 1 + drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c | 11 +++ 2 files changed, 12 insertions(+) diff --git a

[PATCH v2] drm/amdgpu: Group gfx sysfs functions

2024-10-29 Thread Lijo Lazar
Make amdgpu_gfx_sysfs_init/fini functions as common entry points for all gfx related sysfs nodes. Signed-off-by: Lijo Lazar --- v2: Check cleaner shader capability only for creation of run_cleaner_shader attribute. drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 36 - drivers

[PATCH] drm/amdgpu: Group gfx sysfs functions

2024-10-28 Thread Lijo Lazar
Make amdgpu_gfx_sysfs_init/fini functions as common entry points for all gfx related sysfs nodes. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 37 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 2 -- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 5

[PATCH v2] drm/amdgpu: Save VCN shared memory with init reset

2024-10-17 Thread Lijo Lazar
function. Signed-off-by: Lijo Lazar Reported-by: Hao Zhou Fixes: 1b665567fd6d ("drm/amdgpu: Add reset on init handler for XGMI") --- v2: Rename save function to a more appropriate amdgpu_vcn_save_vcpu_bo (Leo) drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 6 ++ drivers/gpu/drm/

[PATCH v2] drm/amdgpu: Fix the logic for NPS request failure

2024-10-17 Thread Lijo Lazar
On a hive, NPS request is placed by the first one for all devices in the hive. If the request fails, mark the mode as UNKNOWN so that subsequent devices on unload don't request it. Also, fix the mutex double lock issue in error condition, should have been mutex_unlock. Signed-off-by: Lijo

[PATCH] drm/amdgpu: Fix the logic for NPS request failure

2024-10-17 Thread Lijo Lazar
On a hive, NPS request is placed by the first one for all devices in the hive. If the request fails, mark the mode as UNKNOWN so that subsequent devices on unload don't request it. Also, fix the mutex double lock issue in error condition, should have been mutex_unlock. Signed-off-by: Lijo

[PATCH] drm/amdgpu: Save VCN shared memory with init reset

2024-10-14 Thread Lijo Lazar
function. Signed-off-by: Lijo Lazar Reported-by: Hao Zhou Fixes: 1b665567fd6d ("drm/amdgpu: Add reset on init handler for XGMI") --- drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 6 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 26 ++- drivers/gpu/drm/amd/amdgpu/am

[PATCH] drm/amdgpu: Zero-initialize mqd backup memory

2024-10-14 Thread Lijo Lazar
Zero-initialize mqd backup memory, otherwise the check for 'already-backed-up' could go wrong. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/d
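Why zero-initialization matters here can be shown with a small userspace sketch. It assumes the 'already-backed-up' check inspects a field that is non-zero only for valid contents. The struct layout and the `demo_*` helpers are hypothetical, and `calloc` stands in for whatever zeroing allocation the driver uses.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical queue descriptor; a non-zero header marks valid contents. */
struct demo_mqd {
	uint32_t header;
	uint32_t regs[7];
};

/* calloc() returns zeroed memory, so a fresh backup reads as "not backed up". */
static inline struct demo_mqd *demo_backup_alloc(void)
{
	return calloc(1, sizeof(struct demo_mqd));
}

/* The "already backed up" check: only meaningful if the buffer started zeroed. */
static inline bool demo_backup_present(const struct demo_mqd *backup)
{
	return backup->header != 0;
}

/* Copy the live descriptor into the backup buffer. */
static inline void demo_backup_save(struct demo_mqd *backup,
				    const struct demo_mqd *live)
{
	memcpy(backup, live, sizeof(*backup));
}
```

With an allocation that does not zero memory, `demo_backup_present()` could return true on stale bytes before any save has happened, and a later restore would then load garbage.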

[PATCH] drm/amdgpu: Use SPX as default in partition config

2024-10-14 Thread Lijo Lazar
In certain cases - ex: when a reset is required on initialization - XCP manager won't have a valid partition mode. In such cases, use SPX as the default selected mode for which partition configuration details are populated. Signed-off-by: Lijo Lazar Reported-by: Hao Zhou Fixes: c7de570

[PATCH v2] drm/amdgpu: Add NPS switch support for GC 9.4.3

2024-10-08 Thread Lijo Lazar
Add dynamic NPS switch support for GC 9.4.3 variants. Only GC v9.4.3 and GC v9.4.4 currently support this. NPS switch is only supported if an SOC supports multiple NPS modes. Signed-off-by: Lijo Lazar Signed-off-by: Rajneesh Bhardwaj Reviewed-by: Feifei Xu --- v2: Add NULL check for

[PATCH] drm/amdgpu: Wait for reset on init completion

2024-10-07 Thread Lijo Lazar
When reset on initialization is requested, wait for the reset to finish. In cases where module is loaded after boot, this makes sure all initialization work is done after a successful return of modprobe. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 9 - 1

[PATCH] drm/amdgpu: Fix logic to determine TOS reload

2024-09-30 Thread Lijo Lazar
Avoid comparing TOS version on APUs. On APUs driver doesn't take care of TOS load. Fixes: 2edc5ecbf1a9 ("drm/amdgpu: Add interface for TOS reload cases") Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-

[PATCH v2 7/7] drm/amdgpu: Add NPS switch support for GC 9.4.3

2024-09-26 Thread Lijo Lazar
Add dynamic NPS switch support for GC 9.4.3 variants. Only GC v9.4.3 and GC v9.4.4 currently support this. NPS switch is only supported if an SOC supports multiple NPS modes. Signed-off-by: Lijo Lazar Signed-off-by: Rajneesh Bhardwaj Reviewed-by: Feifei Xu --- drivers/gpu/drm/amd/amdgpu

[PATCH v2 6/7] drm/amdgpu: Check gmc requirement for reset on init

2024-09-26 Thread Lijo Lazar
Add a callback to check if there is any condition detected by GMC block for reset on init. One case is if a pending NPS change request is detected. If reset is done because of NPS switch, refresh NPS info from discovery table. Signed-off-by: Lijo Lazar --- v2: Move NPS request check ahead of TOS

[PATCH v2 4/7] drm/amdgpu: Add sysfs interfaces for NPS mode

2024-09-26 Thread Lijo Lazar
memory partition sysfs logic to be more generic. Signed-off-by: Lijo Lazar Reviewed-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 114 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 6 ++ 2 files changed, 104 insertions(+), 16 deletions(-) diff -

[PATCH v2 5/7] drm/amdgpu: Place NPS mode request on unload

2024-09-26 Thread Lijo Lazar
If a user has requested NPS mode switch, place the request through PSP during unload of the driver. For devices which are part of a hive, all requests are placed together. If one of them fails, revert back to the current NPS mode. Signed-off-by: Lijo Lazar Signed-off-by: Rajneesh Bhardwaj

[PATCH v2 3/7] drm/amdgpu: Add gmc interface to request NPS mode

2024-09-26 Thread Lijo Lazar
Add a common interface in GMC to request NPS mode through PSP. Also add a variable in hive and gmc control to track the last requested mode. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: Lijo Lazar Reviewed-by: Feifei Xu --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 16

[PATCH v2 1/7] drm/amdgpu: Add option to refresh NPS data

2024-09-26 Thread Lijo Lazar
In certain use cases, NPS data needs to be refreshed again from discovery table. Add API parameter to refresh NPS data from discovery table. Signed-off-by: Lijo Lazar Reviewed-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 68 +++ drivers/gpu/drm/amd

[PATCH v2 2/7] drm/amdgpu: Add PSP interface for NPS switch

2024-09-26 Thread Lijo Lazar
Implement PSP ring command interface for memory partitioning on the fly on the supported asics. Signed-off-by: Rajneesh Bhardwaj Reviewed-by: Feifei Xu --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 25 + drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 1 + drivers/gpu/drm/amd

[PATCH v2 0/7] Add support for dynamic NPS switch

2024-09-26 Thread Lijo Lazar
eifei) Lijo Lazar (7): drm/amdgpu: Add option to refresh NPS data drm/amdgpu: Add PSP interface for NPS switch drm/amdgpu: Add gmc interface to request NPS mode drm/amdgpu: Add sysfs interfaces for NPS mode drm/amdgpu: Place NPS mode request on unload drm/amdgpu: Check gmc requiremen

[PATCH 2/2] drm/amdgpu: Show current compute partition on VF

2024-09-23 Thread Lijo Lazar
Enable sysfs node for current compute partition mode on VFs also. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 29 +++-- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 12 -- 2 files changed, 31 insertions(+), 10 deletions(-) diff --git a

[PATCH 1/2] drm/amdgpu: Fetch NPS mode for GCv9.4.3 VFs

2024-09-23 Thread Lijo Lazar
Use the memory ranges published in discovery table to deduce NPS mode of GC v9.4.3 VFs. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 12 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 2 +- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 30

[PATCH 7/7] drm/amdgpu: Add NPS switch support for GC 9.4.3

2024-09-23 Thread Lijo Lazar
Add dynamic NPS switch support for GC 9.4.3 variants. Only GC v9.4.3 and GC v9.4.4 currently support this. NPS switch is only supported if an SOC supports multiple NPS modes. Signed-off-by: Lijo Lazar Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h | 1 + drivers

[PATCH 6/7] drm/amdgpu: Check gmc requirement for reset on init

2024-09-23 Thread Lijo Lazar
Add a callback to check if there is any condition detected by GMC block for reset on init. One case is if a pending NPS change request is detected. If reset is done because of NPS switch, refresh NPS info from discovery table. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu

[PATCH 5/7] drm/amdgpu: Place NPS mode request on unload

2024-09-23 Thread Lijo Lazar
If a user has requested NPS mode switch, place the request through PSP during unload of the driver. For devices which are part of a hive, all requests are placed together. If one of them fails, revert back to the current NPS mode. Signed-off-by: Lijo Lazar Signed-off-by: Rajneesh Bhardwaj

[PATCH 4/7] drm/amdgpu: Add sysfs interfaces for NPS mode

2024-09-23 Thread Lijo Lazar
memory partition sysfs logic to be more generic. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 114 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 6 ++ 2 files changed, 104 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/amd/

[PATCH 3/7] drm/amdgpu: Add gmc interface to request NPS mode

2024-09-23 Thread Lijo Lazar
Add a common interface in GMC to request NPS mode through PSP. Also add a variable in hive and gmc control to track the last requested mode. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 16 drivers/gpu/drm/amd/amdgpu

[PATCH 2/7] drm/amdgpu: Add PSP interface for NPS switch

2024-09-23 Thread Lijo Lazar
Implement PSP ring command interface for memory partitioning on the fly on the supported asics. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 25 + drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 1 + drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h | 1

[PATCH 1/7] drm/amdgpu: Add option to refresh NPS data

2024-09-23 Thread Lijo Lazar
In certain use cases, NPS data needs to be refreshed again from discovery table. Add API parameter to refresh NPS data from discovery table. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 68 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h | 2

[PATCH 0/7] Add support for dynamic NPS switch

2024-09-23 Thread Lijo Lazar
ch is pending and initiates a mode-1 reset. 7) During resume after a reset, NPS ranges are read again from discovery table. 8) Driver detects the new NPS mode and makes a compatible compute partition mode switch if required. Lijo Lazar (7): drm/amdgpu: Add option to refresh NPS data drm/amdgpu: Ad

[PATCH] drm/amdgpu: Fix XCP instance mask calculation

2024-09-12 Thread Lijo Lazar
Fix instance mask calculation for VCN IP. There are cases where VCN instance could be shared across partitions. Fix here so that other blocks don't need to check for any shared instances based on partition mode. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c

[PATCH v2 10/10] drm/amdgpu: Add PSP reload case to reset-on-init

2024-09-11 Thread Lijo Lazar
A reset on initialization will be needed if a different PSP TOS needs to be loaded than the one currently active on the system. This is possible only on SOCs which support a full device reset which results in unload of active PSP TOS. Signed-off-by: Lijo Lazar Reviewed-by: Feifei Xu Reviewed-by: Alex

[PATCH v2 07/10] drm/amdgpu: Drop delayed reset work handler

2024-09-11 Thread Lijo Lazar
Drop delayed reset work handler as it is no longer used. Signed-off-by: Lijo Lazar Reviewed-by: Feifei Xu Reviewed-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 -- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 80 - 2 files changed, 84 deletions

[PATCH v2 05/10] drm/amdgpu: Add helper to initialize badpage info

2024-09-11 Thread Lijo Lazar
Add a separate function to read badpage data during initialization. Reading bad pages will need hardware access and cannot be done during reset. Hence in cases where device needs a full reset during init itself, attempting to read will cause a deadlock. Signed-off-by: Lijo Lazar Reviewed-by

[PATCH v2 09/10] drm/amdgpu: Add interface for TOS reload cases

2024-09-11 Thread Lijo Lazar
Add interface to check if a different TOS needs to be loaded than the one which is already active on the SOC. Presently the interface is restricted to specific variants of PSPv13.0. Signed-off-by: Lijo Lazar Reviewed-by: Feifei Xu Reviewed-by: Alex Deucher --- drivers/gpu/drm/amd

[PATCH v2 08/10] drm/amdgpu: Support reset-on-init on select SOCs

2024-09-11 Thread Lijo Lazar
Add XGMI reset on init support to aldebaran and SOCs with GC v9.4.3. Signed-off-by: Lijo Lazar Reviewed-by: Feifei Xu Reviewed-by: Alex Deucher --- v2: Use renamed variable drivers/gpu/drm/amd/amdgpu/aldebaran.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd

[PATCH v2 06/10] drm/amdgpu: Refactor XGMI reset on init handling

2024-09-11 Thread Lijo Lazar
Use XGMI hive information to rely on resetting XGMI devices on initialization rather than using mgpu structure. mgpu structure may have other devices as well. Signed-off-by: Lijo Lazar Reviewed-by: Feifei Xu --- v2: Use consistent naming scheme for functions/variables (Alex Deucher

[PATCH v2 04/10] drm/amdgpu: Add reset on init handler for XGMI

2024-09-11 Thread Lijo Lazar
In some cases, device needs to be reset before first use. Add handlers for doing device reset during driver init sequence. Signed-off-by: Lijo Lazar Reviewed-by: Feifei Xu --- v2: Use consistent naming scheme for functions/variables (Alex Deucher) drivers/gpu/drm/amd/amdgpu/amdgpu.h

[PATCH v2 03/10] drm/amdgpu: Separate reinitialization after reset

2024-09-11 Thread Lijo Lazar
Move the reinitialization part after a reset to another function. No functional changes. Signed-off-by: Lijo Lazar Reviewed-by: Feifei Xu Acked-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 150 - 2

[PATCH v2 02/10] drm/amdgpu: Use init level for pending_reset flag

2024-09-11 Thread Lijo Lazar
Drop pending_reset flag in gmc block. Instead use init level to determine which type of init is preferred - in this case MINIMAL. Signed-off-by: Lijo Lazar --- v2: Fix logical issue while replacing pending_reset flag in smuv11 (Feifei) Use renamed init level id

[PATCH v2 01/10] drm/amdgpu: Add init levels

2024-09-11 Thread Lijo Lazar
Add init levels to define the level to which device needs to be initialized. Signed-off-by: Lijo Lazar --- v2: Add comments describing init levels Drop unnecessary assignment Rename AMDGPU_INIT_LEVEL_MINIMAL to AMDGPU_INIT_LEVEL_MINIMAL_XGMI drivers/gpu/drm/amd/amdgpu

[PATCH v2 00/10] Support XGMI reset on init

2024-09-11 Thread Lijo Lazar
scenario where device is going to be reset. The series adds an API interface to check if a PSP TOS reload is required. v2: Fix logical issue while replacing pending_reset flag with init level Use consistent naming for functions/variables Lijo Lazar (10): drm/amdgpu: Add init

[PATCH] drm/amdgpu: Fix JPEG v4.0.3 register write

2024-09-06 Thread Lijo Lazar
EXTERNAL_REG_INTERNAL_OFFSET/EXTERNAL_REG_WRITE_ADDR should be used in pairs. If an external register shouldn't be written, both packets shouldn't be sent. Fixes: a78b48146972 ("drm/amdgpu: Skip PCTL0_MMHUB_DEEPSLEEP_IB write in jpegv4.0.3 under SRIOV") Signed-off-by: Lijo

[PATCH 10/10] drm/amdgpu: Add PSP reload case to reset-on-init

2024-09-02 Thread Lijo Lazar
A reset on initialization will be needed if a different PSP TOS needs to be loaded than the one currently active on the system. This is possible only on SOCs which support a full device reset which results in unload of active PSP TOS. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/soc15.c

[PATCH 08/10] drm/amdgpu: Support reset-on-init on select SOCs

2024-09-02 Thread Lijo Lazar
Add XGMI reset on init support to aldebaran and SOCs with GC v9.4.3. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/aldebaran.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/aldebaran.c b/drivers/gpu/drm/amd/amdgpu/aldebaran.c index b0f95a7649bf

[PATCH 09/10] drm/amdgpu: Add interface for TOS reload cases

2024-09-02 Thread Lijo Lazar
Add interface to check if a different TOS needs to be loaded than the one which is already active on the SOC. Presently the interface is restricted to specific variants of PSPv13.0. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 13 + drivers/gpu

[PATCH 07/10] drm/amdgpu: Drop delayed reset work handler

2024-09-02 Thread Lijo Lazar
Drop delayed reset work handler as it is no longer used. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 -- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 80 - 2 files changed, 84 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b

[PATCH 04/10] drm/amdgpu: Add reset on init handler for XGMI

2024-09-02 Thread Lijo Lazar
In some cases, device needs to be reset before first use. Add handlers for doing device reset during driver init sequence. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 148 ++ drivers/gpu/drm/amd

[PATCH 06/10] drm/amdgpu: Refactor XGMI reset on init handling

2024-09-02 Thread Lijo Lazar
Use XGMI hive information to rely on resetting XGMI devices on initialization rather than using mgpu structure. mgpu structure may have other devices as well. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 6

[PATCH 03/10] drm/amdgpu: Separate reinitialization after reset

2024-09-02 Thread Lijo Lazar
Move the reinitialization part after a reset to another function. No functional changes. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 150 - 2 files changed, 89 insertions(+), 63 deletions

[PATCH 05/10] drm/amdgpu: Add helper to initialize badpage info

2024-09-02 Thread Lijo Lazar
Add a separate function to read badpage data during initialization. Reading bad pages will need hardware access and cannot be done during reset. Hence in cases where device needs a full reset during init itself, attempting to read will cause a deadlock. Signed-off-by: Lijo Lazar --- drivers/gpu

[PATCH 02/10] drm/amdgpu: Use init level for pending_reset flag

2024-09-02 Thread Lijo Lazar
Drop pending_reset flag in gmc block. Instead use init level to determine which type of init is preferred - in this case MINIMAL. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 33 --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 - drivers

[PATCH 01/10] drm/amdgpu: Add init levels

2024-09-02 Thread Lijo Lazar
Add init levels to define the level to which device needs to be initialized. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 14 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 54 ++ 2 files changed, 68 insertions(+) diff --git a/drivers/gpu

[PATCH 00/10] Support XGMI reset on init

2024-09-02 Thread Lijo Lazar
scenario where device is going to be reset. The series adds an API interface to check if a PSP TOS reload is required. Lijo Lazar (10): drm/amdgpu: Add init levels drm/amdgpu: Use init level for pending_reset flag drm/amdgpu: Separate reinitialization after reset drm/amdgpu: Add reset

[PATCH v2] drm/amdgpu: Normalize reg offsets on JPEG v4.0.3

2024-08-27 Thread Lijo Lazar
On VFs and SOCs with GC 9.4.4, VCN RRMT is disabled. Only local register offsets should be used on JPEG v4.0.3 as they cannot handle remote access to other AIDs. Since only local offsets are used, the special write to MCM_ADDR register is no longer needed. Signed-off-by: Lijo Lazar --- v2

[PATCH] drm/amd/pm: Add support for new P2S table revision

2024-08-21 Thread Lijo Lazar
Add p2s table support for a new revision of SMUv13.0.6. Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang Reviewed-by: Asad Kamal --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu

[PATCH] drm/amdgpu: Normalize reg offsets on JPEG v4.0.3

2024-08-20 Thread Lijo Lazar
Only local register offsets should be used on JPEG v4.0.3 as they cannot handle remote access to other AIDs. Since only local offsets are used, the special write to MCM_ADDR register is no longer needed. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c | 32

[PATCH] drm/amdgpu: Reorder to read EFI exported ROM first

2024-08-11 Thread Lijo Lazar
On EFI BIOSes, PCI ROM may be exported through EFI_PCI_IO_PROTOCOL and expansion ROM BARs may not be enabled. Choose to read from EFI exported ROM data before reading PCI Expansion ROM BAR. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c | 10 +- 1 file changed, 5

[PATCH] drm/amdkfd: Add node_id to location_id generically

2024-08-08 Thread Lijo Lazar
If there are multiple nodes per kfd device, add nodeid to location_id to differentiate. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd

[PATCH] drm/amd/pm: Ignore throttle events on SMUv13.0.6

2024-07-24 Thread Lijo Lazar
Spurious events are seen, temporarily ignore the events altogether. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13

[PATCH] drm/amdgpu: Initialize VF partition mode

2024-07-08 Thread Lijo Lazar
For SOCs with GFX v9.4.3, a VF may have multiple compute partitions. Fetch the partition information during init and initialize partition nodes. There is no support to switch partition mode in VF mode, hence disable the same. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h

[PATCH] drm/amdkfd: Use device based logging for errors

2024-06-25 Thread Lijo Lazar
Convert some pr_* calls to dev_* APIs so that log messages identify the device. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 3 +- drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 21 --- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c | 8 ++- .../gpu/drm/amd/amdkfd

[PATCH] drm/amdgpu: Fix pci state save during mode-1 reset

2024-06-18 Thread Lijo Lazar
Cache the PCI state before bus master is disabled. The saved state is later used for other cases like restoring config space after mode-2 reset. Signed-off-by: Lijo Lazar Fixes: 5c03e5843e6b ("drm/amdgpu:add smu mode1/2 support for aldebaran") --- drivers/gpu/drm/amd/amdgpu/amdgpu_de

[PATCH] drm/amdgpu: Don't show false warning for reg list

2024-06-02 Thread Lijo Lazar
If reg list is already loaded on PSP 13.0.2 SOCs, psp will give TEE_ERR_CANCEL response on second time load. Avoid printing warn message for it. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 25 + drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h | 5

[PATCH] drm/amdgpu: Skip coredump during resets for debug

2024-05-31 Thread Lijo Lazar
Skip scheduling coredump when gpu reset is intentionally triggered through debugfs. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c

[PATCH] drm/amdgpu: Add CRC16 selection in config

2024-05-21 Thread Lijo Lazar
KFD uses crc16 for gpu_id generation. Fixes: 6dbc6469ab0b ("drm/amdkfd: Ensure gpu_id is unique") Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202405211405.tidtwibx-...@intel.com/ Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/Kconfig | 1

[PATCH v5 07/10] drm/amd/pm: Add xgmi plpd to arcturus pm_policy

2024-05-16 Thread Lijo Lazar
On arcturus, allow changing xgmi plpd policy through 'pm_policy/xgmi_plpd' sysfs interface. Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang Reviewed-by: Asad Kamal --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 7 ++-- .../gpu/drm/amd/pm/swsmu/smu11/arcturus_

[PATCH v5 09/10] drm/amd/pm: Remove unused interface to set plpd

2024-05-16 Thread Lijo Lazar
Remove unused callback to set PLPD policy and its implementation from arcturus, aldebaran and SMUv13.0.6 SOCs. Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang Reviewed-by: Asad Kamal --- drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 6 --- .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c

[PATCH v5 08/10] drm/amd/pm: Remove legacy interface for xgmi plpd

2024-05-16 Thread Lijo Lazar
Replace the legacy interface with amdgpu_dpm_set_pm_policy to set XGMI PLPD mode. Also, xgmi_plpd_policy sysfs node is not used by any client. Remove that as well. Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang Reviewed-by: Asad Kamal --- v2: No change v3: Rebase to remove

[PATCH v5 10/10] Documentation/amdgpu: Add PM policy documentation

2024-05-16 Thread Lijo Lazar
Add documentation about the newly added pm_policy node in sysfs. Signed-off-by: Lijo Lazar --- v5: Update documentation to reflect pm_policy nodes and sub nodes for each policy type Documentation/gpu/amdgpu/thermal.rst | 6 drivers/gpu/drm/amd/pm/amdgpu_pm.c | 53

[PATCH v5 06/10] drm/amd/pm: Add xgmi plpd to aldebaran pm_policy

2024-05-16 Thread Lijo Lazar
On aldebaran, allow changing xgmi plpd policy through 'pm_policy/xgmi_plpd' sysfs interface. Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang Reviewed-by: Asad Kamal --- .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 36 +++ 1 file changed, 36 insertions(+) di

[PATCH v5 03/10] drm/amd/pm: Add support to select pstate policy

2024-05-16 Thread Lijo Lazar
Add support to select pstate policy in SOCs with SMUv13.0.6 Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang Reviewed-by: Asad Kamal --- v2,v3: No change v4: Use macro for policy type name .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c| 2 + .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
