RE: [PATCH 2/2] drm/amd/amdgpu: force flush resubmit job

2021-02-24 Thread Chen, JingWen
[AMD Official Use Only - Internal Distribution Only] Consider this sequence: 1. GPU reset begins 2. device reset count + 1 3. job id 1 scheduled with vm_need_flush=false 4. When handling this job in vm_flush, amdgpu_vmid_had_gpu_reset will return true, thus this job will be flushed and the vmid_rese

Re: [PATCH v2] drm/scheduler: Fix hang when sched_entity released

2021-02-24 Thread Christian König
On 24.02.21 at 16:13, Andrey Grodzovsky wrote: Ping Sorry, I've been on vacation this week. Andrey On 2021-02-20 7:12 a.m., Andrey Grodzovsky wrote: On 2/20/21 3:38 AM, Christian König wrote: On 18.02.21 at 17:41, Andrey Grodzovsky wrote: On 2/18/21 10:15 AM, Christian König wrote:

Re: [PATCH 2/2] drm/amd/amdgpu: force flush resubmit job

2021-02-24 Thread Christian König
On 25.02.21 at 06:27, Jingwen Chen wrote: [Why] when a job is scheduled during TDR (after device reset count increase and before drm_sched_stop), this job won't do vm_flush when resubmit itself after GPU reset done. This can lead to a page fault. [How] Always do vm_flush for resubmit job. N

RE: [PATCH 1/2] drm: add a flag to indicate job is resubmitted

2021-02-24 Thread Liu, Monk
[AMD Official Use Only - Internal Distribution Only] Patches are fine by me, reviewed by: Monk Liu Still need Christian to take a look Thanks -- Monk Liu | Cloud-GPU Core team -- -Original Message- From:

RE: [PATCH] drm/amdgpu: decline max_me for mec2_fw remove in renoir/arcturus

2021-02-24 Thread Zhu, Changfeng
[AMD Official Use Only - Internal Distribution Only] Thanks,Ray. BR, Changfeng. -Original Message- From: Huang, Ray Sent: Thursday, February 25, 2021 1:42 PM To: Zhu, Changfeng Cc: amd-gfx@lists.freedesktop.org; Clements, John Subject: Re: [PATCH] drm/amdgpu: decline max_me for mec2_

Re: [PATCH] drm/amdgpu: decline max_me for mec2_fw remove in renoir/arcturus

2021-02-24 Thread Huang Rui
On Wed, Feb 24, 2021 at 05:10:55PM +0800, Zhu, Changfeng wrote: > From: changzhu > > From: Changfeng > > The value of max_me in amdgpu_gfx_rlc_setup_cp_table should reduce to 4 > when mec2_fw is removed on asic renoir/arcturus. Or it will cause kernel > NULL pointer when modprobe driver. > > C

[PATCH 2/2] drm/amd/amdgpu: force flush resubmit job

2021-02-24 Thread Jingwen Chen
[Why] when a job is scheduled during TDR(after device reset count increase and before drm_sched_stop), this job won't do vm_flush when resubmit itself after GPU reset done. This can lead to a page fault. [How] Always do vm_flush for resubmit job. Signed-off-by: Jingwen Chen --- drivers/gpu/drm/
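The rule stated under [How] can be sketched as a toy model. This is only an illustration under stated assumptions: `struct toy_job`, its two fields, and `toy_job_needs_vm_flush()` are hypothetical names invented here, not the real `drm_sched_job` or amdgpu structures, and the snippet above does not show the actual diff.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a scheduler job; not the real drm_sched_job. */
struct toy_job {
	bool vm_needs_flush;	/* decided when the job was first prepared */
	bool resubmitted;	/* set when the job is resubmitted after a reset */
};

/* A resubmitted job always flushes, whatever was decided before the reset. */
static inline bool toy_job_needs_vm_flush(const struct toy_job *job)
{
	return job->vm_needs_flush || job->resubmitted;
}
```

In this model a job that was scheduled with no flush pending, then resubmitted after the reset, still flushes, which is the case the [Why] paragraph describes.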

[PATCH 1/2] drm: add a flag to indicate job is resubmitted

2021-02-24 Thread Jingwen Chen
Add a flag in drm_sched_job to indicate the job resubmit. Signed-off-by: Jingwen Chen --- drivers/gpu/drm/scheduler/sched_main.c | 2 ++ include/drm/gpu_scheduler.h| 2 ++ 2 files changed, 4 insertions(+) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/schedul

[pull] amdgpu drm-fixes-5.12

2021-02-24 Thread Alex Deucher
Hi Dave, Daniel, Fixes for 5.12. The following changes since commit f730f39eb981af249d57336b47cfe3925632a7fd: Merge tag 'drm-intel-next-fixes-2021-02-18' of git://anongit.freedesktop.org/drm/drm-intel into drm-next (2021-02-19 13:55:07 +1000) are available in the Git repository at: https

Re: [PATCH 147/159] drm/amdgpu: restore aldebaran save ttmp and trap config on init (v2)

2021-02-24 Thread Felix Kuehling
This patch is for the debugger functionality that's not on amd-staging-drm-next yet. You can probably drop this patch for now. Regards,   Felix On 2021-02-24 5:18 p.m., Alex Deucher wrote: From: Jonathan Kim Initialization of TRAP_DATA0/1 is still required for the debugger to detect new wave

Re: [PATCH] drm/amdgpu: add ih call to process until checkpoint

2021-02-24 Thread Felix Kuehling
On 2021-02-24 10:54 a.m., Kim, Jonathan wrote: [AMD Official Use Only - Internal Distribution Only] -Original Message- From: Koenig, Christian Sent: Wednesday, February 24, 2021 4:17 AM To: Kim, Jonathan ; amd- g...@lists.freedesktop.org Cc: Yang, Philip ; Kuehling, Felix Subject: Re:

RE: [PATCH] drm/amdgpu/swsmu/vangogh: Only use RLCPowerNotify msg for disable

2021-02-24 Thread Quan, Evan
[AMD Public Use] Acked-by: Evan Quan -Original Message- From: amd-gfx On Behalf Of Alex Deucher Sent: Thursday, February 25, 2021 5:16 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: [PATCH] drm/amdgpu/swsmu/vangogh: Only use RLCPowerNotify msg for disable Per di

[PATCH] drm/amdgpu: Remove amdgpu_device arg from free_sgt api

2021-02-24 Thread Ramesh Errabolu
Currently callers have to provide handle of amdgpu_device, which is not used by the implementation. It is unlikely this parameter will become useful in future, thus removing it Signed-off-by: Ramesh Errabolu --- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 3 +-- drivers/gpu/drm/amd/amdgpu/amd

RE: [PATCH] drm/amdgpu/pm: make unsupported power profile messages debug

2021-02-24 Thread Quan, Evan
[AMD Public Use] Reviewed-by: Evan Quan -Original Message- From: amd-gfx On Behalf Of Alex Deucher Sent: Thursday, February 25, 2021 1:29 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: [PATCH] drm/amdgpu/pm: make unsupported power profile messages debug Making th

[PATCH 159/159] drm/amd/pm: add new data in metrics table

2021-02-24 Thread Alex Deucher
From: Kenneth Feng Export new data in the metrics table for gfx and memory utilization counter, and each hbm temperature as well. v2: change the metrics table version to v1.1 v3: fix the coding style Signed-off-by: Kenneth Feng Reviewed-by: Kevin Wang Signed-off-by: Alex Deucher --- .../gp

[PATCH 158/159] drm/amdgpu: add psp RAP L0 check support

2021-02-24 Thread Alex Deucher
From: Kevin Wang add PSP RAP L0 check when RAP TA is loaded. Signed-off-by: Kevin Wang Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/a

[PATCH 148/159] drm/amdgpu: apply gc v9_4_2 golden settings for aldebaran

2021-02-24 Thread Alex Deucher
From: Hawking Zhang Those registers should be programmed as one-time initialization Signed-off-by: Hawking Zhang Reviewed-by: Kevin Wang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 ++ drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 51 + dr

[PATCH 157/159] drm/amdgpu: change psp_rap_invoke() function return value

2021-02-24 Thread Alex Deucher
From: Kevin Wang RAP TA is an optional firmware. if it doesn’t exist, the driver should bypass psp_rap_invoke() function. 1. bypass psp_rap_invoke() when RAP TA is not loaded. 2. add new parameter (status) to query RAP TA status. (the status value is different with psp_ta_invoke(), 3. fix the

[PATCH 140/159] drm/amdgpu: use pd addr based on gart level page table

2021-02-24 Thread Alex Deucher
From: Alex Sierra With a recent gart page table re-construction, the gart page table is now 2-level for some ASICs: PDB0->PTB. In the case of 2-level gart page table, the page_table_base of vmid0 should point to PDB0 instead of PTB. Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling Review

[PATCH 155/159] drm/amdgpu: Let KFD use more VMIDs on Aldebaran

2021-02-24 Thread Alex Deucher
From: Felix Kuehling When there is no graphics support, KFD can use more of the VMIDs. Graphics VMIDs are only used for video decoding/encoding and post processing. With two VCE engines, there is no reason to reserve more than 2 VMIDs for that. Signed-off-by: Felix Kuehling Reviewed-by: Alex De

[PATCH 147/159] drm/amdgpu: restore aldebaran save ttmp and trap config on init (v2)

2021-02-24 Thread Alex Deucher
From: Jonathan Kim Initialization of TRAP_DATA0/1 is still required for the debugger to detect new waves on Aldebaran. Also, per-vmid global trap enablement may be required outside of debugger scope so move to init phase. v2: just add the gfx 9.4.2 changes (Alex) Signed-off-by: Jonathan Kim R

[PATCH 153/159] drm/amdgpu: refine ras codes for GC utc of aldebaran

2021-02-24 Thread Alex Deucher
From: Dennis Li The bank number of both VML2 and ATCL2 are changed to 8, so refine related codes to avoid defining long name arrays. Signed-off-by: Dennis Li Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 269 +--- dri

[PATCH 152/159] drm/amdgpu: add ras support for gfx of aldebaran

2021-02-24 Thread Alex Deucher
From: Dennis Li add edc counter/status reset and query functions for gfx block of aldebaran. v2: change to clear edc counter explicitly aldebaran hardware will not clear edc counter after driver reading them, so driver should clear them explicitly. Signed-off-by: Dennis Li Reviewed-by: Hawking

[PATCH 154/159] drm/amdgpu: enable watchdog feature for SQ of aldebaran

2021-02-24 Thread Alex Deucher
From: Dennis Li SQ's watchdog timer monitors forward progress, a mask of which waves caused the watchdog timeout is recorded into ras status registers and then trigger a system fatal error event. v2: 1. change *query_timeout_status to *query_sq_timeout_status. 2. move query_sq_timeout_status int

[PATCH 151/159] drm/amdgpu: add gc powerbrake support (v2)

2021-02-24 Thread Alex Deucher
From: Kevin Wang add GC power brake feature support for Aldebaran. v2: squash in fixes (Alex) Signed-off-by: Kevin Wang Reviewed-by: Kenneth Feng Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 +++ drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 26 ++

[PATCH 150/159] drm/amdgpu: update TCP_CHAN_STEER_1 golden value for aldebaran

2021-02-24 Thread Alex Deucher
From: Hawking Zhang The golden setting was changed recently. update to the latest one Signed-off-by: Hawking Zhang Reviewed-by: Kevin Wang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/d

[PATCH 149/159] drm/amdgpu: add common gc golden settings for aldebaran

2021-02-24 Thread Alex Deucher
From: Hawking Zhang golden settings that should be applied Signed-off-by: Hawking Zhang Reviewed-by: Kevin Wang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_

[PATCH 142/159] drm/amd/pm: Enable user min/max gfxclk on aldebaran

2021-02-24 Thread Alex Deucher
From: Lijo Lazar Aldebaran has fine grained DPM for GFXCLK. Instead of a discrete level, user can specify a min/max range of GFXCLK for any profiling/tuning purpose. This option is available only in manual performance level mode. Select "manual" as power_dpm_force_performance_level and specify the

[PATCH 156/159] drm/amd/pm: add aldebaran serial number support

2021-02-24 Thread Alex Deucher
From: Kevin Wang add aldebaran serial number support. (serial number from metrics table) Signed-off-by: Kevin Wang Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 23 +++ 1 file changed, 23 insertions(+) diff --git a

[PATCH 146/159] drm/amdkfd: add aldebaran kfd2kgd callbacks to kfd device (v2)

2021-02-24 Thread Alex Deucher
From: Jonathan Kim Create dedicated Aldebaran kfd2kgd callbacks to prepare for new per-vmid register instructions for debug trap setting functions and sending host traps. v2: rebase (Alex) Signed-off-by: Jonathan Kim Reviewed-by: Oak Zeng Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/

[PATCH 145/159] drm/amdkfd: Check HIQ's MQD for queue preemption status

2021-02-24 Thread Alex Deucher
From: Oak Zeng MEC firmware can silently fail the queue preemption request without time out. In this case, HIQ's MQD's queue_doorbell_id will be set. Check this field to see whether last queue preemption was successful or not. Signed-off-by: Oak Zeng Suggested-by: Jay Cornwall Acked-by: Felix

[PATCH 137/159] drm/amdgpu: update mmhub client ids for Aldebaran

2021-02-24 Thread Alex Deucher
From: Alex Sierra update mmhub client id table for Aldebaran. Signed-off-by: Alex Sierra Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 29 +++ 1 file changed, 16 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.

[PATCH 144/159] drm/amdkfd: Add kernel parameter to stop queue eviction on vm fault

2021-02-24 Thread Alex Deucher
From: Oak Zeng This is to keep wavefront context for debug purpose Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 7 +++ drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c | 5 +++-- drivers/gpu/drm/amd

[PATCH 143/159] drm/amdgpu: allow use psp to load firmware (v2)

2021-02-24 Thread Alex Deucher
From: Hawking Zhang Match existing asics. v2: rebase (Alex) Signed-off-by: Hawking Zhang Reviewed-by: Kevin Wang Reviewed-by: Le Ma Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/dr

[PATCH 139/159] drm/amdgpu: Fix the comment in amdgpu_gmc.h

2021-02-24 Thread Alex Deucher
From: Oak Zeng More accurate words are used to address a code review feedback Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/

[PATCH 141/159] drm/amd/pm: remove aldebaran serial number support

2021-02-24 Thread Alex Deucher
From: Kevin Wang the following message is not supported. PPSMC_MSG_ReadSerialNumTop32 PPSMC_MSG_ReadSerialNumBottom32 Signed-off-by: Kevin Wang Reviewed-by: Kenneth Feng Signed-off-by: Alex Deucher --- .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 19 --- 1 file changed, 1

[PATCH 135/159] drm/amdgpu: workaround the TMR MC address issue

2021-02-24 Thread Alex Deucher
From: Oak Zeng With the 2-level gart page table, vram is squeezed into gart aperture and FB aperture is disabled. Therefore all VRAM virtual addresses are in the GART aperture. However currently PSP requires TMR addresses in FB aperture. So we need some design change at PSP FW level to support

[PATCH 138/159] amdgpu: Fix GART page table s-bit

2021-02-24 Thread Alex Deucher
From: Oak Zeng For the new 2-level GART table, the last PDE0 points to PTB. Since PTB is in vram and right now we are running under s=0 mode (vram is treated as FB carveout), the s bit of this PDE0 should be set to 0. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Alex De

[PATCH 134/159] drm/amdgpu: HW setup of 2-level vmid0 page table

2021-02-24 Thread Alex Deucher
From: Oak Zeng Set up HW for 2-level vmid0 page table: 1. Set up PAGE_TABLE_START/END registers. Currently only plan to do 2-level page table for ALDEBARAN, so only gfxhub1.0 and mmhub1.7 is changed. 2. Set page table base register. For 2-level page table, the page table base should point to PDB0

[PATCH 136/159] drm/amdgpu: enable sram initialization for aldebaran

2021-02-24 Thread Alex Deucher
From: Dennis Li Aldebaran can share the same initializing shader code with arcturus. Signed-off-by: Dennis Li Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm

[PATCH 133/159] drm/amdgpu: Set up vmid0 PDB0

2021-02-24 Thread Alex Deucher
From: Oak Zeng If use gart for FB translation, allocate and fill PDB0. Signed-off-by: Oak Zeng Reviewed-by: Christian Konig Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 28 +++ 1 file changed, 24 insertions(+), 4

[PATCH 132/159] drm/amdgpu: Add function to allocate and fill PDB0

2021-02-24 Thread Alex Deucher
From: Oak Zeng Add functions to allocate PDB0, map it for CPU access, and fill it. Those functions are only used for 2-level vmid0 page table construction Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 103 ++

[PATCH 131/159] drm/amdgpu: Use different gart table parameters for 2-level gart table

2021-02-24 Thread Alex Deucher
From: Oak Zeng If use gart for FB translation, we will squeeze vram into sysvm aperture. This requires 2 level gart table. Add page table depth and page table block size parameters to gmc. This is prepare work to 2-level gart table construction Signed-off-by: Oak Zeng Reviewed-by: Christian Kon

[PATCH 130/159] drm/amdgpu: Placement of gart and vram in sysvm aperture

2021-02-24 Thread Alex Deucher
From: Oak Zeng If use GART for FB translation, place both vram and gart to sysvm aperture. AGP aperture is not set up in this case because it is not used Signed-off-by: Oak Zeng Reviewed-by: Christian Konig Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdg

[PATCH 129/159] drm/amdgpu: Modify comments of vram_start/end

2021-02-24 Thread Alex Deucher
From: Oak Zeng Modify the comment to reflect the fact that, if use GART for vram address translation for vmid0, [vram_start, vram_end] will be placed inside SYSVM aperture, together with GART. Signed-off-by: Oak Zeng Reviewed-by: Christian Konig Reviewed-by: Felix Kuehling Signed-off-by: Alex

[PATCH 128/159] drm/amdgpu: Moved gart_size calculation to mc_init functions

2021-02-24 Thread Alex Deucher
From: Oak Zeng In amdgpu_gmc_gart_location function, gart_size is adjusted by a smu_prv_buffer_size. This logic shouldn't belong to this function. Move the logic to the mc_init functions Signed-off-by: Oak Zeng Reviewed-by: Christian Konig Reviewed-by: Felix Kuehling Signed-off-by: Alex Deuch

[PATCH 109/159] drm/amdgpu: add mmhub client ids for aldebaran

2021-02-24 Thread Alex Deucher
Add the mmhub client id table for aldebaran. Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 41 +++ 1 file changed, 41 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_

[PATCH 121/159] drm/amdgpu: mask the xgmi number of hops reported from psp to kfd

2021-02-24 Thread Alex Deucher
From: Jonathan Kim The psp supplies the link type in the upper 2 bits of the psp xgmi node information num_hops field. With a new link type, Aldebaran has these bits set to a non-zero value (1 = xGMI3) so the KFD topology will report the incorrect IO link weights without proper masking. The actu
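The masking this snippet describes can be sketched as below. It is a minimal illustration, not the patch itself: the 8-bit field width, the macro names, and both helper functions are assumptions made for the example. The snippet only states that the link type sits in the upper 2 bits of num_hops, and its description of where the real hop count lives is cut off.

```c
#include <stdint.h>

/* Hypothetical layout: 8-bit num_hops field, link type in the upper 2 bits. */
#define TOY_LINK_TYPE_SHIFT	6
#define TOY_LINK_TYPE_MASK	0xC0u
#define TOY_HOP_COUNT_MASK	0x3Fu	/* all bits below the link-type bits */

/* Extract the link type (for example 1 = xGMI3) from the raw field. */
static inline uint8_t toy_xgmi_link_type(uint8_t raw_num_hops)
{
	return (uint8_t)((raw_num_hops & TOY_LINK_TYPE_MASK) >> TOY_LINK_TYPE_SHIFT);
}

/* Strip the link-type bits so only the hop count remains. */
static inline uint8_t toy_xgmi_hop_count(uint8_t raw_num_hops)
{
	return (uint8_t)(raw_num_hops & TOY_HOP_COUNT_MASK);
}
```

Under these assumptions a raw value of 0x42 (link type 1, two hops) would be read as 66 hops without masking and as 2 hops with it, which is the kind of error in reported IO link weights the snippet warns about.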

[PATCH 123/159] drm/amd/pm: Add DCBTC support for aldebaran

2021-02-24 Thread Alex Deucher
From: Lijo Lazar On aldebaran DCBTC should be run after enabling DPM. DCBTC won't be run if support is not enabled in PPTable. Without PPTable support the message is dummy and will return success always. Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- ...

[PATCH 117/159] drm/amdgpu: Don't change CPU mapping of on-chip memory pools

2021-02-24 Thread Alex Deucher
From: Harish Kasiviswanathan This change does a partial revert of this commit 'drm/amdgpu: set CPU mapping of vram as cached for A+A mode (v2)' The on-chip memory pools are not accessed by CPU so the previous change is not necessary Acked-by: Joseph Greathouse Signed-off-by: Harish Kasiviswan

[PATCH 127/159] drm/amdgpu: Use physical translation mode to access page table

2021-02-24 Thread Alex Deucher
From: Oak Zeng On A+A platform, CPU write page directory and page table in cached mode. So it is necessary for page table walker to snoop CPU cache. This setting is necessary for page walker to snoop page directory and page table data out of CPU cache. Signed-off-by: Oak Zeng Acked-by: Christia

[PATCH 125/159] drm/amd/pm: Correct msg status check for powerlimit

2021-02-24 Thread Alex Deucher
From: Lijo Lazar Status 0 indicates success, fix the check before using PPTable limit Signed-off-by: Lijo Lazar Reviewed-by: Evan Quan Reviewed-by: Kevin Wang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 2 +- 1 file changed, 1 insertion(+), 1 deleti

[PATCH 119/159] drm/amdgpu: enable retry fault wptr overflow

2021-02-24 Thread Alex Deucher
From: Philip Yang If xnack is on, VM retry fault interrupt send to IH ring1, and ring1 will be full quickly. IH cannot receive other interrupts, this causes deadlock if migrating buffer using sdma and waiting for sdma done while handling retry fault. Remove VMC from IH storm client, enable ring1

[PATCH 118/159] drm/amdgpu: Use free system memory size for kfd memory accounting

2021-02-24 Thread Alex Deucher
From: Oak Zeng With the current kfd memory accounting scheme, kfd applications can use up to 15/16 of total system memory. For system which has small total system memory size it leaves small system memory for OS. For example, if the system has totally 16GB of system memory, this scheme leave OS a
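The 15/16 accounting mentioned here is easy to work through. The sketch below covers only that existing scheme as the snippet states it. The function names are hypothetical, and the replacement scheme named in the subject line is not shown because the snippet is cut off before describing it.

```c
#include <stdint.h>

/* Bytes KFD applications may use under the 15/16 scheme described above. */
static inline uint64_t toy_kfd_limit_15_16(uint64_t total_bytes)
{
	return total_bytes - (total_bytes >> 4);	/* total - total/16 */
}

/* Bytes left over for everything else on the system. */
static inline uint64_t toy_os_remainder(uint64_t total_bytes)
{
	return total_bytes - toy_kfd_limit_15_16(total_bytes);
}
```

For a 16 GiB system this gives a 15 GiB limit and a 1 GiB remainder, so the amount left for the OS shrinks in direct proportion to total memory.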

[PATCH 124/159] drm/amd/pm: Enable performance determinism on aldebaran

2021-02-24 Thread Alex Deucher
From: Lijo Lazar Performance Determinism is a new mode in Aldebaran where PMFW tries to maintain sustained performance level. It can be enabled on a per-die basis on aldebaran. To guarantee that it remains within the power cap, a max GFX frequency needs to be specified in this mode. A new power_d

[PATCH 120/159] drm/amdgpu: enable 48-bit IH timestamp counter

2021-02-24 Thread Alex Deucher
From: Alex Sierra By default this timestamp is 32 bit counter. It gets overflowed in around 10 minutes. Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/
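The benefit of the wider counter can be checked with a little arithmetic. The helpers below are a hedged sketch: the function names are invented for illustration, and the tick rate is left as a parameter because the snippet does not state the clock that drives the IH timestamp.

```c
#include <stdint.h>

/*
 * Seconds until a free-running counter of width `bits` wraps when it
 * advances at `tick_hz`. Valid for 0 < bits < 64 and tick_hz > 0.
 */
static inline uint64_t toy_counter_wrap_seconds(unsigned int bits, uint64_t tick_hz)
{
	return (UINT64_C(1) << bits) / tick_hz;
}

/* How many times longer a 48-bit counter runs than a 32-bit one. */
static inline uint64_t toy_wrap_ratio_48_vs_32(void)
{
	return UINT64_C(1) << (48 - 32);
}
```

Whatever the tick rate, the 48-bit counter lasts 65536 times longer. If the 32-bit counter wraps in roughly 10 minutes as stated, the 48-bit one would take on the order of 455 days at the same rate.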

[PATCH 122/159] drm/amd/pm: Fix power limit query on aldebaran

2021-02-24 Thread Alex Deucher
From: Lijo Lazar Aldebaran doesn't have AC/DC power limits. Separate the implementation from SMU13. Max power limit is queried from PPTable. Signed-off-by: Lijo Lazar Reviewed-by: Kevin Wang Signed-off-by: Alex Deucher --- .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 28 +-

[PATCH 126/159] drm/amdgpu: Don't reserve vram as WC for A+A

2021-02-24 Thread Alex Deucher
From: Oak Zeng On A+A platform, vram can be mapped as WB. Not necessarily to always map vram as WC on such platform. Calling function arch_io_reserve_memtype_wc will mark the whole vram region as WC. So don't call it for A+A platform. Signed-off-by: Oak Zeng Suggested-by: Alex Deucher Acked-b

[PATCH 113/159] drm/amdgpu/pm: Remove redundant generic message index

2021-02-24 Thread Alex Deucher
From: Lijo Lazar Remove SMU_MSG_GfxDriverReset generic index. Always use SMU_MSG_GfxDeviceDriverReset as the generic index for reset. Signed-off-by: Lijo Lazar Reviewed-by: Feifei Xu Reviewed-by: Kevin Wang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/pm/inc/smu_types.h | 1 - 1 file

[PATCH 115/159] drm/amdgpu: Fix aldebaran MMHUB CG/LS logic

2021-02-24 Thread Alex Deucher
From: Lijo Lazar Aldebaran MMHUB CG/LS logic is controlled by VBIOS. Enable the state change logic only if driver is used for control. Signed-off-by: Lijo Lazar Reviewed-by: Feifei Xu Reviewed-by: Kevin Wang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c | 19 ++

[PATCH 116/159] drm/amdgpu: apply new pmfw loading sequence to arcturus and onwards

2021-02-24 Thread Alex Deucher
From: Hawking Zhang Arcturus and onwards products should follow the same sequence that have pmfw loading ahead of tmr setup Signed-off-by: Hawking Zhang Reviewed-by: Kevin Wang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 5 ++--- 1 file changed, 2 insertions(+),

[PATCH 110/159] drm/amdgpu: Add clock gating support for aldebaran

2021-02-24 Thread Alex Deucher
From: Lijo Lazar Aldebaran clock gating support for GFX,SDMA,IH blocks VCN/JPEG blocks are excluded in this patch, to be enabled later Signed-off-by: Lijo Lazar Acked-by: Feifei Xu Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 ++- drivers/gpu/drm/amd/amdgpu/sdm

[PATCH 103/159] drm/amdkfd: Fix saving the ACC vgprs for Aldebaran

2021-02-24 Thread Alex Deucher
From: Laurent Morichetti get_num_acc_vgprs does not set status.scc if the number of acc vgprs is 0, so use an and instruction to set the condition code. The Aldebaran handler binary was not based on the latest version of the sources, so this update to the binary is the minimal change only adding

[PATCH 111/159] drm/amdgpu/pm: Remove unsupported MP1 messages from aldebaran

2021-02-24 Thread Alex Deucher
From: Lijo Lazar PrepareMp1Reset and SoftReset messages are not supported on aldebaran. Signed-off-by: Lijo Lazar Reviewed-by: Feifei Xu Reviewed-by: Kevin Wang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 3 --- drivers/gpu/drm/amd/pm/swsmu/smu13/ald

[PATCH 105/159] drm/amdgpu: Enable swsmu block on aldebaran

2021-02-24 Thread Alex Deucher
From: Lijo Lazar Enable smu13 block on aldebaran Signed-off-by: Lijo Lazar Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/soc15.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c index 8c93cf411f68..441dee

[PATCH 107/159] drm/amdgpu: enable vcn dpg mode on aldebaran

2021-02-24 Thread Alex Deucher
From: James Zhu Enable vcn dpg mode on aldebaran Signed-off-by: James Zhu Reviewed-by: Leo Liu Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/soc15.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdg

[PATCH 114/159] drm/amdgpu: Enable CP idle interrupts

2021-02-24 Thread Alex Deucher
From: Lijo Lazar v1: The interrupts need to be enabled to move to DS clocks. v2: Don't enable GFX IDLE interrupts if there are no GFX rings. Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 8 +++- 1 file changed

[PATCH 112/159] drm/amdgpu/pm: Fix reset message mapping on aldebaran

2021-02-24 Thread Alex Deucher
From: Lijo Lazar Use the correct mapping for mode-reset messages on aldebaran Signed-off-by: Lijo Lazar Reviewed-by: Feifei Xu Reviewed-by: Kevin Wang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) di

[PATCH 108/159] drm/amdgpu: enable dpg indirect sram mode on aldebaran

2021-02-24 Thread Alex Deucher
From: James Zhu Enable dpg indirect sram mode on aldebaran. Signed-off-by: James Zhu Reviewed-by: Leo Liu Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/

[PATCH 104/159] drm/amdgpu: switch to cached noretry setting for aldebaran

2021-02-24 Thread Alex Deucher
From: Hawking Zhang global noretry setting now is cached to gmc.noretry Signed-off-by: Hawking Zhang Reviewed-by: Kevin Wang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/

[PATCH 106/159] drm/amdgpu: enable vcn and jpeg on aldebaran

2021-02-24 Thread Alex Deucher
From: James Zhu Enable vcn and jpeg 2.6 on aldebaran. Signed-off-by: James Zhu Reviewed-by: Leo Liu Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/soc15.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm

[PATCH 102/159] drm/amdgpu: Add support for cached VRAM in A+A

2021-02-24 Thread Alex Deucher
From: Harish Kasiviswanathan This change was lost in last merge. The upstream commit 672242be560 removed init_mem_type Signed-off-by: Harish Kasiviswanathan Acked-by: Rajneesh Bhardwaj Reviewed-by: Eric Huang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 9

[PATCH 098/159] drm/amdgpu: UTLC1 RB SDMA timeout on Aldebaran

2021-02-24 Thread Alex Deucher
From: Alex Sierra [Why] This causes infinite retries on the UTCL1 RB, preventing higher priority RB such as paging RB. [How] Set to one the SDMAx_UTLC1_TIMEOUT registers for all SDMAs. Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling Reviewed-by: Hawking Zhang Signed-off-by: Alex Deuch

[PATCH 101/159] drm/amd/pm: Set no fan control flag as needed.

2021-02-24 Thread Alex Deucher
From: Lijo Lazar For GPUs that don't support fan control, set the no fan control flag so that they don't appear in hwmon sensors. Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 4 1 file changed, 4 inserti

[PATCH 099/159] drm/amdgpu: Aldebaran doesn't use semaphore

2021-02-24 Thread Alex Deucher
From: Amber Lin Simplify all Aldebaran DIDs into one ASIC type. Signed-off-by: Amber Lin Reviewed-by: Kevin Wang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.

[PATCH 100/159] drm/amdgpu: bypass hdp read cache invalidation for aldebaran (v2)

2021-02-24 Thread Alex Deucher
From: Hawking Zhang hdp read cache is removed in aldebaran. don't issue an mmio write or write data packet to hardware. v2: rebase Signed-off-by: Hawking Zhang Reviewed-by: Feifei Xu Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c | 3 +++ 1 file changed, 3 insertions(

[PATCH 093/159] drm/amd/pm: Remove CPU virtual address notification in aldebaran

2021-02-24 Thread Alex Deucher
From: Lijo Lazar PPSMC_MSG_SetSystemVirtualDramAddrHigh/Low messages are not handled by PMFW in aldebaran Signed-off-by: Lijo Lazar Reviewed-by: Kenneth Feng Reviewed-by: Feifei Xu Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 17 - 1 file

[PATCH 097/159] drm/amdpgu: add ATOM_DGPU_VRAM_TYPE_HBM2E vram type

2021-02-24 Thread Alex Deucher
From: Feifei Xu 0x61 is assigned to HBM2E in atom_dgpu_vram_type. Signed-off-by: Feifei Xu Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c | 1 + drivers/gpu/drm/amd/include/atomfirmware.h | 1 + 2 files changed, 2 insertions(

[PATCH 074/159] drm/amdgpu: use physical_node_id to calculate aper_base

2021-02-24 Thread Alex Deucher
From: Hawking Zhang Similar as xgmi connected gpu nodes, physical_node_id * segment_size should be used to calculate the offset of aper_base. The asic type check is redundant. once physical_node_id and segment_size are initialized, it should be count on. Signed-off-by: Hawking Zhang Reviewed-b

[PATCH 087/159] drm/amdgpu: pre-map device buffer as cached for A+A config

2021-02-24 Thread Alex Deucher
From: Oak Zeng For A+A configuration, device memory is supposed to be mapped as cachable from CPU side. For kernel pre-map gpu device memory using ioremap_cache Signed-off-by: Oak Zeng Reviewed-by: Christian Koenig Tested-by: Amber Lin Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amd

[PATCH 089/159] drm/amdgpu: Don't do FB resize under A+A config

2021-02-24 Thread Alex Deucher
From: Oak Zeng Disable PCIe BAR resizing on A+A config. It's not needed because we won't use the PCIe BAR, but it breaks the PCI BAR configuration with the current SBIOS. Error message of FB BAR resize failure under A+A: [ 154.913731] [drm:amdgpu_device_resize_fb_bar [amdgpu]] *ERROR* Problem

[PATCH 073/159] drm/amdgpu: skip gds ras workaround for aldebaran

2021-02-24 Thread Alex Deucher
From: Hawking Zhang there won't be any gds usage in either kernel or pm4 anymore for aldebaran. Signed-off-by: Hawking Zhang Reviewed-by: Feifei Xu Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/dri

[PATCH 096/159] drm/amdgpu: retire aldebaran gpu_info firmware

2021-02-24 Thread Alex Deucher
From: Hawking Zhang driver should use the gfx_info atomfirmware interface Signed-off-by: Hawking Zhang Reviewed-by: Feifei Xu Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd

[PATCH 080/159] drm/amdgpu: add mmhub ras error reset callback for aldebaran

2021-02-24 Thread Alex Deucher
From: Hawking Zhang The callback will be invoked to reset mmhub ras error counters when needed. Signed-off-by: Hawking Zhang Reviewed-by: Dennis Li Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers

[PATCH 094/159] drm/amdgpu: set snoop bit in pde/pte entries for Aldebaran A+A

2021-02-24 Thread Alex Deucher
From: Eric Huang The CPU mapping of page tables in vram is changed from uncached to cached in A+A, so the snoop bit in VM_CONTEXTx_PAGE_TABLE_BASE_ADDR/PDE0s/PDE1s/PDE2s/PTE.TFs has to be set so that the gpuvm walker snoops page table data out of the CPU cache. Signed-off-by: Eric Huang Reviewed-by: Oak Zeng Revi

[PATCH 082/159] drm/amdgpu: correct IH_CHICKEN programming for aldebaran

2021-02-24 Thread Alex Deucher
From: Hawking Zhang For aldebaran, psp firmware won't program IH_CHICKEN. It now depends on the driver to program it properly so that either a bus address or a gpu virtual address works for the ih ring. Signed-off-by: Hawking Zhang Acked-by: Christian König Acked-by: Felix Kuehling Reviewed-by: Denn

[PATCH 084/159] drm/amdgpu: disallow use semaphore on aldebaran

2021-02-24 Thread Alex Deucher
From: Hawking Zhang shall revisit the change later Signed-off-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index

[PATCH 095/159] drm/amdgpu: query aldebaran gfx_config through atomfirmware i/f

2021-02-24 Thread Alex Deucher
From: Hawking Zhang For ASICs that don't support ip discovery feature, query gfx configuration through atomfirmware interface, rather than gpu_info firmware. Signed-off-by: Hawking Zhang Reviewed-by: Feifei Xu Signed-off-by: Alex Deucher --- .../gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c | 19

[PATCH 088/159] drm/ttm: ioremap buffer properly according to TTM placement flag

2021-02-24 Thread Alex Deucher
From: Oak Zeng If TTM placement flag is cached, buffer is intended to be mapped as cached from CPU. Map it with ioremap_cache. This wasn't necessary before as device memory was never mapped as cached from CPU side. It becomes necessary for aldebaran as device memory is mapped cached from CPU. S
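
The selection described in this patch (map with ioremap_cache when the placement is cached) can be sketched as a user-space stand-in. Everything here is hypothetical: the enum sketch_caching and the function pick_ioremap_variant are invented names, and the function only returns the name of the mapper that would be chosen rather than mapping anything. The write-combined and default branches are assumptions added to make the example complete; the description itself only mentions the cached case.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical caching modes; not the TTM definitions. */
enum sketch_caching {
	SKETCH_UNCACHED,
	SKETCH_WRITE_COMBINED,
	SKETCH_CACHED,
};

/* Return the name of the kernel mapping helper that matches the caching mode. */
static const char *pick_ioremap_variant(enum sketch_caching caching)
{
	switch (caching) {
	case SKETCH_CACHED:
		/* The case the patch description adds for aldebaran. */
		return "ioremap_cache";
	case SKETCH_WRITE_COMBINED:
		/* Assumed branch, not stated in the description. */
		return "ioremap_wc";
	default:
		/* Assumed fallback for uncached placements. */
		return "ioremap";
	}
}
```

Keeping the decision in one switch means a new caching mode only needs one extra case, and an unknown mode falls back to the most conservative (uncached) mapping.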

[PATCH 077/159] drm/amdgpu: add sdma ras error query callback for aldebaran

2021-02-24 Thread Alex Deucher
From: Hawking Zhang The callback will be invoked to harvest all kinds of sdma ras error Signed-off-by: Hawking Zhang Reviewed-by: Dennis Li Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/sdma_v4_4.c | 187 + 1 file changed, 187 insertions(+) diff --git a/d

[PATCH 069/159] drm/amdgpu:return true for mode1_reset_support on aldebaran

2021-02-24 Thread Alex Deucher
From: Feifei Xu Will remove once validation finished. Signed-off-by: Feifei Xu Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt

[PATCH 071/159] drm/amdgpu: correct vram_info for HBM2E

2021-02-24 Thread Alex Deucher
From: Feifei Xu correct atom_vram_info_header_v2_6 and its vram_module. Signed-off-by: Feifei Xu Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm

[PATCH 083/159] drm/amdgpu: switch to vega20 ih block for aldebaran

2021-02-24 Thread Alex Deucher
From: Hawking Zhang replace vega10 ih block with vega20 ih block for aldebaran. Signed-off-by: Hawking Zhang Acked-by: Christian König Acked-by: Felix Kuehling Reviewed-by: Dennis Li Reviewed-by: Feifei Xu Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/soc15.c | 4 ++-- 1 file

[PATCH 091/159] drm/amd/amdgpu: Add smu_pptable module parameter

2021-02-24 Thread Alex Deucher
From: Lijo Lazar Temporarily add smu_pptable module parameter for aldebaran. This is used to force soft PPTable use, overriding any VBIOS PPTable. Signed-off-by: Lijo Lazar Reviewed-by: Kenneth Feng Reviewed-by: Kevin Wang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu.h

[PATCH 085/159] drm/amd/pm:add aldebaran support for getting bootup values

2021-02-24 Thread Alex Deucher
From: Feifei Xu for SMU config. Signed-off-by: Feifei Xu Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0

[PATCH 090/159] drm/amd/pm: Add atom_smc_dpm_info_v4_10 for aldebaran

2021-02-24 Thread Alex Deucher
From: Lijo Lazar Add atom_smc_dpm_info_v4_10 that defines board parameters for aldebaran Signed-off-by: Lijo Lazar Reviewed-by: Kenneth Feng Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/include/atomfirmware.h| 53 +++ .../drm/amd/pm/swsm

[PATCH 081/159] drm/amdgpu: add mmhub error status query callback for aldebaran

2021-02-24 Thread Alex Deucher
From: Hawking Zhang The callback will be invoked to query mmea error status when needed. Signed-off-by: Hawking Zhang Reviewed-by: Dennis Li Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c | 27 + 1 file changed, 27 insertions(+) diff --git a/d

[PATCH 086/159] drm/amdgpu: update atom_firmware_info_v3_4 (v2)

2021-02-24 Thread Alex Deucher
From: Feifei Xu v1: Added some pspbl parameters v2: fix fallthrough issue Signed-off-by: Feifei Xu Reviewed-by: Hawking Zhang Reviewed-by: Kevin Wang Reviewed-by: Lazar Lijo Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/include/atomfirmware.h | 5 - drivers/gpu/drm/amd/pm/sws

[PATCH 092/159] drm/amd/pm: Add support to override pptable id for aldebaran

2021-02-24 Thread Alex Deucher
From: Lijo Lazar Temporarily force use of the BU PPTable defined in VBIOS. Add support to override the PPTable via module parameter. Add FW reported version to kernel log. Signed-off-by: Lijo Lazar Reviewed-by: Kenneth Feng Reviewed-by: Kevin Wang Signed-off-by: Alex Deucher --- drivers/gpu/d

[PATCH 079/159] drm/amdgpu: add mmhub ras error query callback for aldebaran

2021-02-24 Thread Alex Deucher
From: Hawking Zhang The callback will be invoked to harvest all kinds of mmhub ras error Signed-off-by: Hawking Zhang Reviewed-by: Dennis Li Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c | 740 ++-- 1 file changed, 689 insertions(+), 51 deletions(
