Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem

2019-08-30 Thread Tejun Heo
Hello, I just glanced through the interface and don't have enough context to give any kind of detailed review yet. I'll try to read up and understand more and would greatly appreciate if you can give me some pointers to read up on the resources being controlled and how the actual use cases would

Re: [PATCH v3 3/3] dmr/amdgpu: Add system auto reboot to RAS.

2019-08-30 Thread Grodzovsky, Andrey
But I am not the one cherry-picking to DKMS, should I just let this person know this is the DKMS code he should use for when the appropriate API doesn't exist? Andrey From: Alex Deucher Sent: 30 August 2019 15:55:03 To: Grodzovsky, Andrey Cc: amd-gfx list;

[pull] amdgpu drm-next-5.4

2019-08-30 Thread Alex Deucher
Hi Dave, Daniel, Mostly bug fixes. The big addition is display support for renoir which is new for 5.4. I realize it's a bit late to add it but the rest of the code for renoir is already in so it would be nice to get the display part in as well. If not, let me know, and I'll respin without it.

Re: [PATCH 4/4] drm/amdgpu: Use optimal mtypes and PTE bits for Arcturus

2019-08-30 Thread Liu, Shaoyun
Series is reviewed by: shaoyunl. It looks a little bit confusing that we have two places for the pte flags. get_pte_flags already gets the asic-specific mapping flags, and inside amdgpu_vm_bo_split_mapping the driver adjusts the real HW mapping flags again. Maybe better just keep the logic

Re: [PATCH v3 2/3] dmr/amdgpu: Avoid HW GPU reset for RAS.

2019-08-30 Thread Kuehling, Felix
On 2019-08-30 12:39 p.m., Andrey Grodzovsky wrote: > Problem: > Under certain conditions, when some IP blocks take a RAS error, > we can get into a situation where a GPU reset is not possible > due to issues in RAS in SMU/PSP. > > Temporary fix until proper solution in PSP/SMU is ready: > When uncor

Re: [PATCH v3 1/3] drm/amdgpu: Fix bugs in amdgpu_device_gpu_recover in XGMI case.

2019-08-30 Thread Kuehling, Felix
On 2019-08-30 12:39 p.m., Andrey Grodzovsky wrote: > Issue 1: > In the XGMI case, amdgpu_device_lock_adev for the other devices in the hive > was called too late, after access to their respective schedulers. > So relocate the lock to the beginning of accessing the other devs. > > Issue 2: > Using amdgpu_device_i

Re: [PATCH v3 2/3] dmr/amdgpu: Avoid HW GPU reset for RAS.

2019-08-30 Thread Alex Deucher
On Fri, Aug 30, 2019 at 12:39 PM Andrey Grodzovsky wrote: > > Problem: > Under certain conditions, when some IP blocks take a RAS error, > we can get into a situation where a GPU reset is not possible > due to issues in RAS in SMU/PSP. > > Temporary fix until proper solution in PSP/SMU is ready: >

Re: [PATCH v3 3/3] dmr/amdgpu: Add system auto reboot to RAS.

2019-08-30 Thread Alex Deucher
On Fri, Aug 30, 2019 at 12:39 PM Andrey Grodzovsky wrote: > > In case of a RAS error, allow the user to configure auto system > reboot through ras_ctrl. > This is also part of the temporary workaround for the RAS > hang problem. > > Signed-off-by: Andrey Grodzovsky Typo in title: dmr -> drm > --- > dri

[PATCH 1/3] drm/amd: be quiet when no SAD block is found

2019-08-30 Thread Jean Delvare
It is fine for displays without audio functionality to not provide any SAD block in their EDID. Do not log an error in that case, just return quietly. This fixes half of bug fdo#107825: https://bugs.freedesktop.org/show_bug.cgi?id=107825 Signed-off-by: Jean Delvare Cc: Alex Deucher Cc: "Christi
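The idea behind this patch is that a count of zero Short Audio Descriptors (SADs) is a normal result for a display without audio, so only a negative return should be logged. A minimal sketch of that caller-side pattern follows; both functions are hypothetical stand-ins, not the amdgpu code, and -22 stands in for -EINVAL.

```c
#include <stdio.h>

/* Hypothetical stand-in for an EDID SAD parser: returns the number of
 * SADs found (0 if none), or a negative errno-style code on failure. */
static int hypothetical_read_sads(int sad_count_in_edid, int parse_error)
{
	if (parse_error)
		return -22; /* stand-in for -EINVAL */
	return sad_count_in_edid;
}

/* Hypothetical caller after the change: zero SADs means "no audio",
 * which is not an error, so nothing is logged in that case. */
static int hypothetical_setup_audio(int sad_count_in_edid, int parse_error)
{
	int n = hypothetical_read_sads(sad_count_in_edid, parse_error);

	if (n < 0) {
		/* Only genuine parse failures are reported. */
		fprintf(stderr, "error reading SADs: %d\n", n);
		return n;
	}
	if (n == 0)
		return 0; /* display without audio: return quietly */

	/* A real driver would program the n audio descriptors here. */
	return n;
}
```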

[PATCH 3/3] drm/edid: no CEA extension is not an error

2019-08-30 Thread Jean Delvare
It is fine for displays without audio functionality to not implement CEA extension in their EDID. Do not return an error in that case, instead return 0 as if there was a CEA extension with no audio or speaker block. This fixes half of bug fdo#107825: https://bugs.freedesktop.org/show_bug.cgi?id=10
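The change described here moves the "no audio" case into the helper itself: a missing CEA extension yields 0, exactly as an extension with no audio block would. A minimal sketch of that return convention, using a hypothetical EDID model rather than the real drm_edid structures:

```c
#include <stddef.h>

/* Hypothetical EDID model: has_cea says whether a CEA extension block
 * is present; sad_count is the number of SADs inside it. */
struct hypothetical_edid {
	int has_cea;
	int sad_count;
};

/* Returns the number of SADs found, 0 when there are none (including when
 * the CEA extension is absent), or -22 (stand-in for -EINVAL) for NULL. */
static int hypothetical_edid_to_sad(const struct hypothetical_edid *edid)
{
	if (edid == NULL)
		return -22;

	/* A missing CEA extension is treated like an extension with no
	 * audio block, so callers only need to handle the zero count. */
	if (!edid->has_cea)
		return 0;

	return edid->sad_count;
}
```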

Re: [PATCH 2/3] drm/radeon: be quiet when no SAD block is found

2019-08-30 Thread Jean Delvare
Oops, sorry, I messed up the subject line of that one, which should really read "drm/radeon: be quiet when no SAD block is found". -- Jean Delvare SUSE L3 Support ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailma

[PATCH 2/3] drm/edid: don't log errors on absent CEA SAD blocks

2019-08-30 Thread Jean Delvare
It is fine for displays without audio functionality to not provide any SAD block in their EDID. Do not log an error in that case, just return quietly. Inspired by a similar fix to the amdgpu driver in the context of bug fdo#107825: https://bugs.freedesktop.org/show_bug.cgi?id=107825 Signed-off-by

[PATCH 0/3] drm/edid: don't log errors on absent CEA SAD blocks

2019-08-30 Thread Jean Delvare
Hi all, This is my attempt to fix bug fdo#107825: https://bugs.freedesktop.org/show_bug.cgi?id=107825 [PATCH 1/3] drm/amd: be quiet when no SAD block is found [PATCH 2/3] drm/radeon: be quiet when no SAD block is found [PATCH 3/3] drm/edid: no CEA extension is not an error -- Jean Delvare SUSE

[PATCH v3 3/3] dmr/amdgpu: Add system auto reboot to RAS.

2019-08-30 Thread Andrey Grodzovsky
In case of a RAS error, allow the user to configure auto system reboot through ras_ctrl. This is also part of the temporary workaround for the RAS hang problem. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 18 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c

[PATCH v3 2/3] dmr/amdgpu: Avoid HW GPU reset for RAS.

2019-08-30 Thread Andrey Grodzovsky
Problem: Under certain conditions, when some IP blocks take a RAS error, we can get into a situation where a GPU reset is not possible due to issues in RAS in SMU/PSP. Temporary fix until proper solution in PSP/SMU is ready: When an uncorrectable error happens, the DF will unconditionally broadcast err

[PATCH v3 1/3] drm/amdgpu: Fix bugs in amdgpu_device_gpu_recover in XGMI case.

2019-08-30 Thread Andrey Grodzovsky
Issue 1: In the XGMI case, amdgpu_device_lock_adev for the other devices in the hive was called too late, after access to their respective schedulers. So relocate the lock to the beginning of accessing the other devs. Issue 2: Using amdgpu_device_ip_need_full_reset to switch the device list from all devices in
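Issue 1 is an ordering fix: every device in the hive should be locked before any of their schedulers are touched. A minimal sketch of the "lock all first, then access" shape, assuming a hypothetical device struct that simply records whether a scheduler was accessed while unlocked (this is not the amdgpu locking code):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state for illustration only. */
struct hypothetical_adev {
	bool locked;
	int sched_accesses_unlocked; /* counts accesses made without the lock */
};

static void hypothetical_lock_adev(struct hypothetical_adev *a)
{
	a->locked = true;
}

static void hypothetical_touch_scheduler(struct hypothetical_adev *a)
{
	if (!a->locked)
		a->sched_accesses_unlocked++; /* the bug the patch describes */
}

/* Lock every device in the hive up front, then access the schedulers. */
static void hypothetical_recover_hive(struct hypothetical_adev *hive, size_t n)
{
	for (size_t i = 0; i < n; i++)
		hypothetical_lock_adev(&hive[i]);

	for (size_t i = 0; i < n; i++)
		hypothetical_touch_scheduler(&hive[i]);
}
```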

Re: [PATCH] drm/amd/powerplay: guard manual mode prerequisite for clock level force

2019-08-30 Thread Alex Deucher
On Fri, Aug 30, 2019 at 5:37 AM Evan Quan wrote: > > Force clock level is for dpm manual mode only. > > Change-Id: I3b4caf3fafc72197d65e2b9255c68e40e673e25e > Reported-by: Candice Li > Signed-off-by: Evan Quan Acked-by: Alex Deucher > --- > drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 18

Re: [PATCH] drm/amdgpu: Move null pointer dereference check

2019-08-30 Thread Alex Deucher
On Fri, Aug 30, 2019 at 8:43 AM Austin Kim wrote: > > The NULL pointer check should be done > ahead of the routine below. > struct amdgpu_device *adev = hwmgr->adev; > > With this commit, a potential NULL dereference is avoided. > > Signed-off-by: Austin Kim Applied. th

Re: [PATCH 4/4] drm/amdgpu: move the call of ras recovery_init and bad page reserve to proper place

2019-08-30 Thread Grodzovsky, Andrey
On 8/30/19 8:24 AM, Tao Zhou wrote: > ras recovery_init should be called after ttm and smu init; > bad page reserve should be put in front of gpu reset, since i2c > may be unstable during gpu reset; > add cleanup for recovery_init and recovery_fini > > Signed-off-by: Tao Zhou > --- > drivers/gpu/

RE: [PATCH V2] drm/amd/powerplay: SMU_MSG_OverridePcieParameters is unsupport for APU

2019-08-30 Thread Huang, Ray
Acked-by: Huang Rui -Original Message- From: Aaron Liu Sent: Thursday, August 29, 2019 10:10 PM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Huang, Ray ; Liu, Aaron Subject: [PATCH V2] drm/amd/powerplay: SMU_MSG_OverridePcieParameters is unsupport for APU For apu, SMU_

RE: [PATCH 2/2] drm/amdgpu: fix no interrupt issue for renoir emu (v2)

2019-08-30 Thread Huang, Ray
Patches are Acked-by: Huang Rui Please use "git send-email" on amd-gfx public code review. Thanks, Ray -Original Message- From: Aaron Liu Sent: Friday, August 30, 2019 1:33 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Huang, Ray ; Liu, Aaron ; Deucher, Alexander Su

Re: [PATCH v3 6/7] drm/amdgpu: utilize subconnector property for DP through atombios

2019-08-30 Thread Ville Syrjälä
On Thu, Aug 29, 2019 at 08:52:31AM -0400, Alex Deucher wrote: > On Mon, Aug 26, 2019 at 9:22 AM Oleg Vasilev wrote: > > > > Since DP-specific information is stored in driver's structures, every > > driver needs to implement subconnector property by itself. > > > > Reviewed-by: Emil Velikov > > Si

[PATCH] drm/amd/powerplay: Variable ps could be NULL when it get dereferenced

2019-08-30 Thread Yizhuo
Inside function cz_get_performance_level(), pointer ps could be NULL via cast_const_PhwCzPowerState(). However, this pointer is dereferenced without any check, which is potentially unsafe. Signed-off-by: Yizhuo --- drivers/gpu/drm/amd/powerplay/hwmgr/cz_hwmgr.c | 3 +++ 1 file changed, 3 inserti
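The fix described here is a plain NULL guard between the cast helper and the first use of its result. A minimal sketch of that pattern with hypothetical names (the real cz_hwmgr types are not reproduced; -22 stands in for -EINVAL):

```c
#include <stddef.h>

/* Hypothetical power-state type for illustration only. */
struct hypothetical_power_state {
	int level;
};

/* Hypothetical cast helper that, like the one in the report, may yield NULL. */
static const struct hypothetical_power_state *
hypothetical_cast_power_state(const void *hw_ps)
{
	return hw_ps;
}

/* Checks the cast result before dereferencing it. */
static int hypothetical_get_performance_level(const void *hw_ps, int *level)
{
	const struct hypothetical_power_state *ps =
		hypothetical_cast_power_state(hw_ps);

	if (ps == NULL || level == NULL)
		return -22; /* stand-in for -EINVAL */

	*level = ps->level;
	return 0;
}
```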

Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]

2019-08-30 Thread Hillf Danton
On Fri, 30 Aug 2019 06:04:06 +0800 Mikhail Gavrilov wrote: > On Sun, Aug 25, 2019 at 10:13:05PM +0800, Hillf Danton wrote: > > Can we try to add the fallback timer manually? > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > > @@ -322,6 +3

[PATCH] drm/amdgpu: Move null pointer dereference check

2019-08-30 Thread Austin Kim
The NULL pointer check should be done before the statement below: struct amdgpu_device *adev = hwmgr->adev; This commit moves the check ahead of it to avoid a potential NULL dereference. Signed-off-by: Austin Kim --- drivers/gpu/drm/amd/powerplay/smumgr/smu8_smumgr.c | 5 +++-- 1 file chan
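The point of moving the check is ordering: if the initializer dereferences hwmgr before the NULL test, the crash (or undefined behaviour, which lets the compiler drop the later test) happens first. A minimal sketch of the corrected order with hypothetical, trimmed-down structures (not the smu8_smumgr.c code; -22 stands in for -EINVAL):

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. */
struct hypothetical_device {
	int id;
};

struct hypothetical_hwmgr {
	struct hypothetical_device *adev;
	void *smu_backend;
};

static int hypothetical_smu_fini(struct hypothetical_hwmgr *hwmgr)
{
	struct hypothetical_device *adev;

	/* Check the pointer before the first dereference ... */
	if (hwmgr == NULL)
		return -22;

	/* ... and only then read hwmgr->adev. */
	adev = hwmgr->adev;
	(void)adev; /* a real teardown would use adev here */

	hwmgr->smu_backend = NULL;
	return 0;
}
```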

[PATCH 4/4] drm/amdgpu: move the call of ras recovery_init and bad page reserve to proper place

2019-08-30 Thread Tao Zhou
ras recovery_init should be called after ttm and smu init; bad page reserve should be put in front of gpu reset, since i2c may be unstable during gpu reset; add cleanup for recovery_init and recovery_fini. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 + dr

[PATCH 3/4] drm/amdgpu: save umc error records

2019-08-30 Thread Tao Zhou
save umc error records to ras bad page array v2: add bad pages before gpu reset Signed-off-by: Tao Zhou Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 2 +- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 29 ++ drivers/gpu/drm/amd/amdgpu/umc_v6_1.c

[PATCH 2/4] drm/amdgpu: Hook EEPROM table to RAS

2019-08-30 Thread Tao Zhou
support loading and saving eeprom records for ras; move EEPROM record storing to bad page reserving Signed-off-by: Tao Zhou Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 111 ++-- 1 file changed, 83 insertions(+), 28 deletions(-) diff --git a/dr

[PATCH 1/4] drm/amdgpu: change ras bps type to eeprom table record structure

2019-08-30 Thread Tao Zhou
change bps type from retired page to eeprom table record, prepare for saving error records to eeprom Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 59 - drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 11 +++-- 2 files changed, 43 insertions(+), 27 delet

[PATCH 0/4] add support for ras page retirement

2019-08-30 Thread Tao Zhou
This series saves umc error page info into a record structure and stores the records to eeprom. It also loads error records from eeprom and reserves the related retired pages during gpu init. Tao Zhou (4): drm/amdgpu: change ras bps type to eeprom table record structure drm/amdgpu: Hook EEPROM table

[PATCH] drm/amd/powerplay: guard manual mode prerequisite for clock level force

2019-08-30 Thread Evan Quan
Force clock level is for dpm manual mode only. Change-Id: I3b4caf3fafc72197d65e2b9255c68e40e673e25e Reported-by: Candice Li Signed-off-by: Evan Quan --- drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 18 ++ drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h | 5 +++-- drivers/gpu
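The guard described here rejects a forced clock level unless the DPM level is manual. A minimal sketch of that precondition check, with hypothetical enum and struct names (not the amdgpu_smu.c API; returning -22, a stand-in for -EINVAL, is an assumption):

```c
/* Hypothetical DPM level and SMU state for illustration only. */
enum hypothetical_dpm_level {
	HYPOTHETICAL_DPM_AUTO,
	HYPOTHETICAL_DPM_MANUAL,
};

struct hypothetical_smu {
	enum hypothetical_dpm_level level;
	unsigned int forced_mask; /* bitmask of forced clock levels */
};

/* Forcing clock levels is only valid in manual mode; otherwise the
 * request is rejected and the current state is left untouched. */
static int hypothetical_force_clk_levels(struct hypothetical_smu *smu,
					 unsigned int mask)
{
	if (smu->level != HYPOTHETICAL_DPM_MANUAL)
		return -22; /* stand-in for -EINVAL */

	smu->forced_mask = mask;
	return 0;
}
```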

[PATCH v2 2/2] drm/amdgpu: fix no interrupt issue for renoir emu (v2)

2019-08-30 Thread Aaron Liu
In renoir's vega10_ih model, there's a security change in mmIH_CHICKEN register, that limits IH to use physical address (FBPA, GPA) directly. Those chicken bits need to be programmed first. Signed-off-by: Aaron Liu Reviewed-by: Huang Rui Reviewed-by: Hawking Zhang Acked-by: Alex Deucher Signed

[PATCH v2 1/2] drm/amdgpu: update IH_CHICKEN in oss 4.0 IP header for VG/RV series

2019-08-30 Thread Aaron Liu
In Renoir's emulator, those chicken bits need to be programmed. Signed-off-by: Aaron Liu Reviewed-by: Huang Rui Reviewed-by: Hawking Zhang Acked-by: Alex Deucher Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/include/asic_reg/oss/osssys_4_0_sh_mask.h | 4 1 file changed, 4 insertio

Re: [PATCH 1/2] drm/amdgpu: Remove unnecessary TLB workaround

2019-08-30 Thread Christian König
On 30.08.19 at 07:14, Kuehling, Felix wrote: This workaround is better handled in user mode in a way that doesn't require allocating extra memory and breaking userptr BOs. The TLB bug is a performance bug, not a functional or security bug. Hence it is safe to remove this kernel part of the work