Re: [PATCH 1/8] mm: remove a pointless CONFIG_ZONE_DEVICE check in memremap_pages

2022-02-08 Thread Muchun Song
On Mon, Feb 7, 2022 at 2:36 PM Christoph Hellwig wrote: > > memremap.c is only built when CONFIG_ZONE_DEVICE is set, so remove > the superfluous extra check. > > Signed-off-by: Christoph Hellwig Reviewed-by: Muchun Song Thanks.

Re: [PATCH 2/8] mm: remove the __KERNEL__ guard from

2022-02-08 Thread Muchun Song
On Mon, Feb 7, 2022 at 2:42 PM Christoph Hellwig wrote: > > __KERNEL__ ifdefs don't make sense outside of include/uapi/. > > Signed-off-by: Christoph Hellwig Reviewed-by: Muchun Song Thanks.

Re: [PATCH 4/8] mm: move free_devmap_managed_page to memremap.c

2022-02-08 Thread Muchun Song
On Mon, Feb 7, 2022 at 2:42 PM Christoph Hellwig wrote: > > free_devmap_managed_page has nothing to do with the code in swap.c, > move it to live with the rest of the code for devmap handling. > > Signed-off-by: Christoph Hellwig Reviewed-by: Muchun Song Thanks.

[PATCH 1/2] drm/amdgpu: add debugfs for reset registers list

2022-02-08 Thread Somalapuram Amaranath
List of registers to be populated for dump collection during the GPU reset. Signed-off-by: Somalapuram Amaranath --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 60 + 2 files changed, 63 insertions(+) diff --git a/drivers

[PATCH 2/2] drm/amdgpu: add reset register trace function on GPU reset

2022-02-08 Thread Somalapuram Amaranath
Dump the list of register values to trace event on GPU reset. Signed-off-by: Somalapuram Amaranath --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 - drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 19 +++ 2 files changed, 39 insertions(+), 1 deletion(-) dif

Re: [PATCH v4 2/2] drm/radeon/uvd: Fix forgotten unmap buffer objects

2022-02-08 Thread Christian König
I'm scratching my head over what you are doing here. That's the fifth time you sent out the same patch, so something is going wrong here :) Please double check why that lands in your outbox over and over again. Regards, Christian. On 08.02.22 at 09:14, zhanglianjie wrote: after the buffer objec

Re: [PATCH 1/2] drm/amdgpu: add debugfs for reset registers list

2022-02-08 Thread Christian König
On 08.02.22 at 09:16, Somalapuram Amaranath wrote: List of registers to be populated for dump collection during the GPU reset. Signed-off-by: Somalapuram Amaranath --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 60 +

[PATCH v4 2/2] drm/radeon/uvd: Fix forgotten unmap buffer objects

2022-02-08 Thread zhanglianjie
after the buffer object is successfully mapped, call radeon_bo_kunmap before the function returns. Signed-off-by: zhanglianjie Reviewed-by: Christian König diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c b/drivers/gpu/drm/radeon/radeon_uvd.c index 377f9cdb5b53..0558d928d98d 100644 --- a/driv
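The fix described above pairs every successful map with an unmap on each return path. A minimal userspace sketch of that pattern follows; `fake_bo`, `fake_bo_kmap`, `fake_bo_kunmap` and `parse_message` are hypothetical stand-ins invented for illustration, not the radeon API, and the truncated diff is not reproduced here.

```c
#include <assert.h>

/* Hypothetical buffer object that only counts outstanding mappings. */
struct fake_bo {
	int map_count;
};

/* Stand-in for a map call: always succeeds and hands back a pointer. */
int fake_bo_kmap(struct fake_bo *bo, void **ptr)
{
	bo->map_count++;
	*ptr = bo;
	return 0;
}

/* Stand-in for the matching unmap call. */
void fake_bo_kunmap(struct fake_bo *bo)
{
	bo->map_count--;
}

/*
 * Returns 0 on success and -1 when the message is rejected.
 * Every path taken after a successful map goes through out_unmap,
 * so the mapping never outlives the function.
 */
int parse_message(struct fake_bo *bo, int msg_is_valid)
{
	void *ptr;
	int r;

	r = fake_bo_kmap(bo, &ptr);
	if (r)
		return r;	/* nothing was mapped, nothing to undo */

	if (!msg_is_valid) {
		r = -1;
		goto out_unmap;	/* early exit still unmaps */
	}

	(void)ptr;		/* real code would read the message here */
	r = 0;

out_unmap:
	fake_bo_kunmap(bo);
	return r;
}
```

Calling `parse_message` with either a valid or an invalid message should leave `map_count` at zero, which is the property the patch description asks for.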

[PATCH v3 2/2] drm/radeon/uvd: Fix forgotten unmap buffer objects

2022-02-08 Thread zhanglianjie
after the buffer object is successfully mapped, call radeon_bo_kunmap before the function returns. Signed-off-by: zhanglianjie Reviewed-by: Christian König diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c b/drivers/gpu/drm/radeon/radeon_uvd.c index 377f9cdb5b53..0558d928d98d 100644 --- a/driv

[RFC] Upstreaming Linux for Nintendo Wii U

2022-02-08 Thread Ash Logan
Hello, I'm the lead dev on a downstream kernel with support for the Wii U[1], Nintendo's previous-gen game console. You might have seen Emmanuel submitting some of the more self-contained drivers recently[2][3]. I've gotten to the point where I'd like to look at upstreaming the platform. Since we

[PATCH v7 1/3] gpu: drm: separate panel orientation property creating and value setting

2022-02-08 Thread Hsin-Yi Wang
drm_dev_register() sets connector->registration_state to DRM_CONNECTOR_REGISTERED and dev->registered to true. If drm_connector_set_panel_orientation() is first called after drm_dev_register(), it will fail several checks and result in the following warning. Add a function to create panel orientation

Re: [PATCH 2/2] drm/radeon/uvd: Fix forgotten unmap buffer objects

2022-02-08 Thread zhanglianjie
Hi, Thanks for your review. I have resubmitted, see https://lkml.org/lkml/2022/2/7/2014 On 29.01.22 at 08:35, zhanglianjie wrote: after the buffer object is successfully mapped, call radeon_bo_kunmap before the function returns. Signed-off-by: zhanglianjie diff --git a/drivers/gpu/drm/

Re: [Intel-gfx] [PATCH v7 1/3] gpu: drm: separate panel orientation property creating and value setting

2022-02-08 Thread Hsin-Yi Wang
On Tue, Feb 8, 2022 at 3:52 PM Ville Syrjälä wrote: > > On Tue, Feb 08, 2022 at 03:37:12PM +0800, Hsin-Yi Wang wrote: > > +int drm_connector_init_panel_orientation_property( > > + struct drm_connector *connector) > > +{ > > + struct drm_device *dev = connector->dev; > > + struct drm_pr

[PATCH v3 2/2] drm/radeon/uvd: Fix forgotten unmap buffer objects

2022-02-08 Thread zhanglianjie
after the buffer object is successfully mapped, call radeon_bo_kunmap before the function returns. Signed-off-by: zhanglianjie Reviewed-by: Christian König diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c b/drivers/gpu/drm/radeon/radeon_uvd.c index 377f9cdb5b53..0558d928d98d 100644 --- a/dri

[PATCH v7 2/3] drm/mediatek: init panel orientation property

2022-02-08 Thread Hsin-Yi Wang
Init panel orientation property after connector is initialized. Let the panel driver decide the orientation value later. Signed-off-by: Hsin-Yi Wang Acked-by: Chun-Kuang Hu --- drivers/gpu/drm/mediatek/mtk_dsi.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/mediate

[PATCH v2 2/2] drm/radeon/uvd: Fix forgotten unmap buffer objects

2022-02-08 Thread zhanglianjie
after the buffer object is successfully mapped, call radeon_bo_kunmap before the function returns. Signed-off-by: zhanglianjie diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c b/drivers/gpu/drm/radeon/radeon_uvd.c index 377f9cdb5b53..0558d928d98d 100644 --- a/drivers/gpu/drm/radeon/radeon_uvd.

[PATCH v7 3/3] arm64: dts: mt8183: Add panel rotation

2022-02-08 Thread Hsin-Yi Wang
krane, kakadu, and kodama boards have a default panel rotation. Signed-off-by: Hsin-Yi Wang Reviewed-by: Enric Balletbo i Serra Tested-by: Enric Balletbo i Serra --- arch/arm64/boot/dts/mediatek/mt8183-kukui.dtsi | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/boot/dts/mediatek/

Re: [PATCH v4 2/2] drm/radeon/uvd: Fix forgotten unmap buffer objects

2022-02-08 Thread Christian König
I think so, Alex will probably pick that up. Thanks, Christian. On 08.02.22 at 09:28, zhanglianjie wrote: I am very sorry that I submitted many times due to the character coding problem. Can PATCH V4 be used? I'm scratching my head over what you are doing here. That's the fifth time you sent ou

Re: Minimal GPU setup

2022-02-08 Thread Amol
Thank you Christian. On 06/02/2022, Christian König wrote: > Hi Amol, > > On 05.02.22 at 10:47, Amol wrote: . . . >> Is posting the BIOS and loading the microcode enough to get me started >> with running basic tasks (DMA transfers, simple packet processing, etc.)? > > Well yes and no. As bare

Re: Minimal GPU setup

2022-02-08 Thread Amol
Thank you Alex. On 07/02/2022, Deucher, Alexander wrote: > [AMD Official Use Only] > > Most of the register programming in evergreen_gpu_init is required. That > code handles things like harvesting (e.g., disabling bad hardware resources) > and setting sane asic specific settings in some registe

Re: [PATCH 2/7] drm/selftests: add drm buddy alloc limit testcase

2022-02-08 Thread Matthew Auld
On 03/02/2022 13:32, Arunpravin wrote: add a test to check the maximum allocation limit Signed-off-by: Arunpravin --- .../gpu/drm/selftests/drm_buddy_selftests.h | 1 + drivers/gpu/drm/selftests/test-drm_buddy.c| 60 +++ 2 files changed, 61 insertions(+) diff --git a

Re: [PATCH 3/7] drm/selftests: add drm buddy alloc range testcase

2022-02-08 Thread Matthew Auld
On 03/02/2022 13:32, Arunpravin wrote: - add a test to check the range allocation - export get_buddy() function in drm_buddy.c - export drm_prandom_u32_max_state() in lib/drm_random.c - include helper functions - include prime number header file Signed-off-by: Arunpravin --- drivers/gpu/drm/d

Re: [PATCH 4/7] drm/selftests: add drm buddy optimistic testcase

2022-02-08 Thread Matthew Auld
On 03/02/2022 13:32, Arunpravin wrote: create a mm with one block of each order available, and try to allocate them all. Signed-off-by: Arunpravin Reviewed-by: Matthew Auld

Re: [PATCH 5/7] drm/selftests: add drm buddy pessimistic testcase

2022-02-08 Thread Matthew Auld
On 03/02/2022 13:32, Arunpravin wrote: create a pot-sized mm, then allocate one of each possible order within. This should leave the mm with exactly one page left. Signed-off-by: Arunpravin --- .../gpu/drm/selftests/drm_buddy_selftests.h | 1 + drivers/gpu/drm/selftests/test-drm_buddy.c
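The "exactly one page left" claim in the description above follows from the identity 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1. The sketch below checks that arithmetic, assuming "pot-sized" means a pool of 2^max_order pages and "each possible order" means orders 0 through max_order - 1; `pages_left_after_pessimistic` is a hypothetical helper and not part of drm_buddy.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pages remaining in a pool of 2^max_order pages after taking one
 * block of every order from 0 up to max_order - 1.
 * Assumes max_order < 64 so the shifts stay in range.
 */
uint64_t pages_left_after_pessimistic(unsigned int max_order)
{
	uint64_t total = (uint64_t)1 << max_order;
	uint64_t used = 0;
	unsigned int order;

	for (order = 0; order < max_order; order++)
		used += (uint64_t)1 << order;	/* a block of 2^order pages */

	return total - used;
}
```

For any valid `max_order` the function should return 1, matching the single leftover page the test description expects.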

Re: [PATCH 6/7] drm/selftests: add drm buddy smoke testcase

2022-02-08 Thread Matthew Auld
On 03/02/2022 13:32, Arunpravin wrote: - add a test to ascertain that the critical functionalities of the program are working fine - add a timeout helper function Signed-off-by: Arunpravin --- .../gpu/drm/selftests/drm_buddy_selftests.h | 1 + drivers/gpu/drm/selftests/test-drm_buddy.c

Re: [PATCH 7/7] drm/selftests: add drm buddy pathological testcase

2022-02-08 Thread Matthew Auld
On 03/02/2022 13:32, Arunpravin wrote: create a pot-sized mm, then allocate one of each possible order within. This should leave the mm with exactly one page left. Free the largest block, then whittle down again. Eventually we will have a fully 50% fragmented mm. Signed-off-by: Arunpravin ---

Re: [PATCH 1/7] drm/selftests: Move i915 buddy selftests into drm

2022-02-08 Thread Matthew Auld
On 03/02/2022 13:32, Arunpravin wrote: - move i915 buddy selftests into drm selftests folder - add Makefile and Kconfig support - add sanitycheck testcase Prerequisites - This series of selftests patches is created on top of drm buddy series - Enable kselftests for DRM as a module in .confi

Re: [RFC v3 10/12] drm/amdgpu: Move in_gpu_reset into reset_domain

2022-02-08 Thread Lazar, Lijo
On 1/26/2022 4:07 AM, Andrey Grodzovsky wrote: We should have a single instance per entire reset domain. Signed-off-by: Andrey Grodzovsky Suggested-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 7 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +++--- dr

RE: [PATCH 1/2] drm/amdgpu: add debugfs for reset registers list

2022-02-08 Thread Sharma, Shashank
I thought we spoke and agreed about: - Not doing dynamic memory allocation during a reset call, - Not doing string operations, but just dumping register values by index. NACK ! - Shashank -Original Message- From: Somalapuram, Amaranath Sent: Tuesday, February 8, 2022 9:17 AM To: amd-g
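The two constraints named in the review above (no dynamic allocation during reset, and dumping values by index rather than through strings) can be sketched in userspace roughly as follows. All names here (`MAX_RESET_REGS`, `reset_reg_list`, `fake_read_reg` and the two list functions) are hypothetical and chosen for illustration; the actual patch is truncated in this archive and may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RESET_REGS 8	/* hypothetical cap on the list length */

struct reset_reg_list {
	uint32_t offsets[MAX_RESET_REGS];	/* fixed storage, filled at setup time */
	size_t count;
};

/* Setup path: append one offset, refusing entries beyond the cap. */
int reset_reg_list_add(struct reset_reg_list *list, uint32_t offset)
{
	if (list->count >= MAX_RESET_REGS)
		return -1;
	list->offsets[list->count++] = offset;
	return 0;
}

/* Stand-in for a hardware register read; derives a value from the offset. */
uint32_t fake_read_reg(uint32_t offset)
{
	return offset ^ 0xA5A5A5A5u;
}

/*
 * Reset path: no allocation and no string handling. Values are written
 * to out[] at the same index as their offset; returns the number written.
 */
size_t reset_reg_list_dump(const struct reset_reg_list *list,
			   uint32_t *out, size_t out_len)
{
	size_t n = list->count < out_len ? list->count : out_len;
	size_t i;

	for (i = 0; i < n; i++)
		out[i] = fake_read_reg(list->offsets[i]);
	return n;
}
```

The list is populated ahead of time, so the dump loop only walks a bounded array; the cap in `reset_reg_list_add` is also one way to handle an oversized list supplied by the user.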

Re: [PATCH v11 5/5] drm/amdgpu: add drm buddy support to amdgpu

2022-02-08 Thread Arunpravin
On 04/02/22 6:53 pm, Christian König wrote: > On 04.02.22 at 12:22, Arunpravin wrote: >> On 28/01/22 7:48 pm, Matthew Auld wrote: >>> On Thu, 27 Jan 2022 at 14:11, Arunpravin >>> wrote: - Remove drm_mm references and replace with drm buddy functionalities - Add res cursor support for

Re: [PATCH 1/2] drm/amdgpu: add debugfs for reset registers list

2022-02-08 Thread Sharma, Shashank
Amar, Apart from the long comment, there are a few more bugs in the patch, which I have mentioned here inline. Please check them out. - Shashank On 2/8/2022 9:18 AM, Christian König wrote: On 08.02.22 at 09:16, Somalapuram Amaranath wrote: List of registers to be populated for dump collectio

Re: [RFC v4] drm/amdgpu: Rework reset domain to be refcounted.

2022-02-08 Thread Lazar, Lijo
On 2/2/2022 10:56 PM, Andrey Grodzovsky wrote: The reset domain contains register access semaphore now and so needs to be present as long as each device in a hive needs it and so it cannot be bound to XGMI hive life cycle. Address this by making reset domain refcounted and pointed by each membe

RE: [PATCH] drm/amd/pm: correct hwmon power lable name

2022-02-08 Thread Chen, Guchun
[Public] A typo in subject, s/lable/label. With that addressed, the patch is: Reviewed-by: Guchun Chen Regards, Guchun -Original Message- From: amd-gfx On Behalf Of Yang Wang Sent: Tuesday, February 8, 2022 3:44 PM To: amd-gfx@lists.freedesktop.org Cc: Hou, Xiaomeng (Matthew) ; Lazar,

Re: [PATCH 1/2] drm/amdgpu: add debugfs for reset registers list

2022-02-08 Thread Somalapuram, Amaranath
On 2/8/2022 4:43 PM, Sharma, Shashank wrote: I thought we spoke and agreed about: - Not doing dynamic memory allocation during a reset call, as there is a redesign debugfs call will happen during the application initialization and not during reset. - Not doing string operations, but just dum

Re: [PATCH 1/2] drm/amdgpu: add debugfs for reset registers list

2022-02-08 Thread Sharma, Shashank
On 2/8/2022 2:39 PM, Somalapuram, Amaranath wrote: On 2/8/2022 4:43 PM, Sharma, Shashank wrote: I thought we spoke and agreed about: - Not doing dynamic memory allocation during a reset call, as there is a redesign debugfs call will happen during the application initialization and not duri

Re: [PATCH 1/2] drm/amdgpu: add debugfs for reset registers list

2022-02-08 Thread Sharma, Shashank
>> User only update the list of reg offsets on init, there is no >> predefined reg offset from kernel code. I missed this comment in the last patch, and this makes me a bit confused. During the design phase, did we agree to have this whole list loaded from user ? which means that if user doesn'

Re: [PATCH v4 2/2] drm/radeon/uvd: Fix forgotten unmap buffer objects

2022-02-08 Thread zhanglianjie
I am very sorry that I submitted many times due to the character coding problem. Can PATCH V4 be used? I'm scratching my head over what you are doing here. That's the fifth time you sent out the same patch, so something is going wrong here :) Please double check why that lands in your outbox ove

Re: [PATCH v4 2/2] drm/radeon/uvd: Fix forgotten unmap buffer objects

2022-02-08 Thread zhanglianjie
Thank you very much for your review. I think so, Alex will probably pick that up. Thanks, Christian. On 08.02.22 at 09:28, zhanglianjie wrote: I am very sorry that I submitted many times due to the character coding problem. Can PATCH V4 be used? I'm scratching my head over what you are doing h

[PATCH v8 1/3] gpu: drm: separate panel orientation property creating and value setting

2022-02-08 Thread Hsin-Yi Wang
drm_dev_register() sets connector->registration_state to DRM_CONNECTOR_REGISTERED and dev->registered to true. If drm_connector_set_panel_orientation() is first called after drm_dev_register(), it will fail several checks and result in the following warning. Add a function to create panel orientation

[PATCH v8 2/3] drm/mediatek: init panel orientation property

2022-02-08 Thread Hsin-Yi Wang
Init panel orientation property after connector is initialized. Let the panel driver decide the orientation value later. Signed-off-by: Hsin-Yi Wang Acked-by: Chun-Kuang Hu --- drivers/gpu/drm/mediatek/mtk_dsi.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/mediate

[PATCH v8 3/3] arm64: dts: mt8183: Add panel rotation

2022-02-08 Thread Hsin-Yi Wang
krane, kakadu, and kodama boards have a default panel rotation. Signed-off-by: Hsin-Yi Wang Reviewed-by: Enric Balletbo i Serra Tested-by: Enric Balletbo i Serra --- arch/arm64/boot/dts/mediatek/mt8183-kukui.dtsi | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/boot/dts/mediatek/

Re: [PATCH v2] drm/amd/pm: fix hwmon node of power1_label create issue

2022-02-08 Thread Alex Deucher
Reviewed-by: Alex Deucher On Tue, Feb 8, 2022 at 2:09 AM Yang Wang wrote: > > it will cause hwmon node of power1_label is not created. > > v2: > the hwmon node of "power1_lable" is always needed for all ASICs. > and the patch will remove ASIC type check for "power1_label". > > Fixes: ae07970a06

Re: [PATCH 7/7] drm/amd/pm: fix some OEM SKU specific stability issues

2022-02-08 Thread Deucher, Alexander
[Public] Series is: Reviewed-by: Alex Deucher From: Quan, Evan Sent: Monday, February 7, 2022 10:20 PM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Quan, Evan Subject: [PATCH 7/7] drm/amd/pm: fix some OEM SKU specific stability issues Add a quir

Re: [PATCH 1/2] drm/amdgpu: add debugfs for reset registers list

2022-02-08 Thread Sharma, Shashank
Based on confirmation from Christian, it seems my understanding of the design was not correct, and user must add a list of registers to dump. That resolves most of my comments automatically, @Amar, please fix a max register condition in the loop, to handle the negative testing case and the uin

Re: [PATCH 2/2] drm/amdgpu: add reset register trace function on GPU reset

2022-02-08 Thread Alex Deucher
On Tue, Feb 8, 2022 at 3:17 AM Somalapuram Amaranath wrote: > > Dump the list of register values to trace event on GPU reset. > > Signed-off-by: Somalapuram Amaranath > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 - > drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 19

Re: [RFC v3 06/12] drm/amdgpu: Drop hive->in_reset

2022-02-08 Thread Andrey Grodzovsky
On 2022-02-08 01:33, Lazar, Lijo wrote: On 1/26/2022 4:07 AM, Andrey Grodzovsky wrote: Since we serialize all resets no need to protect from concurrent resets. Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König ---   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +

Re: [RFC v4] drm/amdgpu: Rework reset domain to be refcounted.

2022-02-08 Thread Andrey Grodzovsky
On 2022-02-08 06:25, Lazar, Lijo wrote: On 2/2/2022 10:56 PM, Andrey Grodzovsky wrote: The reset domain contains register access semaphore now and so needs to be present as long as each device in a hive needs it and so it cannot be bound to XGMI hive life cycle. Address this by making reset d

Re: [PATCH v4 2/2] drm/radeon/uvd: Fix forgotten unmap buffer objects

2022-02-08 Thread Alex Deucher
Applied the series. Thanks! Alex On Tue, Feb 8, 2022 at 3:33 AM Christian König wrote: > > I think so, Alex will probably pick that up. > > Thanks, > Christian. > > On 08.02.22 at 09:28, zhanglianjie wrote: > > I am very sorry that I submitted many times due to the character > > coding problem

Re: [PATCH V3 4/7] drm/amd/pm: correct the usage for 'supported' member of smu_feature structure

2022-02-08 Thread Nathan Chancellor
Hi Evan, On Fri, Jan 28, 2022 at 03:04:52PM +0800, Evan Quan wrote: > The supported features should be retrieved just after EnableAllDpmFeatures > message > complete. And the check(whether some dpm feature is supported) is only needed > when we > decide to enable or disable it. > > Signed-off-b

Re: [PATCH 6/8] mm: don't include in

2022-02-08 Thread Dan Williams
On Mon, Feb 7, 2022 at 3:49 PM Dan Williams wrote: > > On Sun, Feb 6, 2022 at 10:33 PM Christoph Hellwig wrote: > > > > Move the check for the actual pgmap types that need the free at refcount > > one behavior into the out of line helper, and thus avoid the need to > > pull memremap.h into mm.h.

[RFC v4 00/11] Define and use reset domain for GPU recovery in amdgpu

2022-02-08 Thread Andrey Grodzovsky
This patchset is based on earlier work by Boris[1] that allowed to have an ordered workqueue at the driver level that will be used by the different schedulers to queue their timeout work. On top of that I also serialized any GPU reset we trigger from within amdgpu code to also go through the same o

[RFC v4 01/11] drm/amdgpu: Introduce reset domain

2022-02-08 Thread Andrey Grodzovsky
Defined a reset_domain struct such that all the entities that go through reset together will be serialized one against another. Do it for both single device and XGMI hive cases. Signed-off-by: Andrey Grodzovsky Suggested-by: Daniel Vetter Suggested-by: Christian König Reviewed-by: Christian Kön

[RFC v4 02/11] drm/amdgpu: Move scheduler init to after XGMI is ready

2022-02-08 Thread Andrey Grodzovsky
Before we initialize schedulers we must know which reset domain we are in - for single device there is a single domain per device and so single wq per device. For XGMI the reset domain spans the entire XGMI hive and so the reset wq is per hive. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/d

[RFC v4 03/11] drm/amdgpu: Serialize non TDR gpu recovery with TDRs

2022-02-08 Thread Andrey Grodzovsky
Use reset domain wq also for non TDR gpu recovery triggers such as sysfs and RAS. We must serialize all possible GPU recoveries to guarantee no concurrency there. For TDR call the original recovery function directly since it's already executed from within the wq. For others just use a wrapper to queue

[RFC v4 04/11] drm/amd/virt: For SRIOV send GPU reset directly to TDR queue.

2022-02-08 Thread Andrey Grodzovsky
No need to trigger another work queue inside the work queue. v3: Problem: Extra reset caused by host side FLR notification following guest side triggered reset. Fix: Prevent queuing flr_work from mailbox irq if guest already executing a reset. Suggested-by: Liu Shaoyun Signed-off-by: Andrey Gr

[RFC v4 05/11] drm/amdgpu: Drop hive->in_reset

2022-02-08 Thread Andrey Grodzovsky
Since we serialize all resets no need to protect from concurrent resets. Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_xg

[RFC v4 09/11] drm/amdgpu: Move in_gpu_reset into reset_domain

2022-02-08 Thread Andrey Grodzovsky
We should have a single instance per entire reset domain. Signed-off-by: Andrey Grodzovsky Suggested-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 7 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 1 + driver

[RFC v4 10/11] drm/amdgpu: Rework amdgpu_device_lock_adev

2022-02-08 Thread Andrey Grodzovsky
This function needs to be split into 2 parts where one is called only once for locking single instance of reset_domain's sem and reset flag and the other part which handles MP1 states should still be called for each device in XGMI hive. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/a

[RFC v4 08/11] drm/amdgpu: Move reset sem into reset_domain

2022-02-08 Thread Andrey Grodzovsky
We want single instance of reset sem across all reset clients because in case of XGMI we should stop access cross device MMIO because any of them could be in a reset at the moment. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 - drivers/gpu/drm/amd/amdg

[RFC v4 06/11] drm/amdgpu: Drop concurrent GPU reset protection for device

2022-02-08 Thread Andrey Grodzovsky
Since now all GPU resets are serialized there is no need for this. This patch also reverts 'drm/amdgpu: race issue when jobs on 2 ring timeout' Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 89 ++ 1 file chang

[RFC v4 11/11] Revert 'drm/amdgpu: annotate a false positive recursive locking'

2022-02-08 Thread Andrey Grodzovsky
Since we have a single instance of reset semaphore which we lock only once even for XGMI hive we don't need the nested locking hint anymore. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 -- 1 file changed, 4 insertions(+), 10 deletions(-) diff

[RFC v4 07/11] drm/amdgpu: Rework reset domain to be refcounted.

2022-02-08 Thread Andrey Grodzovsky
The reset domain contains register access semaphore now and so needs to be present as long as each device in a hive needs it and so it cannot be bound to XGMI hive life cycle. Address this by making reset domain refcounted and pointed by each member of the hive and the hive itself. v4: Fix crash o
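The lifetime rule described above, where the domain stays alive until the hive and every member device have dropped their reference, can be sketched in plain userspace C as below. The struct layout and the `reset_domain_create`, `reset_domain_get` and `reset_domain_put` names are hypothetical; the counter is not atomic, and whether the actual patch uses the kernel's kref helpers is not visible in the truncated excerpt.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reset domain shared by a hive and its devices. */
struct reset_domain {
	int refcount;		/* number of current holders */
	int *freed_flag;	/* set to 1 on release, for demonstration only */
};

/* Creates a domain holding one reference for the caller. */
struct reset_domain *reset_domain_create(int *freed_flag)
{
	struct reset_domain *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->refcount = 1;
	d->freed_flag = freed_flag;
	*freed_flag = 0;
	return d;
}

/* Each additional holder takes its own reference. */
void reset_domain_get(struct reset_domain *d)
{
	d->refcount++;
}

/* Dropping the last reference releases the domain. */
void reset_domain_put(struct reset_domain *d)
{
	if (--d->refcount == 0) {
		*d->freed_flag = 1;
		free(d);
	}
}
```

With this shape, the hive can drop its reference first and the domain still outlives it for as long as any device holds one, which is the decoupling from the hive life cycle that the commit message asks for.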

RE: [RFC v4 04/11] drm/amd/virt: For SRIOV send GPU reset directly to TDR queue.

2022-02-08 Thread Liu, Shaoyun
[AMD Official Use Only] This patch is reviewed by Shaoyun.liu. Since the other patches were suggested by other engineers, who may have already done some review on them, I will leave it to them to continue reviewing the rest of the patches. Regards, Shaoyun.liu -Original Message- From: Grodzovsky,

Re: [PATCH 7/8] mm: remove the extra ZONE_DEVICE struct page refcount

2022-02-08 Thread Dan Williams
On Sun, Feb 6, 2022 at 10:33 PM Christoph Hellwig wrote: [..] > @@ -500,28 +482,27 @@ void free_devmap_managed_page(struct page *page) > */ > page->mapping = NULL; > page->pgmap->ops->page_free(page); > + > + /* > +* Reset the page count to 1 to prepare for h

[PATCH 01/11] drm/amdgpu: Optimize xxx_ras_late_init/xxx_ras_late_fini for each ras block

2022-02-08 Thread yipechai
1. Define amdgpu_ras_block_late_init to create sysfs nodes and interrupt handles. 2. Define amdgpu_ras_block_late_fini to remove sysfs nodes and interrupt handles. 3. Replace ras block variable members in struct amdgpu_ras_block_object with struct ras_common_if, which can make it easy

[PATCH 02/11] drm/amdgpu: Optimize amdgpu_gfx_ras_late_init/amdgpu_gfx_ras_fini function code

2022-02-08 Thread yipechai
Optimize amdgpu_gfx_ras_late_init/amdgpu_gfx_ras_fini function code. Signed-off-by: yipechai --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 42 +++-- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 6 2 files changed, 11 insertions(+), 37 deletions(-) diff --git a/drivers/gpu

[PATCH 03/11] drm/amdgpu: Optimize amdgpu_hdp_ras_late_init/amdgpu_hdp_ras_fini function code

2022-02-08 Thread yipechai
Optimize amdgpu_hdp_ras_late_init/amdgpu_hdp_ras_fini function code. Signed-off-by: yipechai --- drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.c | 37 ++--- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 + drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c | 1 + 3 files changed, 5 insertions(+

[PATCH 04/11] drm/amdgpu: Optimize amdgpu_mca_ras_late_init/amdgpu_mca_ras_fini function code

2022-02-08 Thread yipechai
Optimize amdgpu_mca_ras_late_init/amdgpu_mca_ras_fini function code. Signed-off-by: yipechai --- drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c | 41 ++--- drivers/gpu/drm/amd/amdgpu/mca_v3_0.c | 6 2 files changed, 8 insertions(+), 39 deletions(-) diff --git a/drivers/gpu/

[PATCH 05/11] drm/amdgpu: Optimize amdgpu_mmhub_ras_late_init/amdgpu_mmhub_ras_fini function code

2022-02-08 Thread yipechai
Optimize amdgpu_mmhub_ras_late_init/amdgpu_mmhub_ras_fini function code. Signed-off-by: yipechai --- drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.c | 37 ++- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 ++ 2 files changed, 5 insertions(+), 34 deletions(-) diff --git a/drivers/

[PATCH 09/11] drm/amdgpu: Optimize amdgpu_xgmi_ras_late_init/amdgpu_xgmi_ras_fini function code

2022-02-08 Thread yipechai
Optimize amdgpu_xgmi_ras_late_init/amdgpu_xgmi_ras_fini function code. Signed-off-by: yipechai --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 40 +++- 2 files changed, 6 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/

[PATCH 06/11] drm/amdgpu: Optimize amdgpu_nbio_ras_late_init/amdgpu_nbio_ras_fini function code

2022-02-08 Thread yipechai
Optimize amdgpu_nbio_ras_late_init/amdgpu_nbio_ras_fini function code. Signed-off-by: yipechai --- drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c | 40 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 + drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 1 + 3 files changed, 7 insertio

[PATCH 07/11] drm/amdgpu: Optimize amdgpu_sdma_ras_late_init/amdgpu_sdma_ras_fini function code

2022-02-08 Thread yipechai
Optimize amdgpu_sdma_ras_late_init/amdgpu_sdma_ras_fini function code. Signed-off-by: yipechai --- drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 46 +++- drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 12 --- 2 files changed, 13 insertions(+), 45 deletions(-) diff --git a/drive

[PATCH 08/11] drm/amdgpu: Optimize amdgpu_umc_ras_late_init/amdgpu_umc_ras_fini function code

2022-02-08 Thread yipechai
Optimize amdgpu_umc_ras_late_init/amdgpu_umc_ras_fini function code. Signed-off-by: yipechai --- drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 44 - drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 4 +++ drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 6 3 files changed, 16 insert

[PATCH 10/11] drm/amdgpu: Optimize operating sysfs and interrupt function interface in amdgpu_ras.c

2022-02-08 Thread yipechai
In order to reduce redundant struct conversion, modify operating sysfs and interrupt function interface parameters. Signed-off-by: yipechai --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 37 - drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 6 ++-- 2 files changed, 21 insertion

[PATCH 11/11] drm/amdgpu: Merge amdgpu_ras_late_init/amdgpu_ras_late_fini to amdgpu_ras_block_late_init/amdgpu_ras_block_late_fini

2022-02-08 Thread yipechai
1. Merge amdgpu_ras_late_init to amdgpu_ras_block_late_init. 2. Remove amdgpu_ras_late_init since no ras block calls amdgpu_ras_late_init. 3. Merge amdgpu_ras_late_fini to amdgpu_ras_block_late_fini. 4. Remove amdgpu_ras_late_fini since no ras block calls amdgpu_ras_late_fini. Signed-o

Re: [RFC v3 00/12] Define and use reset domain for GPU recovery in amdgpu

2022-02-08 Thread JingWen Chen
Hi Andrey, I have been testing your patch and it seems fine till now. Best Regards, Jingwen Chen On 2022/2/3 2:57 AM, Andrey Grodzovsky wrote: > Just another ping, with Shyun's help I was able to do some smoke testing on > XGMI SRIOV system (booting and triggering hive reset) > and for now look

Re: [PATCH 6/8] mm: don't include in

2022-02-08 Thread Christoph Hellwig
On Tue, Feb 08, 2022 at 03:53:14PM -0800, Dan Williams wrote: > Yeah, same as Logan: > > mm/memcontrol.c: In function ‘get_mctgt_type’: > mm/memcontrol.c:5724:29: error: implicit declaration of function > ‘is_device_private_page’; did you mean > ‘is_device_private_entry’? [-Werror=implicit-functio

Re: [PATCH 7/8] mm: remove the extra ZONE_DEVICE struct page refcount

2022-02-08 Thread Christoph Hellwig
On Tue, Feb 08, 2022 at 07:30:11PM -0800, Dan Williams wrote: > Interesting. I had expected that to really fix the refcount problem > that fs/dax.c would need to start taking real page references as pages > were added to a mapping, just like page cache. I think we should do that eventually. But I

Re: [PATCH 2/2] drm/amdgpu: add reset register trace function on GPU reset

2022-02-08 Thread Christian König
On 08.02.22 at 16:28, Alex Deucher wrote: On Tue, Feb 8, 2022 at 3:17 AM Somalapuram Amaranath wrote: Dump the list of register values to trace event on GPU reset. Signed-off-by: Somalapuram Amaranath --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 - drivers/gpu/d

Re: [RFC v4] drm/amdgpu: Rework reset domain to be refcounted.

2022-02-08 Thread Christian König
On 08.02.22 at 17:19, Andrey Grodzovsky wrote: On 2022-02-08 06:25, Lazar, Lijo wrote: On 2/2/2022 10:56 PM, Andrey Grodzovsky wrote: The reset domain contains register access semaphore now and so needs to be present as long as each device in a hive needs it and so it cannot be bound to

Re: [RFC v4 02/11] drm/amdgpu: Move scheduler init to after XGMI is ready

2022-02-08 Thread Christian König
On 09.02.22 at 01:23, Andrey Grodzovsky wrote: Before we initialize schedulers we must know which reset domain we are in - for single device there is a single domain per device and so single wq per device. For XGMI the reset domain spans the entire XGMI hive and so the reset wq is per hive. Si

Re: [RFC v4 04/11] drm/amd/virt: For SRIOV send GPU reset directly to TDR queue.

2022-02-08 Thread Christian König
On 09.02.22 at 01:23, Andrey Grodzovsky wrote: No need to trigger another work queue inside the work queue. v3: Problem: Extra reset caused by host side FLR notification following guest side triggered reset. Fix: Prevent queuing flr_work from mailbox irq if guest already executing a reset.

Re: [RFC v4 07/11] drm/amdgpu: Rework reset domain to be refcounted.

2022-02-08 Thread Christian König
On 09.02.22 at 01:23, Andrey Grodzovsky wrote: The reset domain contains register access semaphore now and so needs to be present as long as each device in a hive needs it and so it cannot be bound to XGMI hive life cycle. Address this by making reset domain refcounted and pointed by each member

Re: [RFC v4 08/11] drm/amdgpu: Move reset sem into reset_domain

2022-02-08 Thread Christian König
On 09.02.22 at 01:23, Andrey Grodzovsky wrote: We want single instance of reset sem across all reset clients because in case of XGMI we should stop access cross device MMIO because any of them could be in a reset at the moment. Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König