Re: [RFC v4 09/11] drm/amdgpu: Move in_gpu_reset into reset_domain

2022-02-09 Thread Christian König
On 09.02.22 at 01:23, Andrey Grodzovsky wrote: We should have a single instance per entire reset domain. Signed-off-by: Andrey Grodzovsky Suggested-by: Lijo Lazar Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 7 ++- drivers/gpu/drm/amd/amdgpu/amdg

Re: [RFC v4 10/11] drm/amdgpu: Rework amdgpu_device_lock_adev

2022-02-09 Thread Christian König
On 09.02.22 at 01:23, Andrey Grodzovsky wrote: This function needs to be split into 2 parts, where one is called only once for locking the single instance of reset_domain's sem and reset flag, and the other part, which handles MP1 states, should still be called for each device in the XGMI hive. Signed-off

Re: [RFC v4 11/11] Revert 'drm/amdgpu: annotate a false positive recursive locking'

2022-02-09 Thread Christian König
On 09.02.22 at 01:23, Andrey Grodzovsky wrote: Since we have a single instance of reset semaphore which we lock only once even for XGMI hive we don't need the nested locking hint anymore. Signed-off-by: Andrey Grodzovsky Oh, yes please :) Reviewed-by: Christian König --- drivers/gpu/dr

drm/ttm: moving the LRU into the resource

2022-02-09 Thread Christian König
Hi guys, so hopefully the last round for this set. It fixes both a long outstanding problem with TTM and resource allocation as well as Bas's new performance problem with RADV. Please review and comment. Thanks, Christian.

[PATCH 2/9] drm/ttm: move the LRU into resource handling v3

2022-02-09 Thread Christian König
This way we finally fix the problem that new resources are not immediately evict-able after allocation. That has caused numerous problems including OOM on GDS handling and not being able to use TTM as general resource manager. v2: stop assuming in ttm_resource_fini that res->bo is still valid. v3:

[PATCH 1/9] drm/ttm: add common accounting to the resource mgr v3

2022-02-09 Thread Christian König
It makes sense to have this in the common manager for debugging and accounting of how much resources are used. v2: cleanup kerneldoc a bit v3: drop the atomic, update counter under lock instead Signed-off-by: Christian König Reviewed-by: Huang Rui (v1) Tested-by: Bas Nieuwenhuizen --- drivers

[PATCH 3/9] drm/ttm: add resource iterator v2

2022-02-09 Thread Christian König
Instead of duplicating that at different places add an iterator over all the resources in a resource manager. v2: add lockdep annotation and kerneldoc Signed-off-by: Christian König Tested-by: Bas Nieuwenhuizen Reviewed-by: Daniel Vetter --- drivers/gpu/drm/ttm/ttm_bo.c | 41 ++-

[PATCH 5/9] drm/amdgpu: remove GTT accounting

2022-02-09 Thread Christian König
This is provided by TTM now. Also switch man->size to bytes instead of pages and fix the double printing of size and usage in debugfs. Signed-off-by: Christian König Tested-by: Bas Nieuwenhuizen --- drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 49 + drivers/gpu/drm/amd/amd

[PATCH 7/9] drm/amdgpu: drop amdgpu_gtt_node v2

2022-02-09 Thread Christian König
We have the BO pointer in the base structure now as well. v2: add lockdep and kerneldoc Signed-off-by: Christian König Reviewed-by: Daniel Vetter Tested-by: Bas Nieuwenhuizen --- drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 49 - include/drm/ttm/ttm_resource.h

[PATCH 6/9] drm/amdgpu: remove VRAM accounting

2022-02-09 Thread Christian König
This is provided by TTM now. Also switch man->size to bytes instead of pages and fix the double printing of size and usage in debugfs. Signed-off-by: Christian König Tested-by: Bas Nieuwenhuizen --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c

[PATCH 4/9] drm/radeon: remove resource accounting

2022-02-09 Thread Christian König
Use the one provided by TTM instead. Signed-off-by: Christian König Tested-by: Bas Nieuwenhuizen --- drivers/gpu/drm/radeon/radeon.h| 2 -- drivers/gpu/drm/radeon/radeon_kms.c| 7 -- drivers/gpu/drm/radeon/radeon_object.c | 30 +++--- drivers/gpu/drm/radeon

[PATCH 9/9] drm/ttm: rework bulk move handling v2

2022-02-09 Thread Christian König
Instead of providing the bulk move structure for each LRU update set this as property of the BO. This should avoid costly bulk move rebuilds with some games under RADV. v2: some name polishing, add a few more kerneldoc words. v3: add some lockdep Signed-off-by: Christian König Tested-by: Bas Nie

[PATCH 8/9] drm/ttm: allow bulk moves for all domains

2022-02-09 Thread Christian König
Not just TT and VRAM. Signed-off-by: Christian König Reviewed-by: Daniel Vetter Tested-by: Bas Nieuwenhuizen --- drivers/gpu/drm/ttm/ttm_resource.c | 52 +- include/drm/ttm/ttm_device.h | 2 -- include/drm/ttm/ttm_resource.h | 4 +-- 3 files changed, 17

Re: [PATCH 2/2] drm/amdgpu: add sysfs files for XGMI segment size and physical node id

2022-02-09 Thread Christian König
Can anybody give me a Tested-by for this set? I would really like to push it, but it would be nice to have at least somebody with access to an xgmi system try it first. Christian. On 26.01.22 at 13:57, StDenis, Tom wrote: [AMD Official Use Only] Sadly I don't control any XGMI hosts to tr

RE: [PATCH 03/11] drm/amdgpu: Optimize amdgpu_hdp_ras_late_init/amdgpu_hdp_ras_fini function code

2022-02-09 Thread Zhou1, Tao
[AMD Official Use Only] > -Original Message- > From: Chai, Thomas > Sent: Wednesday, February 9, 2022 1:57 PM > To: amd-gfx@lists.freedesktop.org > Cc: Chai, Thomas ; Zhang, Hawking > ; Zhou1, Tao ; Clements, > John ; Chai, Thomas > Subject: [PATCH 03/11] drm/amdgpu: Optimize > amdgpu_

RE: [PATCH 01/11] drm/amdgpu: Optimize xxx_ras_late_init/xxx_ras_late_fini for each ras block

2022-02-09 Thread Zhou1, Tao
[AMD Official Use Only] > -Original Message- > From: Chai, Thomas > Sent: Wednesday, February 9, 2022 1:57 PM > To: amd-gfx@lists.freedesktop.org > Cc: Chai, Thomas ; Zhang, Hawking > ; Zhou1, Tao ; Clements, > John ; Chai, Thomas > Subject: [PATCH 01/11] drm/amdgpu: Optimize > xxx_ras

[PATCH -next] drm/amdkfd: Fix NULL but dereferenced coccicheck error

2022-02-09 Thread Yang Li
Eliminate the following coccicheck warning: ./drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2087:27-38: ERROR: bo_buckets is NULL but dereferenced. Reported-by: Abaci Robot Signed-off-by: Yang Li --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-)

Re: [PATCH 6/9] drm/amdgpu: remove VRAM accounting

2022-02-09 Thread Matthew Auld
On Wed, 9 Feb 2022 at 08:41, Christian König wrote: > > This is provided by TTM now. > > Also switch man->size to bytes instead of pages and fix the double > printing of size and usage in debugfs. > > Signed-off-by: Christian König > Tested-by: Bas Nieuwenhuizen > --- > drivers/gpu/drm/amd/amdg

Re: [PATCH 2/9] drm/ttm: move the LRU into resource handling v3

2022-02-09 Thread Matthew Auld
On Wed, 9 Feb 2022 at 08:41, Christian König wrote: > > This way we finally fix the problem that new resources are > not immediately evict-able after allocation. > > That has caused numerous problems including OOM on GDS handling > and not being able to use TTM as general resource manager. > > v2:

Re: [PATCH 2/9] drm/ttm: move the LRU into resource handling v3

2022-02-09 Thread Christian König
On 09.02.22 at 11:09, Matthew Auld wrote: On Wed, 9 Feb 2022 at 08:41, Christian König wrote: This way we finally fix the problem that new resources are not immediately evict-able after allocation. That has caused numerous problems including OOM on GDS handling and not being able to use TTM

Re: [PATCH 6/9] drm/amdgpu: remove VRAM accounting

2022-02-09 Thread Christian König
On 09.02.22 at 10:53, Matthew Auld wrote: On Wed, 9 Feb 2022 at 08:41, Christian König wrote: This is provided by TTM now. Also switch man->size to bytes instead of pages and fix the double printing of size and usage in debugfs. Signed-off-by: Christian König Tested-by: Bas Nieuwenhuizen

Re: [PATCH 7/8] mm: remove the extra ZONE_DEVICE struct page refcount

2022-02-09 Thread Jason Gunthorpe
On Wed, Feb 09, 2022 at 07:23:45AM +0100, Christoph Hellwig wrote: > On Tue, Feb 08, 2022 at 07:30:11PM -0800, Dan Williams wrote: > > Interesting. I had expected that to really fix the refcount problem > > that fs/dax.c would need to start taking real page references as pages > > were added to a m

Re: [PATCH 7/8] mm: remove the extra ZONE_DEVICE struct page refcount

2022-02-09 Thread Christoph Hellwig
On Wed, Feb 09, 2022 at 08:29:56AM -0400, Jason Gunthorpe wrote: > It is nice, but the other series are still impacted by the fsdax mess > - they still stuff pages into ptes without proper refcounts and have > to carry nonsense to dance around this problem. > > I certainly would be unhappy if the

Re: [PATCH 7/8] mm: remove the extra ZONE_DEVICE struct page refcount

2022-02-09 Thread Jason Gunthorpe
On Wed, Feb 09, 2022 at 02:53:51PM +0100, Christoph Hellwig wrote: > On Wed, Feb 09, 2022 at 08:29:56AM -0400, Jason Gunthorpe wrote: > > It is nice, but the other series are still impacted by the fsdax mess > > - they still stuff pages into ptes without proper refcounts and have > > to carry nonse

[PATCH] drm/amdgpu: fix gmc init fail in sriov mode

2022-02-09 Thread Yang Wang
"adev->gfx.rlc.rlcg_reg_access_supported = true;" The above variable was set too late during driver initialization, which causes the driver to fail to read/write registers during GMC hw init in sriov mode. Move the gfx_xxx_init_rlcg_reg_access_ctrl() function to the gfx early init stage to av

RE: [PATCH] drm/amdgpu: fix gmc init fail in sriov mode

2022-02-09 Thread Zhang, Hawking
[AMD Official Use Only] Reviewed-by: Hawking Zhang Regards, Hawking -Original Message- From: Wang, Yang(Kevin) Sent: Wednesday, February 9, 2022 22:30 To: amd-gfx@lists.freedesktop.org Cc: Zhang, Hawking ; Min, Frank ; Wang, Yang(Kevin) Subject: [PATCH] drm/amdgpu: fix gmc init fail

[PATCH] drm/amdkfd: fix freeing an unset pointer

2022-02-09 Thread trix
From: Tom Rix clang static analysis reports this problem kfd_chardev.c:2092:2: warning: 1st function call argument is an uninitialized value kvfree(bo_privs); ^~~~ When bo_buckets alloc fails, it jumps to an error handler that frees the yet to be allocated bo_privs.
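The failure mode described above (an error path that frees a pointer which was never assigned) can be sketched in userspace C. This is only an illustration under stated assumptions: `restore_bos_sketch` and its parameters are hypothetical, `malloc`/`free` stand in for the kernel allocators, and this is not the code from the patch. The idea is that every pointer the shared exit path frees is initialized to NULL first.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace analog; malloc/free stand in for the kernel's
 * allocation helpers. fail_first_alloc simulates the first allocation
 * failing so that the error path runs. */
int restore_bos_sketch(size_t n_buckets, size_t n_privs, int fail_first_alloc)
{
    /* Initialize every pointer that the shared exit path frees. */
    char *bo_buckets = NULL;
    char *bo_privs = NULL;
    int ret = 0;

    bo_buckets = fail_first_alloc ? NULL : malloc(n_buckets);
    if (!bo_buckets) {
        ret = -ENOMEM;
        goto exit; /* bo_privs is not allocated yet, but it is NULL */
    }

    bo_privs = malloc(n_privs);
    if (!bo_privs) {
        ret = -ENOMEM;
        goto exit;
    }

exit:
    free(bo_privs);   /* free(NULL) is a no-op, so this is always safe */
    free(bo_buckets);
    return ret;
}
```

Without the NULL initializers, the first `goto exit` would pass an indeterminate value to `free`, which is the kind of problem the static analyzer reports.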

Re: [RFC v3 00/12] Define and use reset domain for GPU recovery in amdgpu

2022-02-09 Thread Andrey Grodzovsky
Thanks a lot! Andrey On 2022-02-09 01:06, JingWen Chen wrote: Hi Andrey, I have been testing your patch and it seems fine till now. Best Regards, Jingwen Chen On 2022/2/3 2:57 AM, Andrey Grodzovsky wrote: Just another ping, with Shyun's help I was able to do some smoke testing on XGMI SRIO

[PATCH] drm/amdkfd: map sdma queues onto extended engines for navi2x

2022-02-09 Thread Jonathan Kim
The hardware scheduler requires that all SDMA 5.2.x queues are put on the RUN_LIST through the extended engines. Make extended engine unmap available as well. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_packet_mana

[PATCH 0/1] drm/amdgpu: Show IP discovery in sysfs

2022-02-09 Thread Luben Tuikov
Show IP discovery in sysfs. See the commit message for the layout format. For instance, on a Sienna Cichlid, the layout looks like this: $tree /sys/class/drm/card0/device/ip_discovery/ /sys/class/drm/card0/device/ip_discovery/ └── die └── 0 ├── 1 │   └── 0 │   ├──

[PATCH 1/1] drm/amdgpu: Show IP discovery in sysfs

2022-02-09 Thread Luben Tuikov
Add IP discovery data in sysfs. The format is: /sys/class/drm/cardX/device/ip_discovery/die/D/B/I/ where, X is the card ID, an integer, D is the die ID, an integer, B is the IP HW ID, an integer, aka block type, I is the IP HW ID instance, an integer. are the attributes of the block instance. At t
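As a small illustration of the directory layout described above, the path for one block instance can be assembled from the four integers. This is a sketch only: the helper name `ip_discovery_path` is hypothetical and is not part of the driver; it just formats the documented pattern.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats
 *   /sys/class/drm/cardX/device/ip_discovery/die/D/B/I/
 * where X = card ID, D = die ID, B = IP HW ID (block type),
 * I = IP HW ID instance. Returns what snprintf returns, i.e. the
 * number of characters needed (excluding the terminating NUL). */
int ip_discovery_path(char *buf, size_t len,
                      int card, int die, int hw_id, int instance)
{
    return snprintf(buf, len,
                    "/sys/class/drm/card%d/device/ip_discovery/die/%d/%d/%d/",
                    card, die, hw_id, instance);
}
```

A caller should compare the return value against the buffer size to detect truncation before using the path.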

Re: [PATCH 6/8] mm: don't include <linux/memremap.h> in <linux/mm.h>

2022-02-09 Thread Christoph Hellwig
On Mon, Feb 07, 2022 at 04:19:29PM -0500, Felix Kuehling wrote: > > On 2022-02-07 at 01:32, Christoph Hellwig wrote: >> Move the check for the actual pgmap types that need the free at refcount >> one behavior into the out of line helper, and thus avoid the need to >> pull memremap.h into mm.h. >>

Re: [RFC v4 02/11] drm/amdgpu: Move scheduler init to after XGMI is ready

2022-02-09 Thread Andrey Grodzovsky
All comments are addressed and the code is pushed. Thanks to everyone who helped with reviewing. Andrey On 2022-02-09 02:53, Christian König wrote: On 09.02.22 at 01:23, Andrey Grodzovsky wrote: Before we initialize schedulers we must know which reset domain we are in - for single device there is a single

[PATCH] drm/amdkfd: CRIU fix a NULL vs IS_ERR() check

2022-02-09 Thread Dan Carpenter
The kfd_process_device_data_by_id() does not return error pointers, it returns NULL. Fixes: bef153b70c6e ("drm/amdkfd: CRIU implement gpu_id remapping") Signed-off-by: Dan Carpenter --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/d
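Why this class of bug matters can be shown with a userspace re-creation of the error-pointer idea. The sketch below is an assumption-laden illustration: `is_err` mimics the kernel convention that the top 4095 address values encode errno codes, and `lookup_by_id`, `struct pdd_sketch`, and both check functions are hypothetical names, not the driver's code. The point is that NULL is not an error pointer, so an IS_ERR()-style test lets a NULL result through.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace re-creation of the IS_ERR() idea: the last MAX_ERRNO
 * values of the address space are treated as encoded error codes. */
int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct pdd_sketch { int gpu_id; };

struct pdd_sketch pdd_table[] = { { 1 }, { 2 } };

/* Hypothetical lookup that, like the function in the patch description,
 * signals failure with NULL rather than with an error pointer. */
struct pdd_sketch *lookup_by_id(int id)
{
    for (size_t i = 0; i < sizeof pdd_table / sizeof pdd_table[0]; i++)
        if (pdd_table[i].gpu_id == id)
            return &pdd_table[i];
    return NULL;
}

/* Wrong check: NULL is not an error pointer, so failure goes unnoticed. */
int check_with_is_err(int id)
{
    struct pdd_sketch *pdd = lookup_by_id(id);
    return is_err(pdd) ? -1 : 0;
}

/* Matching check: test for NULL, which is what the lookup returns. */
int check_with_null(int id)
{
    struct pdd_sketch *pdd = lookup_by_id(id);
    return pdd ? 0 : -1;
}
```

With an unknown ID, `check_with_is_err` reports success and a later dereference of the NULL result would crash, while `check_with_null` catches the failure.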

[PATCH] drm/amdkfd: CRIU return -EFAULT for copy_to_user() failure

2022-02-09 Thread Dan Carpenter
If copy_to_user() fails, it returns the number of bytes remaining to be copied but we want to return a negative error code (-EFAULT) to the user. Fixes: 9d5dabfeff3c ("drm/amdkfd: CRIU Save Shared Virtual Memory ranges") Signed-off-by: Dan Carpenter --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 6 +
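The return-value convention described above can be illustrated in userspace. This is a sketch under stated assumptions: `fake_copy_to_user` is a hypothetical stand-in that, like the real function, returns the number of bytes it could not copy (0 on success), and its `limit` parameter exists only to simulate a fault partway through. `save_buggy` and `save_fixed` are likewise made-up names.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for copy_to_user(): returns the number of bytes NOT copied.
 * `limit` is a hypothetical knob that simulates a fault after `limit`
 * bytes have been written. */
size_t fake_copy_to_user(void *dst, const void *src, size_t n, size_t limit)
{
    size_t done = n < limit ? n : limit;

    memcpy(dst, src, done);
    return n - done;
}

/* Buggy pattern: the positive "bytes remaining" count leaks out as the
 * return value instead of a negative error code. */
int save_buggy(void *dst, const void *src, size_t n, size_t limit)
{
    return (int)fake_copy_to_user(dst, src, n, limit);
}

/* Fixed pattern: any nonzero remainder becomes -EFAULT. */
int save_fixed(void *dst, const void *src, size_t n, size_t limit)
{
    if (fake_copy_to_user(dst, src, n, limit))
        return -EFAULT;
    return 0;
}
```

Callers that test for `ret < 0` would treat the buggy variant's positive result as success, which is why the conversion to `-EFAULT` matters.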

[PATCH AUTOSEL 5.16 30/42] drm/amd: Warn users about potential s0ix problems

2022-02-09 Thread Sasha Levin
From: Mario Limonciello [ Upstream commit a6ed2035878e5ad2e43ed175d8812ac9399d6c40 ] On some OEM setups users can configure the BIOS for S3 or S2idle. When configured to S3 users can still choose 's2idle' in the kernel by using `/sys/power/mem_sleep`. Before commit 6dc8265f9803 ("drm/amdgpu: al

[PATCH AUTOSEL 5.16 36/42] drm/amd: add support to check whether the system is set to s3

2022-02-09 Thread Sasha Levin
From: Mario Limonciello [ Upstream commit f52a2b8badbd24faf73a13c9c07fdb9d07352944 ] This will be used to help make decisions on what to do in misconfigured systems. v2: squash in semicolon fix from Stephen Rothwell Signed-off-by: Mario Limonciello Reviewed-by: Alex Deucher Signed-off-by: Al

[PATCH AUTOSEL 5.16 37/42] drm/amd: Only run s3 or s0ix if system is configured properly

2022-02-09 Thread Sasha Levin
From: Mario Limonciello [ Upstream commit 04ef860469fda6a646dc841190d05b31fae68e8c ] This will cause misconfigured systems to not run the GPU suspend routines. * In APUs that are properly configured system will go into s2idle. * In APUs that are intended to be S3 but user selects s2idle the G

[PATCH AUTOSEL 5.16 38/42] drm/amdgpu: fix logic inversion in check

2022-02-09 Thread Sasha Levin
From: Christian König [ Upstream commit e8ae38720e1a685fd98cfa5ae118c9d07b45ca79 ] We probably never trigger this, but the logic inside the check is inverted. Signed-off-by: Christian König Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/d

[PATCH AUTOSEL 5.15 25/36] drm/amd: Warn users about potential s0ix problems

2022-02-09 Thread Sasha Levin
From: Mario Limonciello [ Upstream commit a6ed2035878e5ad2e43ed175d8812ac9399d6c40 ] On some OEM setups users can configure the BIOS for S3 or S2idle. When configured to S3 users can still choose 's2idle' in the kernel by using `/sys/power/mem_sleep`. Before commit 6dc8265f9803 ("drm/amdgpu: al

[PATCH AUTOSEL 5.15 30/36] drm/amd: add support to check whether the system is set to s3

2022-02-09 Thread Sasha Levin
From: Mario Limonciello [ Upstream commit f52a2b8badbd24faf73a13c9c07fdb9d07352944 ] This will be used to help make decisions on what to do in misconfigured systems. v2: squash in semicolon fix from Stephen Rothwell Signed-off-by: Mario Limonciello Reviewed-by: Alex Deucher Signed-off-by: Al

[PATCH AUTOSEL 5.15 31/36] drm/amd: Only run s3 or s0ix if system is configured properly

2022-02-09 Thread Sasha Levin
From: Mario Limonciello [ Upstream commit 04ef860469fda6a646dc841190d05b31fae68e8c ] This will cause misconfigured systems to not run the GPU suspend routines. * In APUs that are properly configured system will go into s2idle. * In APUs that are intended to be S3 but user selects s2idle the G

[PATCH AUTOSEL 5.15 32/36] drm/amdgpu: fix logic inversion in check

2022-02-09 Thread Sasha Levin
From: Christian König [ Upstream commit e8ae38720e1a685fd98cfa5ae118c9d07b45ca79 ] We probably never trigger this, but the logic inside the check is inverted. Signed-off-by: Christian König Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/d

[PATCH AUTOSEL 5.10 23/27] drm/amdgpu: fix logic inversion in check

2022-02-09 Thread Sasha Levin
From: Christian König [ Upstream commit e8ae38720e1a685fd98cfa5ae118c9d07b45ca79 ] We probably never trigger this, but the logic inside the check is inverted. Signed-off-by: Christian König Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/d

[PATCH AUTOSEL 5.4 14/15] drm/amdgpu: fix logic inversion in check

2022-02-09 Thread Sasha Levin
From: Christian König [ Upstream commit e8ae38720e1a685fd98cfa5ae118c9d07b45ca79 ] We probably never trigger this, but the logic inside the check is inverted. Signed-off-by: Christian König Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/d

Re: [PATCH 1/1] drm/amdgpu: Show IP discovery in sysfs

2022-02-09 Thread Alex Deucher
On Wed, Feb 9, 2022 at 11:30 AM Luben Tuikov wrote: > > Add IP discovery data in sysfs. The format is: > /sys/class/drm/cardX/device/ip_discovery/die/D/B/I/ > where, > X is the card ID, an integer, > D is the die ID, an integer, > B is the IP HW ID, an integer, aka block type, > I is the IP HW ID

[PATCH] drm/amdgpu/sdma5.2: Adjust the name string for firmware

2022-02-09 Thread Alex Deucher
This will make it easier to add new firmwares in the future. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c

Re: [PATCH 1/1] drm/amdgpu: Show IP discovery in sysfs

2022-02-09 Thread Luben Tuikov
On 2022-02-09 13:54, Alex Deucher wrote: > On Wed, Feb 9, 2022 at 11:30 AM Luben Tuikov wrote: >> >> Add IP discovery data in sysfs. The format is: >> /sys/class/drm/cardX/device/ip_discovery/die/D/B/I/ >> where, >> X is the card ID, an integer, >> D is the die ID, an integer, >> B is the IP HW

Re: [PATCH 1/1] drm/amdgpu: Show IP discovery in sysfs

2022-02-09 Thread Luben Tuikov
On 2022-02-09 14:21, Luben Tuikov wrote: > > > On 2022-02-09 13:54, Alex Deucher wrote: >> On Wed, Feb 9, 2022 at 11:30 AM Luben Tuikov wrote: >>> >>> Add IP discovery data in sysfs. The format is: >>> /sys/class/drm/cardX/device/ip_discovery/die/D/B/I/ >>> where, >>> X is the card ID, an int

Re: [PATCH] drm/amdkfd: CRIU return -EFAULT for copy_to_user() failure

2022-02-09 Thread Felix Kuehling
On 2022-02-09 13:09, Dan Carpenter wrote: If copy_to_user() fails, it returns the number of bytes remaining to be copied but we want to return a negative error code (-EFAULT) to the user. Fixes: 9d5dabfeff3c ("drm/amdkfd: CRIU Save Shared Virtual Memory ranges") Signed-off-by: Dan Carpenter

Re: [PATCH] drm/amdkfd: fix freeing an unset pointer

2022-02-09 Thread Felix Kuehling
On 2022-02-09 09:52, t...@redhat.com wrote: From: Tom Rix clang static analysis reports this problem kfd_chardev.c:2092:2: warning: 1st function call argument is an uninitialized value kvfree(bo_privs); ^~~~ When bo_buckets alloc fails, it jumps to an error h

Re: [PATCH -next] drm/amdkfd: Fix NULL but dereferenced coccicheck error

2022-02-09 Thread Felix Kuehling
On 2022-02-08 20:39, Yang Li wrote: Eliminate the following coccicheck warning: ./drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2087:27-38: ERROR: bo_buckets is NULL but dereferenced. Reported-by: Abaci Robot Signed-off-by: Yang Li Thank you. I already picked up Tom Rix's patch for the same iss

Re: [PATCH] drm/amdkfd: map sdma queues onto extended engines for navi2x

2022-02-09 Thread Felix Kuehling
On 2022-02-09 11:11, Jonathan Kim wrote: The hardware scheduler requires that all SDMA 5.2.x queues are put on the RUN_LIST through the extended engines. Make extended engine unmap available as well. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2

Re: [PATCH 5/9] drm/amdgpu: remove GTT accounting

2022-02-09 Thread Felix Kuehling
On 2022-02-09 03:40, Christian König wrote: This is provided by TTM now. Also switch man->size to bytes instead of pages and fix the double printing of size and usage in debugfs. Signed-off-by: Christian König Tested-by: Bas Nieuwenhuizen --- drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c |

[PATCH v1 1/1] drm/amdgpu: Show IP discovery in sysfs

2022-02-09 Thread Luben Tuikov
Add IP discovery data in sysfs. The format is: /sys/class/drm/cardX/device/ip_discovery/die/D/B/I/ where, X is the card ID, an integer, D is the die ID, an integer, B is the IP HW ID, an integer, aka block type, I is the IP HW ID instance, an integer. are the attributes of the block instance. At t

[PATCH v1 0/1] drm/amdgpu: Show IP discovery in sysfs

2022-02-09 Thread Luben Tuikov
Version 1, this version, adds sysfs tear-down on rmmod. Show IP discovery in sysfs. See the commit message for the layout format. For instance, on a Sienna Cichlid, the layout looks like this: $tree /sys/class/drm/card0/device/ip_discovery/ /sys/class/drm/card0/device/ip_discovery/ └── die └

RE: [PATCH] drm/amdkfd: map sdma queues onto extended engines for navi2x

2022-02-09 Thread Kim, Jonathan
[AMD Official Use Only] > -Original Message- > From: Kuehling, Felix > Sent: February 9, 2022 4:26 PM > To: Kim, Jonathan ; amd-gfx@lists.freedesktop.org > Subject: Re: [PATCH] drm/amdkfd: map sdma queues onto extended engines for > navi2x > > > On 2022-02-09 11:11, Jonathan Kim wrote: >

Re: [PATCH] drm/amdkfd: map sdma queues onto extended engines for navi2x

2022-02-09 Thread Felix Kuehling
On 2022-02-09 19:18, Kim, Jonathan wrote: [AMD Official Use Only] -Original Message- From: Kuehling, Felix Sent: February 9, 2022 4:26 PM To: Kim, Jonathan ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH] drm/amdkfd: map sdma queues onto extended engines for navi2x On 2022-02-09

RE: [PATCH] drm/amdgpu: fix gmc init fail in sriov mode

2022-02-09 Thread Min, Frank
[AMD Official Use Only] Hi Kevin, This patch looks good to me Reviewed by: Frank Min Best Regards, Frank -Original Message- From: Wang, Yang(Kevin) Sent: Wednesday, February 9, 2022 10:30 PM To: amd-gfx@lists.freedesktop.org Cc: Zhang, Hawking ; Min, Frank ; Wang, Yang(Kevin) Subjec

[PATCH] drm/amd/pm: fix enabled features retrieving on Renoir and Cyan Skillfish

2022-02-09 Thread Evan Quan
For Cyan Skillfish and Renoir, there is no interface provided by PMFW to retrieve the enabled features. So, we assume all features are enabled. Fixes: 7ade3ca9cdb5 ("drm/amd/pm: correct the usage for 'supported' member of smu_feature structure") Signed-off-by: Evan Quan Change-Id: I1231f146405a

Re: [PATCH 6/8] mm: don't include <linux/memremap.h> in <linux/mm.h>

2022-02-09 Thread Alistair Popple
On Thursday, 10 February 2022 4:48:36 AM AEDT Christoph Hellwig wrote: > On Mon, Feb 07, 2022 at 04:19:29PM -0500, Felix Kuehling wrote: > > > > On 2022-02-07 at 01:32, Christoph Hellwig wrote: > >> Move the check for the actual pgmap types that need the free at refcount > >> one behavior into the

Re: [PATCH] drm/amd/pm: fix enabled features retrieving on Renoir and Cyan Skillfish

2022-02-09 Thread Nathan Chancellor
On Thu, Feb 10, 2022 at 09:47:00AM +0800, Evan Quan wrote: > For Cyan Skillfish and Renoir, there is no interface provided by PMFW > to retrieve the enabled features. So, we assume all features are enabled. > > Fixes: 7ade3ca9cdb5 ("drm/amd/pm: correct the usage for 'supported' member of > smu_fe

RE: [PATCH] drm/amdgpu/sdma5.2: Adjust the name string for firmware

2022-02-09 Thread Chen, Guchun
[Public] Reviewed-by: Guchun Chen Regards, Guchun -Original Message- From: amd-gfx On Behalf Of Alex Deucher Sent: Thursday, February 10, 2022 3:00 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: [PATCH] drm/amdgpu/sdma5.2: Adjust the name string for firmware Thi

RE: [PATCH 03/11] drm/amdgpu: Optimize amdgpu_hdp_ras_late_init/amdgpu_hdp_ras_fini function code

2022-02-09 Thread Chai, Thomas
[AMD Official Use Only] -Original Message- From: Zhou1, Tao Sent: Wednesday, February 9, 2022 4:54 PM To: Chai, Thomas ; amd-gfx@lists.freedesktop.org Cc: Zhang, Hawking ; Clements, John Subject: RE: [PATCH 03/11] drm/amdgpu: Optimize amdgpu_hdp_ras_late_init/amdgpu_hdp_ras_fini fun

[PATCH] drm/amdkfd: replace err by dbg print at svm vram migration

2022-02-09 Thread Alex Sierra
Avoid spamming the kernel log on application memory allocation failures. Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c in

RE: [PATCH] drm/amdkfd: replace err by dbg print at svm vram migration

2022-02-09 Thread Chen, Guchun
[Public] How about using 'dev_dbg'? It will benefit multiple GPU configuration when enabling debug option. Regards, Guchun -Original Message- From: amd-gfx On Behalf Of Alex Sierra Sent: Thursday, February 10, 2022 10:59 AM To: amd-gfx@lists.freedesktop.org Cc: Kuehling, Felix Subject

[PATCH] drm/amdgpu: Fix compile error.

2022-02-09 Thread Andrey Grodzovsky
Seems I forgot to add this to the relevant commit when submitting. Signed-off-by: Andrey Grodzovsky Reported-by: kernel test robot --- drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/dri

Re: [PATCH] drm/amdgpu: Fix compile error.

2022-02-09 Thread Alex Deucher
On Wed, Feb 9, 2022 at 10:17 PM Andrey Grodzovsky wrote: > > Seems I forgot to add this to the relevant commit > when submitting. > > Signed-off-by: Andrey Grodzovsky > Reported-by: kernel test robot Acked-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 3 +-- > 1 file c

Re: [PATCH v1 1/1] drm/amdgpu: Show IP discovery in sysfs

2022-02-09 Thread Wang, Yang(Kevin)
[AMD Official Use Only] From: amd-gfx on behalf of Luben Tuikov Sent: Thursday, February 10, 2022 6:51 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; StDenis, Tom ; Tuikov, Luben Subject: [PATCH v1 1/1] drm/amdgpu: Show IP discovery in sysfs

Re: [PATCH] drm/amd/pm: fix enabled features retrieving on Renoir and Cyan Skillfish

2022-02-09 Thread Alex Deucher
On Wed, Feb 9, 2022 at 8:47 PM Evan Quan wrote: > > For Cyan Skillfish and Renoir, there is no interface provided by PMFW > to retrieve the enabled features. So, we assume all features are enabled. > > Fixes: 7ade3ca9cdb5 ("drm/amd/pm: correct the usage for 'supported' member of > smu_feature str

[pull] amdgpu drm-fixes-5.17

2022-02-09 Thread Alex Deucher
Hi Dave, Daniel, Fixes for 5.17. The following changes since commit dfd42facf1e4ada021b939b4e19c935dcdd55566: Linux 5.17-rc3 (2022-02-06 12:20:50 -0800) are available in the Git repository at: https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-fixes-5.17-2022-02-09 for you to fe

Re: [PATCH v1 1/1] drm/amdgpu: Show IP discovery in sysfs

2022-02-09 Thread Lang Yu
On 02/09/ , Luben Tuikov wrote: > Add IP discovery data in sysfs. The format is: > /sys/class/drm/cardX/device/ip_discovery/die/D/B/I/ > where, > X is the card ID, an integer, > D is the die ID, an integer, > B is the IP HW ID, an integer, aka block type, > I is the IP HW ID instance, an integer. >

[PATCH] drm/amdgpu: disable xgmi feature support in sriov mode

2022-02-09 Thread Yang Wang
the xgmi feature is not supported in sriov mode. Signed-off-by: Yang Wang --- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c index

Re: [PATCH 2/2] drm/amdgpu: add reset register trace function on GPU reset

2022-02-09 Thread Somalapuram, Amaranath
On 2/9/2022 1:17 PM, Christian König wrote: On 08.02.22 at 16:28, Alex Deucher wrote: On Tue, Feb 8, 2022 at 3:17 AM Somalapuram Amaranath wrote: Dump the list of register values to trace event on GPU reset. Signed-off-by: Somalapuram Amaranath --- drivers/gpu/drm/amd/amdgpu/amdgpu_devi

[PATCH] drm/amdgpu: add support for GC 10.1.4

2022-02-09 Thread Lang Yu
Add basic support for GC 10.1.4; it uses the same IP blocks as GC 10.1.3. Signed-off-by: Lang Yu --- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 6 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 3 ++- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 9 + drivers/gpu/drm/amd/amdg

RE: [PATCH 03/11] drm/amdgpu: Optimize amdgpu_hdp_ras_late_init/amdgpu_hdp_ras_fini function code

2022-02-09 Thread Zhou1, Tao
[AMD Official Use Only] OK, if there is further refinement, the series is: Reviewed-by: Tao Zhou > -Original Message- > From: Chai, Thomas > Sent: Thursday, February 10, 2022 10:59 AM > To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org > Cc: Zhang, Hawking ; Clements, John > > Subject: R

[PATCH] drm/amdgpu: no rlcg read access in SRIOV case for gfx v9

2022-02-09 Thread Guchun Chen
Fall back to MMIO to read registers as rlcg read is not available for gfx v9 in SRIOV configuration. Otherwise, gmc_v9_0_flush_gpu_tlb will always complain about a timeout and finally break driver load. Fixes: 0dc4a7e75581("drm/amdgpu: switch to get_rlcg_reg_access_flag for gfx9") Signed-off-by: Guchun Ch

Re: [PATCH 6/8] mm: don't include <linux/memremap.h> in <linux/mm.h>

2022-02-09 Thread Christoph Hellwig
On Thu, Feb 10, 2022 at 01:10:47PM +1100, Alistair Popple wrote: > diff --git a/mm/gup.c b/mm/gup.c > index cbb49abb7992..8e85c9fb8df4 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -2007,7 +2007,6 @@ static long check_and_migrate_movable_pages(unsigned > long nr_pages, > if (!ret && list_emp

Re: [PATCH] drm/amdgpu: Fix compile error.

2022-02-09 Thread Christian König
On 10.02.22 at 04:17, Andrey Grodzovsky wrote: Seems I forgot to add this to the relevant commit when submitting. Rebase/merge issue? Looks like it. Signed-off-by: Andrey Grodzovsky Reported-by: kernel test robot Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_re

Re: [PATCH 2/2] drm/amdgpu: add reset register trace function on GPU reset

2022-02-09 Thread Christian König
On 10.02.22 at 06:29, Somalapuram, Amaranath wrote: On 2/9/2022 1:17 PM, Christian König wrote: On 08.02.22 at 16:28, Alex Deucher wrote: On Tue, Feb 8, 2022 at 3:17 AM Somalapuram Amaranath wrote: Dump the list of register values to trace event on GPU reset. Signed-off-by: Somalapuram Am

[PATCH 01/27] mm: remove a pointless CONFIG_ZONE_DEVICE check in memremap_pages

2022-02-09 Thread Christoph Hellwig
memremap.c is only built when CONFIG_ZONE_DEVICE is set, so remove the superfluous extra check. Signed-off-by: Christoph Hellwig Reviewed-by: Logan Gunthorpe Reviewed-by: Jason Gunthorpe Reviewed-by: Chaitanya Kulkarni Reviewed-by: Muchun Song Reviewed-by: Dan Williams --- mm/memremap.c | 3

start sorting out the ZONE_DEVICE refcount mess v2

2022-02-09 Thread Christoph Hellwig
Hi all, this series removes the offset by one refcount for ZONE_DEVICE pages that are freed back to the driver owning them, which is just device private ones for now, but also the planned device coherent pages and the enhanced p2p ones pending. It does not address the fsdax pages yet, which will b

[PATCH 02/27] mm: remove the __KERNEL__ guard from <linux/mm.h>

2022-02-09 Thread Christoph Hellwig
__KERNEL__ ifdefs don't make sense outside of include/uapi/. Signed-off-by: Christoph Hellwig Reviewed-by: Logan Gunthorpe Reviewed-by: Jason Gunthorpe Reviewed-by: Chaitanya Kulkarni Reviewed-by: Muchun Song Reviewed-by: Dan Williams --- include/linux/mm.h | 4 1 file changed, 4 delet

[PATCH 04/27] mm: move free_devmap_managed_page to memremap.c

2022-02-09 Thread Christoph Hellwig
free_devmap_managed_page has nothing to do with the code in swap.c, move it to live with the rest of the code for devmap handling. Signed-off-by: Christoph Hellwig Reviewed-by: Logan Gunthorpe Reviewed-by: Jason Gunthorpe Reviewed-by: Chaitanya Kulkarni Reviewed-by: Muchun Song Reviewed-by: D

[PATCH 03/27] mm: remove pointless includes from <linux/hmm.h>

2022-02-09 Thread Christoph Hellwig
hmm.h pulls in the world for no good reason at all. Remove the includes and push a few ones into the users instead. Signed-off-by: Christoph Hellwig Reviewed-by: Logan Gunthorpe Reviewed-by: Jason Gunthorpe Reviewed-by: Chaitanya Kulkarni --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 + d

[PATCH 05/27] mm: simplify freeing of devmap managed pages

2022-02-09 Thread Christoph Hellwig
Make put_devmap_managed_page return if it took charge of the page or not and remove the separate page_is_devmap_managed helper. Signed-off-by: Christoph Hellwig Reviewed-by: Logan Gunthorpe Reviewed-by: Jason Gunthorpe Reviewed-by: Chaitanya Kulkarni Reviewed-by: Dan Williams --- include/lin

[PATCH 06/27] mm: don't include <linux/memremap.h> in <linux/mm.h>

2022-02-09 Thread Christoph Hellwig
Move the check for the actual pgmap types that need the free at refcount one behavior into the out of line helper, and thus avoid the need to pull memremap.h into mm.h. Signed-off-by: Christoph Hellwig Reviewed-by: Logan Gunthorpe Reviewed-by: Jason Gunthorpe Reviewed-by: Dan Williams Acked-by

[PATCH 07/27] mm: remove the extra ZONE_DEVICE struct page refcount

2022-02-09 Thread Christoph Hellwig
ZONE_DEVICE struct pages have an extra reference count that complicates the code for put_page() and several places in the kernel that need to check the reference count to see that a page is not being used (gup, compaction, migration, etc.). Clean up the code so the reference count doesn't need to b

[PATCH 08/27] fsdax: depend on ZONE_DEVICE || FS_DAX_LIMITED

2022-02-09 Thread Christoph Hellwig
Add a depends on ZONE_DEVICE support or the s390-specific limited DAX support, as one of the two is required at runtime for fsdax code to actually work. Signed-off-by: Christoph Hellwig Reviewed-by: Logan Gunthorpe Reviewed-by: Jason Gunthorpe --- fs/Kconfig | 1 + 1 file changed, 1 insertion(

[PATCH 09/27] mm: generalize the pgmap based page_free infrastructure

2022-02-09 Thread Christoph Hellwig
Key off on the existence of ->page_free to prepare for adding support for more pgmap types that are device managed and thus need the free callback. Signed-off-by: Christoph Hellwig --- mm/memremap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/memremap.c b/mm/memrem

[PATCH 11/27] mm: refactor the ZONE_DEVICE handling in migrate_vma_insert_page

2022-02-09 Thread Christoph Hellwig
Make the flow a little more clear and prepare for adding a new ZONE_DEVICE memory type. Signed-off-by: Christoph Hellwig --- mm/migrate.c | 31 +++ 1 file changed, 15 insertions(+), 16 deletions(-) diff --git a/mm/migrate.c b/mm/migrate.c index 8e0370a73f8a43..30ecd7

[PATCH 10/27] mm: refactor check_and_migrate_movable_pages

2022-02-09 Thread Christoph Hellwig
Remove up to two levels of indentation by using continue statements and move variables to local scope where possible. Signed-off-by: Christoph Hellwig --- mm/gup.c | 81 ++-- 1 file changed, 44 insertions(+), 37 deletions(-) diff --git a/mm/gu

[PATCH 12/27] mm: refactor the ZONE_DEVICE handling in migrate_vma_pages

2022-02-09 Thread Christoph Hellwig
Make the flow a little more clear and prepare for adding a new ZONE_DEVICE memory type. Signed-off-by: Christoph Hellwig --- mm/migrate.c | 27 --- 1 file changed, 12 insertions(+), 15 deletions(-) diff --git a/mm/migrate.c b/mm/migrate.c index 30ecd7223656c1..746e123088

[PATCH 14/27] mm: build migrate_vma_* for all configs with ZONE_DEVICE support

2022-02-09 Thread Christoph Hellwig
This code will be used for device coherent memory as well in a bit, so relax the ifdef a bit. Signed-off-by: Christoph Hellwig --- mm/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/Kconfig b/mm/Kconfig index 6391d8d3a616f3..95d4aa3acaefe0 100644 --- a/mm/Kconfig +

[PATCH 15/27] mm: add zone device coherent type memory support

2022-02-09 Thread Christoph Hellwig
From: Alex Sierra Device memory that is cache coherent from device and CPU point of view. This is used on platforms that have an advanced system bus (like CAPI or CXL). Any page of a process can be migrated to such memory. However, no one should be allowed to pin such memory so that it can always

[PATCH 13/27] mm: move the migrate_vma_* device migration code into its own file

2022-02-09 Thread Christoph Hellwig
Split the code used to migrate to and from ZONE_DEVICE memory from migrate.c into a new file. Signed-off-by: Christoph Hellwig --- mm/Kconfig | 3 + mm/Makefile | 1 + mm/migrate.c| 753 --- mm/migrate_device.c | 765 ++

[PATCH 16/27] mm: add device coherent vma selection for memory migration

2022-02-09 Thread Christoph Hellwig
From: Alex Sierra This case is used to migrate pages from device memory, back to system memory. Device coherent type memory is cache coherent from device and CPU point of view. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling Reviewed-by: Alistair Popple Signed-off-by: Christoph Hellwig

[PATCH 17/27] mm/gup: fail get_user_pages for LONGTERM dev coherent type

2022-02-09 Thread Christoph Hellwig
From: Alex Sierra Avoid long term pinning for Coherent device type pages. This could interfere with their own device memory manager. For now, we are just returning an error for PIN_LONGTERM Coherent device type pages. Eventually, these types of pages will get migrated to system memory, once the devic

[PATCH 18/27] drm/amdkfd: add SPM support for SVM

2022-02-09 Thread Christoph Hellwig
From: Alex Sierra When CPU is connected through XGMI, it has coherent access to VRAM resource. In this case that resource is taken from a table in the device gmc aperture base. This resource is used along with the device type, which could be DEVICE_PRIVATE or DEVICE_COHERENT to create the device p

[PATCH 19/27] drm/amdkfd: coherent type as sys mem on migration to ram

2022-02-09 Thread Christoph Hellwig
From: Alex Sierra Coherent device type memory on VRAM to RAM migration, has similar access as System RAM from the CPU. This flag sets the source from the sender. Which in Coherent type case, should be set as MIGRATE_VMA_SELECT_DEVICE_COHERENT. Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehl
