Re: [PATCH v2] drm/amd/amdgpu: Fix errors in documentation of function parameters

2021-04-28 Thread Felix Kuehling
On 2021-04-27 7:27, Fabio M. De Francesco wrote: In the documentation of functions, removed excess parameters, described undocumented ones, and fixed syntax errors. Signed-off-by: Fabio M. De Francesco --- Changes from v1: Cc'ed all the maintainers. Looks like Alex already applied V1. So thi

[PATCH] drm/amdgpu: Rename the flags to eliminate ambiguity v2

2021-04-28 Thread Peng Ju Zhou
The flags vf_reg_access_* may cause confusion, rename the flags to make it more clear. Signed-off-by: Peng Ju Zhou --- drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h b/drivers/gpu/

[PATCH] drm/amdgpu: enable gfx ras in aldebran by default

2021-04-28 Thread Hawking Zhang
gfx ras now can be enabled by default in aldebaran Signed-off-by: Hawking Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 984e827..9306

[PATCH] drm/amdgpu: switch to mmhub ras callback for ras fini

2021-04-28 Thread Hawking Zhang
invoke callback function for mmhub ras fini Signed-off-by: Hawking Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c index 697ab26..a129ecc 10064

[PATCH 7/7] drm/amdgpu: retired reset_ras_error_count from hdp callbacks

2021-04-28 Thread Hawking Zhang
It was moved to hdp ras callbacks Signed-off-by: Hawking Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.h | 1 - drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c | 1 - 2 files changed, 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.h index

[PATCH 6/7] drm/amdgpu: enable ras error count query and reset for HDP

2021-04-28 Thread Hawking Zhang
add hdp block ras error query and reset support in amdgpu ras error count query and reset interface Signed-off-by: Hawking Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 10 ++ drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 4 drivers/gpu/drm/amd/amdgpu/soc15.c | 3 --- 3 fil

[PATCH 5/7] drm/amdgpu: init/fini hdp v4_0 ras

2021-04-28 Thread Hawking Zhang
invoke hdp v4_0 ras init in gmc late_init phase while ras fini in gmc sw_fini phase Signed-off-by: Hawking Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdg

[PATCH 4/7] drm/amdgpu: initialize hdp v4_0 ras functions

2021-04-28 Thread Hawking Zhang
hdp v4_0 support ras features Signed-off-by: Hawking Zhang --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index 4da8b3d..8e0cab5 100644 --- a/drivers/gpu/drm/amd/a

[PATCH 3/7] drm/amdgpu: implement hdp v4_0 ras functions

2021-04-28 Thread Hawking Zhang
implement hdp v4_0 ras functions, including ras init/fini, query/reset_error_counter Signed-off-by: Hawking Zhang --- drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c | 30 -- drivers/gpu/drm/amd/amdgpu/hdp_v4_0.h | 1 + 2 files changed, 29 insertions(+), 2 deletions(-) diff -

[PATCH 2/7] drm/amdgpu: add helpers for hdp ras init/fini

2021-04-28 Thread Hawking Zhang
hdp ras init/fini are common functions that can be shared among hdp generations Signed-off-by: Hawking Zhang --- drivers/gpu/drm/amd/amdgpu/Makefile | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.c | 69 + drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.h | 2 + 3 file

[PATCH 1/7] drm/amdgpu: add hdp ras structures

2021-04-28 Thread Hawking Zhang
centralize all hdp ras operation to ras_funcs Signed-off-by: Hawking Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.h | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.h index 43caf9f..c89cf8d 100644

[PATCH] drm/amdgpu: Rename the flags to to eliminate ambiguity

2021-04-28 Thread Peng Ju Zhou
The flags vf_reg_access_* may cause confusion, rename the flags to make it more clear. Signed-off-by: Peng Ju Zhou --- drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h b/drivers/gpu/

Re: [PATCH 2/2] drm/amdkfd: flush TLB after updating GPU page table

2021-04-28 Thread Felix Kuehling
Am 2021-04-28 um 9:53 p.m. schrieb Philip Yang: > To workaround the situation that vm retry fault keep coming after page > table update. We are investigating the root cause, but once this issue > happens, application will stuck and sometimes have to reboot to recover. > > Signed-off-by: Philip Yang

Re: [PATCH 1/2] drm/amdkfd: wait migration done only if migration starts

2021-04-28 Thread Felix Kuehling
Am 2021-04-28 um 9:53 p.m. schrieb Philip Yang: > If migration vma setup, but failed before start sdma memory copy, e.g. > process is killed, don't wait for sdma fence done. I think you could describe this more generally as "Handle errors returned by svm_migrate_copy_to_vram/ram". > > Signed-of

[PATCH] drm/amdgpu: seperate the dependency between CGCG and CGLS when diable CGCG/CGLS

2021-04-28 Thread Changfeng.Zhu
From: changzhu From: Changfeng The disable process of CGLS is dependent on CGCG now. To align with the Windows code, separate the dependency between CGCG and CGLS; this will reduce confusion when debugging CGCG/CGLS related issues. Change-Id: Ia91b8b16236bebd9224160672e500f6850dbc268 Signed-off-by: Chang

[PATCH] drm/amdgpu: add new MC firmware for Polaris12 32bit ASIC

2021-04-28 Thread Evan Quan
Polaris12 32bit ASIC needs a special MC firmware. Change-Id: I1eea9cc1d5c81a370c8fccf139f4f77bac4a1baa Signed-off-by: Evan Quan Reviewed-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/

RE: [PATCH] drm/amdgpu: Add graphics cache rinse packet for sdma 5.0

2021-04-28 Thread Zhang, Hawking
[AMD Public Use] Reviewed-by: Hawking Zhang Regards, Hawking -Original Message- From: amd-gfx On Behalf Of Alex Deucher Sent: Thursday, April 29, 2021 11:41 To: Deucher, Alexander Cc: amd-gfx list Subject: Re: [PATCH] drm/amdgpu: Add graphics cache rinse packet for sdma 5.0 Ping? On

Re: [PATCH] drm/amdgpu: Add graphics cache rinse packet for sdma 5.0

2021-04-28 Thread Alex Deucher
Ping? On Tue, Apr 20, 2021 at 3:28 PM Alex Deucher wrote: > > Add emit mem sync callback for sdma_v5_0 > > In amdgpu sync object test, three threads created jobs > to send GFX IB and SDMA IB in sequence. After the first > GFX thread joined, sometimes the third thread will reuse > the same physica

[PATCH] drm/amd/display: Remove duplicate declaration of dc_state

2021-04-28 Thread Wan Jiabing
There are two declarations of struct dc_state here. The later one is closer to its user. Remove the former duplicate. Signed-off-by: Wan Jiabing --- drivers/gpu/drm/amd/display/dc/dc.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dc.h b/drivers/gpu/drm/amd

[PATCH] drm/amd/display: Remove duplicate include of hubp.h

2021-04-28 Thread Wan Jiabing
In commit 482812d56698e ("drm/amd/display: Set max TTU on DPG enable"), "hubp.h" was added which caused the duplicate include. To be on the safe side, remove the later duplicate include. Signed-off-by: Wan Jiabing --- drivers/gpu/drm/amd/display/dc/core/dc.c | 1 - 1 file changed, 1 deletion(-)

RE: [PATCH 2/2] drm/amd/pm: expose pmfw attached timestamp on Aldebaran

2021-04-28 Thread Kasiviswanathan, Harish
[AMD Official Use Only - Internal Distribution Only] Acked-by: Harish Kasiviswanathan -Original Message- From: Quan, Evan Sent: Tuesday, April 27, 2021 9:43 PM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Kasiviswanathan, Harish ; Quan, Evan Subject: [PATCH 2/2] drm/amd

[PATCH 1/2] drm/amdkfd: wait migration done only if migration starts

2021-04-28 Thread Philip Yang
If migration vma setup, but failed before start sdma memory copy, e.g. process is killed, don't wait for sdma fence done. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 20 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/

[PATCH 2/2] drm/amdkfd: flush TLB after updating GPU page table

2021-04-28 Thread Philip Yang
To workaround the situation that vm retry fault keep coming after page table update. We are investigating the root cause, but once this issue happens, application will stuck and sometimes have to reboot to recover. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 3 +++ 1 fi

[PATCH v8 1/1] drm/drm_mst: Use Extended Base Receiver Capability DPCD space

2021-04-28 Thread Nikola Cornij
[why] DP 1.4a spec mandates that if DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT is set, Extended Base Receiver Capability DPCD space must be used. Without doing that, the three DPCD values that differ will be wrong, leading to incorrect or limited functionality. MST link rate, for example, could have a l

[PATCH v8 0/1] drm/drm_mst: Use Extended Base Receiver Capability

2021-04-28 Thread Nikola Cornij
Change history: v8: - Changed link lanes and rate parameters to u8 v7: - Fixed formatting - Fixed 'unused variable' compile warning - Fixed comment format v6: - Submitted from (hopefully) the correct repo to fix build error v5: - Fixed min_t() macro arguments v4: - Fixed drm/radeon/ lan

[PATCH v7 1/1] drm/drm_mst: Use Extended Base Receiver Capability DPCD space

2021-04-28 Thread Nikola Cornij
[why] DP 1.4a spec mandates that if DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT is set, Extended Base Receiver Capability DPCD space must be used. Without doing that, the three DPCD values that differ will be wrong, leading to incorrect or limited functionality. MST link rate, for example, could have a l

[PATCH v7 0/1] drm/drm_mst: Use Extended Base Receiver Capability

2021-04-28 Thread Nikola Cornij
Change history: v7: - Fixed formatting in drm_dp_mst_topology.c - Fixed 'unused variable' compile warning - Fixed comment format v6: - Submitted from (hopefully) the correct repo to fix build error v5: - Fixed min_t() macro arguments v4: - Fixed drm/radeon/ lane count and rate v3: - Fixe

Re: [PATCH v6 1/1] drm/drm_mst: Use Extended Base Receiver Capability DPCD space

2021-04-28 Thread Lyude Paul
Resend, since I hit enter too early on the first one :). On Wed, 2021-04-28 at 16:44 -0400, Nikola Cornij wrote: > [why] > DP 1.4a spec mandates that if DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT is > set, Extended Base Receiver Capability DPCD space must be used. Without > doing that, the three DPCD v

Re: [PATCH v6 1/1] drm/drm_mst: Use Extended Base Receiver Capability DPCD space

2021-04-28 Thread Lyude Paul
On Wed, 2021-04-28 at 16:44 -0400, Nikola Cornij wrote: > [why] > DP 1.4a spec mandates that if DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT is > set, Extended Base Receiver Capability DPCD space must be used. Without > doing that, the three DPCD values that differ will be wrong, leading to > incorrect or

Re: 16 bpc fixed point (RGBA16) framebuffer support for core and AMD.

2021-04-28 Thread Alex Deucher
On Tue, Apr 20, 2021 at 5:25 PM Alex Deucher wrote: > > On Fri, Apr 16, 2021 at 12:29 PM Mario Kleiner > wrote: > > > > Friendly ping to the AMD people. Nicholas, Harry, Alex, any feedback? > > Would be great to get this in sooner than later. > > > > No objections from me. > I don't have any obj

Re: [PATCH] drm/amdgpu: Add vbios info ioctl interface

2021-04-28 Thread StDenis, Tom
[AMD Official Use Only - Internal Distribution Only] Done. From: Alex Deucher Sent: Wednesday, April 28, 2021 16:53 To: StDenis, Tom Cc: Gu, JiaWei (Will); Christian König; Nieto, David M; amd-gfx@lists.freedesktop.org; Deucher, Alexander Subject: Re: [P

Re: [PATCH] amdgpu: fix GEM obj leak in amdgpu_display_user_framebuffer_create

2021-04-28 Thread Alex Deucher
Applied. Thanks! Alex On Wed, Apr 21, 2021 at 6:29 AM Christian König wrote: > > Am 21.04.21 um 11:16 schrieb Simon Ser: > > This error code-path is missing a drm_gem_object_put call. Other > > error code-paths are fine. > > Good catch. For some extra points you could change the error handling

Re: [PATCH] drm/amd/display: Fix build warnings

2021-04-28 Thread Alex Deucher
On Wed, Apr 21, 2021 at 12:18 PM Guenter Roeck wrote: > > Fix the following build warnings. > > drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c: > In function ‘dm_update_mst_vcpi_slots_for_dsc’: > drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:6242:46: > wa

Re: [PATCH 1/2] drm/amdgpu/display: fix dal_allocation documentation

2021-04-28 Thread Alex Deucher
Ping? On Fri, Apr 23, 2021 at 4:49 PM Alex Deucher wrote: > > Add missing structure elements. > > Fixes: 1ace37b873c2 ("drm/amdgpu/display: Implement functions to let DC > allocate GPU memory") > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 4 >

Re: [PATCH] drm/amd/display: Remove condition which is always set to True

2021-04-28 Thread Alex Deucher
On Fri, Apr 23, 2021 at 4:57 PM Souptick Joarder wrote: > > Kernel test robot throws below warning -> > > >> drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_debugfs.c:3015:53: > >> warning: address of 'aconnector->mst_port->mst_mgr' will always > >> evaluate to 'true' [-Wpointer-bool-con

Re: [PATCH v2] drm/amdgpu: Register VGA clients after init can no longer fail

2021-04-28 Thread Alex Deucher
On Mon, Apr 26, 2021 at 6:50 AM Kai-Heng Feng wrote: > > When an amdgpu device fails to init, it makes another VGA device cause > kernel splat: > kernel: amdgpu :08:00.0: amdgpu: amdgpu_device_ip_init failed > kernel: amdgpu :08:00.0: amdgpu: Fatal error during GPU init > kernel: amdgpu: p

Re: [PATCH] drm/amdgpu: Add vbios info ioctl interface

2021-04-28 Thread Alex Deucher
Please revert the patch in umr until the kernel side lands in upstream drm-next. Alex On Wed, Apr 28, 2021 at 7:30 AM StDenis, Tom wrote: > > [AMD Official Use Only - Internal Distribution Only] > > Hi Will, > > I've merged in your v2 patch last week. If that's still the latest you > should be

Re: [PATCH] Handling of amdgpu_device_resume return value for graceful teardown

2021-04-28 Thread Alex Deucher
Applied. Thanks! Alex On Tue, Apr 27, 2021 at 1:46 PM Alex Deucher wrote: > > On Tue, Apr 27, 2021 at 12:51 AM wrote: > > > > From: Pavan Kumar Ramayanam > > > > The runtime resume PM op disregards the return value from > > amdgpu_device_resume(), masking errors for failed resumes at the PM >

Re: [PATCH v6 1/1] drm/drm_mst: Use Extended Base Receiver Capability DPCD space

2021-04-28 Thread Alex Deucher
+ dri-devel as well. On Wed, Apr 28, 2021 at 4:44 PM Nikola Cornij wrote: > > [why] > DP 1.4a spec mandates that if DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT is > set, Extended Base Receiver Capability DPCD space must be used. Without > doing that, the three DPCD values that differ will be wrong, lea

[PATCH v6 1/1] drm/drm_mst: Use Extended Base Receiver Capability DPCD space

2021-04-28 Thread Nikola Cornij
[why] DP 1.4a spec mandates that if DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT is set, Extended Base Receiver Capability DPCD space must be used. Without doing that, the three DPCD values that differ will be wrong, leading to incorrect or limited functionality. MST link rate, for example, could have a l

[PATCH v6 0/1] drm/drm_mst: Use Extended Base Receiver Capability

2021-04-28 Thread Nikola Cornij
Change history: v6: - Submitted from (hopefully) the correct repo to fix build error v5: - Fixed min_t() macro arguments v4: - Fixed drm/radeon/ lane count and rate v3: - Fixed check-patch errors v2: - No changes, this was my mistaken reply to my patch v1: - Initial revision Nikola Corni

VEGA 20 (Aka Vega12)

2021-04-28 Thread Rodrigo Luglio
I have a macbook pro with vega 20 which uses the amdgpu firmware vega12 and when i boot any distro the graphics glitch and the computer freezes. If i install amdgpu pro on ubuntu it works flawlessly. Would you guys help me debug this and fix for upstream? Please, let me know which kind of log

Re: [PATCH v5 09/27] dmr/amdgpu: Move some sysfs attrs creation to default_attr

2021-04-28 Thread Bjorn Helgaas
In subject, s/dmr/drm/ s/Move some/Move/ ("some" consumes space without adding meaning) Or maybe something like: drm/amdgpu: Convert driver sysfs attributes to static attributes On Wed, Apr 28, 2021 at 11:11:49AM -0400, Andrey Grodzovsky wrote: > This allows to remove explicit creation and d

Re: [PATCH v5 00/27] RFC Support hot device unplug in amdgpu

2021-04-28 Thread Bjorn Helgaas
On Wed, Apr 28, 2021 at 11:11:40AM -0400, Andrey Grodzovsky wrote: > Until now extracting a card either by physical extraction (e.g. eGPU with > thunderbolt connection or by emulation through sysfs -> > /sys/bus/pci/devices/device_id/remove) > would cause random crashes in user apps. The random

Re: [PATCH 1/2] drm/ttm: Don't evict SG BOs

2021-04-28 Thread Felix Kuehling
Am 2021-04-28 um 12:58 p.m. schrieb Christian König: > Am 28.04.21 um 18:49 schrieb Felix Kuehling: >> Am 2021-04-28 um 12:33 p.m. schrieb Christian König: >>> Am 28.04.21 um 17:19 schrieb Felix Kuehling: >>> [SNIP] >> Failing that, I'd probably have to abandon userptr BOs altogether >> and

Re: [PATCH 1/2] drm/ttm: Don't evict SG BOs

2021-04-28 Thread Christian König
Am 28.04.21 um 18:49 schrieb Felix Kuehling: Am 2021-04-28 um 12:33 p.m. schrieb Christian König: Am 28.04.21 um 17:19 schrieb Felix Kuehling: [SNIP] Failing that, I'd probably have to abandon userptr BOs altogether and switch system memory mappings over to using the new SVM API on systems wher

Re: [PATCH v5 08/27] PCI: add support for dev_groups to struct pci_device_driver

2021-04-28 Thread Bjorn Helgaas
In subject: s/PCI: add support/PCI: Add support/ to match convention ("git log --oneline drivers/pci/pci-driver.c" to learn this). On Wed, Apr 28, 2021 at 11:11:48AM -0400, Andrey Grodzovsky wrote: > This is exact copy of 'USB: add support for dev_groups to > struct usb_device_driver' patch by G

Re: [PATCH 1/2] drm/ttm: Don't evict SG BOs

2021-04-28 Thread Felix Kuehling
Am 2021-04-28 um 12:33 p.m. schrieb Christian König: > Am 28.04.21 um 17:19 schrieb Felix Kuehling: >> Am 2021-04-28 um 5:05 a.m. schrieb Christian König: >> [SNIP] >> Hmm, I was missing something. The amdgpu_gtt_mgr doesn't actually >> allocate space for many BOs: >> >> if (!place->lpfn)

Re: [PATCH 1/2] drm/ttm: Don't evict SG BOs

2021-04-28 Thread Christian König
Am 28.04.21 um 17:19 schrieb Felix Kuehling: Am 2021-04-28 um 5:05 a.m. schrieb Christian König: [SNIP] Hmm, I was missing something. The amdgpu_gtt_mgr doesn't actually allocate space for many BOs: if (!place->lpfn) { mem->mm_node = NULL; mem->start =

[PATCH v5 1/1] drm/drm_mst: Use Extended Base Receiver Capability DPCD space

2021-04-28 Thread Nikola Cornij
[why] DP 1.4a spec mandates that if DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT is set, Extended Base Receiver Capability DPCD space must be used. Without doing that, the three DPCD values that differ will be wrong, leading to incorrect or limited functionality. MST link rate, for example, could have a l

[PATCH v5 0/1] drm/drm_mst: Use Extended Base Receiver Capability

2021-04-28 Thread Nikola Cornij
As of patch v4, patch was supposed to be complete, however there was a build problem introduced in v3 when fixing check-patch errors. Change history: v5: - Fixed min_t() macro arguments v4: - Fixed drm/radeon/ lane count and rate v3: - Fixed check-patch errors v2: - No changes, this was my

Re: [PATCH v5 19/27] drm/amdgpu: Finilise device fences on device remove.

2021-04-28 Thread Andrey Grodzovsky
On 2021-04-28 11:11 a.m., Andrey Grodzovsky wrote: Make sure all fences dependent on HW present are force signaled when handling device removal. This helps later to scope all HW accessing code such as IOCTLs in drm_dev_enter/exit and use drm_dev_unplug as synchronization point past which we kn

Re: [PATCH 1/2] drm/ttm: Don't evict SG BOs

2021-04-28 Thread Felix Kuehling
Am 2021-04-28 um 5:05 a.m. schrieb Christian König: > Am 28.04.21 um 09:49 schrieb Felix Kuehling: >> Am 2021-04-28 um 3:04 a.m. schrieb Christian König: >>> Am 28.04.21 um 07:33 schrieb Felix Kuehling: SG BOs do not occupy space that is managed by TTM. So do not evict them. Thi

[PATCH v5 27/27] drm/amdgpu: Verify DMA opearations from device are done

2021-04-28 Thread Andrey Grodzovsky
In case device remove is just simulated by sysfs then verify device doesn't keep doing DMA to the released memory after pci_remove is done. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/a

[PATCH v5 23/27] drm/amd/powerplay: Scope all PM queued work with drm_dev_enter/exit

2021-04-28 Thread Andrey Grodzovsky
To allow completion and further block of HW accesses post device PCI remove. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 44 +-- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 26 +++--- 2 files changed, 47 insertions(+), 23 deletions

[PATCH v5 22/27] drm/amd/display: Scope all DM queued work with drm_dev_enter/exit

2021-04-28 Thread Andrey Grodzovsky
To allow completion and further block of HW accesses post device PCI remove. Signed-off-by: Andrey Grodzovsky --- .../amd/display/amdgpu_dm/amdgpu_dm_hdcp.c| 124 +++--- .../drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c | 24 +++- 2 files changed, 98 insertions(+), 50 deletions(-)

[PATCH v5 18/27] drm/sched: Expose drm_sched_entity_kill_jobs

2021-04-28 Thread Andrey Grodzovsky
Will be used to complete all schedule fences on device remove Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/scheduler/sched_entity.c | 3 ++- include/drm/gpu_scheduler.h | 1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/scheduler/sched_enti

[PATCH v5 17/27] drm/amdgpu: Add rw_sem to pushing job into sched queue

2021-04-28 Thread Andrey Grodzovsky
Will be used later to block further submissions once device is removed. Also complete schedule fence if scheduling failed due to submission blocking. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 3 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 13 +++

[PATCH v5 16/27] drm/amdgpu: Unmap all MMIO mappings

2021-04-28 Thread Andrey Grodzovsky
Access to those must be prevented post pci_remove Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 5 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 38 -- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 28 ++-- drivers/gpu/drm/am

[PATCH v5 13/27] drm/amdgpu: When filizing the fence driver. stop scheduler first.

2021-04-28 Thread Andrey Grodzovsky
No point calling amdgpu_fence_wait_empty before stopping the SW scheduler otherwise there is always a chance another job sneaked in after the wait. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff

[PATCH v5 06/27] drm/amdgpu: Handle IOMMU enabled case.

2021-04-28 Thread Andrey Grodzovsky
Handle all DMA IOMMU group related dependencies before the group is removed. v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 +++

[PATCH v5 25/27] drm/amdgpu: Scope all amdgpu queued work with drm_dev_enter/exit

2021-04-28 Thread Andrey Grodzovsky
To allow completion and further block of HW accesses post device PCI remove. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 11 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 29 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c| 26 +++-- drivers/g

[PATCH v5 21/27] drm/amdgpu: Add support for hot-unplug feature at DRM level.

2021-04-28 Thread Andrey Grodzovsky
To allow scoping DRM IOCTLs with drm_dev_enter/exit. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 8a19b8dd0

[PATCH v5 26/27] drm/amd/display: Remove superflous drm_mode_config_cleanup

2021-04-28 Thread Andrey Grodzovsky
It's already being released by DRM core through devm Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c in

[PATCH v5 24/27] drm/amdkfd: Scope all KFD queued work with drm_dev_enter/exit

2021-04-28 Thread Andrey Grodzovsky
To allow completion and further block of HW accesses post device PCI remove. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c b/drivers/g

[PATCH v5 20/27] drm: Scope all DRM IOCTLs with drm_dev_enter/exit

2021-04-28 Thread Andrey Grodzovsky
With this calling drm_dev_unplug will flush and block all in flight IOCTLs Also, add feature such that if device supports graceful unplug we enclose entire IOCTL in SRCU critical section. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/drm_ioctl.c | 15 +-- include/drm/drm_drv.

[PATCH v5 02/27] drm/ttm: Expose ttm_tt_unpopulate for driver use

2021-04-28 Thread Andrey Grodzovsky
It's needed to drop iommu backed pages on device unplug before device's IOMMU group is released. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/ttm/ttm_tt.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 48c407cff112..f

[PATCH v5 15/27] drm/scheduler: Fix hang when sched_entity released

2021-04-28 Thread Andrey Grodzovsky
Problem: If scheduler is already stopped by the time sched_entity is released and entity's job_queue not empty I encountered a hang in drm_sched_entity_flush. This is because drm_sched_entity_is_idle never becomes false. Fix: In drm_sched_fini detach all sched_entities from the scheduler's run queu

[PATCH v5 11/27] drm/sched: Make timeout timer rearm conditional.

2021-04-28 Thread Andrey Grodzovsky
We don't want to rearm the timer if driver hook reports that the device is gone. v5: Update drm_gpu_sched_stat values in code. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/scheduler/sched_main.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/

[PATCH v5 09/27] dmr/amdgpu: Move some sysfs attrs creation to default_attr

2021-04-28 Thread Andrey Grodzovsky
This allows removing explicit creation and destruction of those attrs and thereby avoids warnings on device finalizing post physical device extraction. v5: Use newly added pci_driver.dev_groups directly Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c | 17

[PATCH v5 19/27] drm/amdgpu: Finilise device fences on device remove.

2021-04-28 Thread Andrey Grodzovsky
Make sure all fences dependent on HW present are force signaled when handling device removal. This helps later to scope all HW accessing code such as IOCTLs in drm_dev_enter/exit and use drm_dev_unplug as synchronization point past which we know HW will not be accessed anymore outside of pci remove

[PATCH v5 14/27] drm/amdgpu: Fix hang on device removal.

2021-04-28 Thread Andrey Grodzovsky
If removing while commands in flight you cannot wait to flush the HW fences on a ring since the device is gone. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd

[PATCH v5 12/27] drm/amdgpu: Prevent any job recoveries after device is unplugged.

2021-04-28 Thread Andrey Grodzovsky
Return DRM_TASK_STATUS_ENODEV back to the scheduler when device is not present so the timeout timer will not be rearmed. v5: Update to match updated return values in enum drm_gpu_sched_stat Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 19 --- 1

[PATCH v5 01/27] drm/ttm: Remap all page faults to per process dummy page.

2021-04-28 Thread Andrey Grodzovsky
On device removal reroute all CPU mappings to dummy page. v3: Remove loop to find DRM file and instead access it by vma->vm_file->private_data. Move dummy page installation into a separate function. v4: Map the entire BOs VA space into on demand allocated dummy page on the first fault for that BO

[PATCH v5 05/27] drm/amdgpu: Add early fini callback

2021-04-28 Thread Andrey Grodzovsky
Use it to call display code dependent on device->drv_data before it's set to NULL on device unplug v5: Move HW finalization into this callback to prevent MMIO accesses post pci remove. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 59 +--

[PATCH v5 08/27] PCI: add support for dev_groups to struct pci_device_driver

2021-04-28 Thread Andrey Grodzovsky
This is exact copy of 'USB: add support for dev_groups to struct usb_device_driver' patch by Greg but just for the PCI case. Signed-off-by: Andrey Grodzovsky Suggested-by: Greg Kroah-Hartman --- drivers/pci/pci-driver.c | 1 + include/linux/pci.h | 3 +++ 2 files changed, 4 insertions(+)

[PATCH v5 10/27] drm/amdgpu: Guard against write accesses after device removal

2021-04-28 Thread Andrey Grodzovsky
This should prevent writing to memory or IO ranges possibly already allocated for other uses after our device is removed. v5: Protect more places where memcopy_to/from_io takes place Protect IB submissions Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 75 +

[PATCH v5 04/27] drm/amdkfd: Split kfd suspend from devie exit

2021-04-28 Thread Andrey Grodzovsky
Helps to expedite HW related stuff to amdgpu_pci_remove Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_device.c| 3 ++- 3 files changed, 4 insertions(+), 3 deletions(-)

[PATCH v5 07/27] drm/amdgpu: Remap all page faults to per process dummy page.

2021-04-28 Thread Andrey Grodzovsky
On device removal reroute all CPU mappings to dummy page per drm_file instance or imported GEM object. v4: Update for modified ttm_bo_vm_dummy_page Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 21 - 1 file changed, 16 insertions(+), 5 deleti

[PATCH v5 03/27] drm/amdgpu: Split amdgpu_device_fini into early and late

2021-04-28 Thread Andrey Grodzovsky
Some of the stuff in amdgpu_device_fini such as HW interrupts disable and pending fences finalization must be done right away on pci_remove while most of the stuff which relates to finalizing and releasing driver data structures can be kept until drm_driver.release hook is called, i.e. when the las

[PATCH v5 00/27] RFC Support hot device unplug in amdgpu

2021-04-28 Thread Andrey Grodzovsky
Until now extracting a card either by physical extraction (e.g. eGPU with thunderbolt connection or by emulation through sysfs -> /sys/bus/pci/devices/device_id/remove) would cause random crashes in user apps. The random crashes in apps were mostly due to the app having mapped a device backed B

[PATCH] drm/amdkfd: add ACPI SRAT parsing for topology

2021-04-28 Thread Eric Huang
In NPS4 BIOS we need to find the closest numa node when creating topology io link between cpu and gpu, if PCI driver doesn't set it. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 94 ++- 1 file changed, 91 insertions(+), 3 deletions(-) diff --git

Re: [PATCH v5 4/5] drm/amdgpu: address remove from fault filter

2021-04-28 Thread philip yang
On 2021-04-28 5:00 a.m., Christian König wrote: On 28.04.21 at 09:53, Felix Kuehling wrote: On 2021-04-28 at 2:54 a.m., Christian König wrote: On 27.04.21 at 20:21, Felix Kuehling wrote:

Re: VEGA 20 (Aka Vega12)

2021-04-28 Thread Alex Deucher
On Wed, Apr 28, 2021 at 10:12 AM Rodrigo Luglio wrote: > > I have a macbook pro with vega 20 which uses the amdgpu firmware vega12 and > when i boot any distro the graphics glitch and the computer freezes. If i > install amdgpu pro on ubuntu it works flawlessly. Would you guys help me > debug t

Re: [PATCH] drm/amdgpu: fix r initial values

2021-04-28 Thread Deucher, Alexander
[AMD Public Use] Reviewed-by: Alex Deucher From: amd-gfx on behalf of Victor Zhao Sent: Wednesday, April 28, 2021 12:40 AM To: amd-gfx@lists.freedesktop.org Cc: Zhao, Victor Subject: [PATCH] drm/amdgpu: fix r initial values Sriov gets suspend of IP block fa

Re: [PATCH 1/2] drm/amd/pm: new gpu_metrics structure for pmfw attached timestamp

2021-04-28 Thread Deucher, Alexander
[AMD Public Use] Assuming the updated table is 64 bit aligned, the series is: Reviewed-by: Alex Deucher From: Quan, Evan Sent: Tuesday, April 27, 2021 9:43 PM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Kasiviswanathan, Harish ; Quan, Evan Subje

Re: [PATCH] drm/amdgpu: Add vbios info ioctl interface

2021-04-28 Thread StDenis, Tom
[AMD Official Use Only - Internal Distribution Only] Hi Will, I've merged in your v2 patch last week. If that's still the latest you should be good to go. Tom From: Gu, JiaWei (Will) Sent: Wednesday, April 28, 2021 06:38 To: Christian König; Nieto, Da

RE: [PATCH] drm/amdgpu: Add vbios info ioctl interface

2021-04-28 Thread Gu, JiaWei (Will)
[AMD Official Use Only - Internal Distribution Only] Hi @StDenis, Tom, We have merged vbios info ioctl patch. Could you help re-merge the UMR side one again if it was reverted before? Thanks in advance! Jiawei From: Gu, JiaWei (Will) Sent: Wednesday, April 28, 2021 4

RE: [PATCH 1/2] drm/scheduler: Change scheduled fence track

2021-04-28 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only] Hi Christian, Good to know, thanks very much. Best wishes Emily Deng From: Christian König Sent: Wednesday, April 28, 2021 5:07 PM To: Deng, Emily ; Deucher, Alexander Cc: Sun, Roy ; amd-gfx list ; Nieto, David M Subject: Re: [PATCH 1/2] d

Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track

2021-04-28 Thread Christian König
Well none. As I said I will push this upstream through drm-misc-next. Christian. On 28.04.21 at 10:32, Deng, Emily wrote: [AMD Official Use Only - Internal Distribution Only] Hi Alex and Christian, What extra work does Roy need to do on this patch? And fdinfo? Best wishes Emily Deng *From

Re: [PATCH 1/2] drm/ttm: Don't evict SG BOs

2021-04-28 Thread Christian König
On 28.04.21 at 09:49, Felix Kuehling wrote: On 2021-04-28 at 3:04 a.m., Christian König wrote: On 28.04.21 at 07:33, Felix Kuehling wrote: SG BOs do not occupy space that is managed by TTM. So do not evict them. This fixes unexpected evictions of KFD's userptr BOs. KFD only expects userptr

Re: [PATCH v5 4/5] drm/amdgpu: address remove from fault filter

2021-04-28 Thread Christian König
On 28.04.21 at 09:53, Felix Kuehling wrote: On 2021-04-28 at 2:54 a.m., Christian König wrote: On 27.04.21 at 20:21, Felix Kuehling wrote: On 2021-04-27 10:51 a.m., Philip Yang wrote: Add interface to remove address from fault filter ring by resetting fault ring entry key, then future vm

RE: [PATCH 1/2] drm/scheduler: Change scheduled fence track

2021-04-28 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only] Hi Alex and Christian, What extra work does Roy need to do on this patch? And fdinfo? Best wishes Emily Deng From: amd-gfx On Behalf Of Deucher, Alexander Sent: Tuesday, April 27, 2021 3:52 AM To: Christian König Cc: Sun, Roy ; amd-gfx list ;

RE: [PATCH] drm/amdgpu: Add vbios info ioctl interface

2021-04-28 Thread Gu, JiaWei (Will)
[AMD Official Use Only - Internal Distribution Only] Thanks Christian, I amended the commit message and resent the patch. Please feel free to let me know if the message is not clear enough. Best regards, Jiawei From: Christian König Sent: Wednesday, April 28, 2021 3:43 PM To: Nieto, David M ;

[PATCH] drm/amdgpu: Add vbios info ioctl interface

2021-04-28 Thread Jiawei Gu
Add AMDGPU_INFO_VBIOS_INFO subquery id for detailed vbios info. Provides a way for the user application to get the VBIOS information without having to parse the binary. It is useful for the user to be able to display in a simple way the VBIOS version in their system if they happen to encounter an

Re: [RFC PATCH 0/3] A drm_plane API to support HDR planes

2021-04-28 Thread Shashank Sharma
Hello Harry, Many of us in the mail chain have discussed this before, on what is the right way to blend and tone map a SDR and a HDR buffer from same/different color spaces, and what kind of DRM plane properties will be needed. As you can see from the previous comments, that the majority of the

Re: [PATCH v5 4/5] drm/amdgpu: address remove from fault filter

2021-04-28 Thread Felix Kuehling
On 2021-04-28 at 2:54 a.m., Christian König wrote: > On 27.04.21 at 20:21, Felix Kuehling wrote: >> On 2021-04-27 10:51 a.m., Philip Yang wrote: >>> Add interface to remove address from fault filter ring by resetting >>> fault ring entry key, then future vm fault on the address will be >>> proces

Re: [PATCH 1/2] drm/ttm: Don't evict SG BOs

2021-04-28 Thread Felix Kuehling
On 2021-04-28 at 3:04 a.m., Christian König wrote: > On 28.04.21 at 07:33, Felix Kuehling wrote: >> SG BOs do not occupy space that is managed by TTM. So do not evict them. >> >> This fixes unexpected evictions of KFD's userptr BOs. KFD only expects >> userptr "evictions" in the form of MMU noti

Re: [PATCH] drm/amdgpu: Add vbios info ioctl interface

2021-04-28 Thread Christian König
Yeah, makes sense. Please note that in the commit message. With that feel free to put an Acked-by: Christian König on it. Regards, Christian. On 28.04.21 at 09:25, Nieto, David M wrote: I think this change may be orthogonal to that. Here we want to provide a way for the user application

Re: [PATCH] drm/amdgpu: Add vbios info ioctl interface

2021-04-28 Thread Nieto, David M
I think this change may be orthogonal to that. Here we want to provide a way for the user application to get the VBIOS information without having to parse the binary… And I agree that we should not have strong dependencies unless they encounter buggy VBIOS in the field, but I still think it is u
