Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Christian König
Am 16.01.24 um 01:05 schrieb Marek Olšák: On Mon, Jan 15, 2024 at 3:06 PM Christian König wrote: Am 15.01.24 um 20:30 schrieb Joshua Ashton: On 1/15/24 19:19, Christian König wrote: Am 15.01.24 um 20:13 schrieb Joshua Ashton: On 1/15/24 18:53, Christian König wrote: Am 15.01.24 um 19:35 sch

Re: [PATCH 2/2] drm/amdgpu: Process fences on IH overflow

2024-01-15 Thread Christian König
Am 15.01.24 um 12:19 schrieb Friedrich Vock: On 15.01.24 11:26, Christian König wrote: Am 14.01.24 um 14:00 schrieb Friedrich Vock: If the IH ring buffer overflows, it's possible that fence signal events were lost. Check each ring for progress to prevent job timeouts/GPU hangs due to the fences

Re: [PATCH 1/2] drm/amdgpu: Reset IH OVERFLOW_CLEAR bit after writing rptr

2024-01-15 Thread Christian König
Am 15.01.24 um 12:18 schrieb Friedrich Vock: Adding the original Ccs from the thread since they seemed to be missing in the reply. On 15.01.24 11:55, Christian König wrote: Am 14.01.24 um 14:00 schrieb Friedrich Vock: Allows us to detect subsequent IH ring buffer overflows as well. Well that

[PATCH 0/2] drm/atomic: Allow drivers to write their own plane check for async

2024-01-15 Thread André Almeida
Hi, AMD hardware can do more on the async flip path than just the primary plane, so to lift up the current restrictions, this patchset allows drivers to write their own check for planes for async flips. I'm not sure if adding something new to drm_plane_funcs is the right way to do it, because if we

[PATCH 2/2] drm/amdgpu: Implement check_async_props for planes

2024-01-15 Thread André Almeida
AMD GPUs can do async flips with overlay planes and other props rather than just FB ID, so implement a custom check_async_props for AMD planes. Signed-off-by: André Almeida --- .../amd/display/amdgpu_dm/amdgpu_dm_plane.c | 30 +++ 1 file changed, 30 insertions(+) diff --git a/

[PATCH 1/2] drm/atomic: Allow drivers to write their own plane check for async flips

2024-01-15 Thread André Almeida
Some hardware is more flexible about what it can flip asynchronously, so rework the plane check so drivers can implement their own check, lifting some of the restrictions. Signed-off-by: André Almeida --- drivers/gpu/drm/drm_atomic_uapi.c | 62 ++- include/drm/drm_

Re: [PATCH v4] drm/amdkfd: Set correct svm range actual loc after spliting

2024-01-15 Thread Chen, Xiaogang
With a nitpick below, this patch is: Reviewed-by: Xiaogang Chen On 1/15/2024 4:02 PM, Philip Yang wrote: While partially migrating an svm range to system memory, clear the dma_addr vram domain flag, otherwise a future split will get incorrect vram_pages and actual loc. After range splitting, set new

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Marek Olšák
On Mon, Jan 15, 2024 at 3:06 PM Christian König wrote: > > Am 15.01.24 um 20:30 schrieb Joshua Ashton: > > On 1/15/24 19:19, Christian König wrote: > >> Am 15.01.24 um 20:13 schrieb Joshua Ashton: > >>> On 1/15/24 18:53, Christian König wrote: > Am 15.01.24 um 19:35 schrieb Joshua Ashton: > >

Re: [PATCH] drm/amdkfd: Correct partial migration virtual addr

2024-01-15 Thread Chen, Xiaogang
This patch is: Reviewed-by: Xiaogang Chen On 1/15/2024 4:00 PM, Philip Yang wrote: Partial migration to system memory should use migrate.addr, not prange->start as virtual address to allocate system memory page. Fixes: 18eb61bd5a6a ("drm/amdkfd: Use partial migrations/mapping for GPU/CPU page

[pull] amdgpu, amdkfd drm-fixes-6.8

2024-01-15 Thread Alex Deucher
Hi Dave, Sima, Fixes for 6.8. Same PR as Friday, but with new clang warning fixed and dropped KFD changes at Felix' request. The following changes since commit e54478fbdad20f2c58d0a4f99d01299ed8e7fe9c: Merge tag 'amd-drm-next-6.8-2024-01-05' of https://gitlab.freedesktop.org/agd5f/linux into

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Marek Olšák
On Mon, Jan 15, 2024 at 11:41 AM Michel Dänzer wrote: > > On 2024-01-15 17:19, Friedrich Vock wrote: > > On 15.01.24 16:43, Joshua Ashton wrote: > >> On 1/15/24 15:25, Michel Dänzer wrote: > >>> On 2024-01-15 14:17, Christian König wrote: > Am 15.01.24 um 12:37 schrieb Joshua Ashton: > >

Re: [pull] amdgpu, amdkfd drm-fixes-6.8

2024-01-15 Thread Felix Kuehling
On 2024-01-15 17:08, Alex Deucher wrote: Hi Dave, Sima, Same PR as Friday, but with the new clang warning fixed. The following changes since commit e54478fbdad20f2c58d0a4f99d01299ed8e7fe9c: Merge tag 'amd-drm-next-6.8-2024-01-05' of https://gitlab.freedesktop.org/agd5f/linux into drm-nex

[PATCH] drm/amdgpu: Remove unnecessary NULL check

2024-01-15 Thread Felix Kuehling
A static checker pointed out that bo_va->base.bo was already dereferenced earlier in the same scope. Therefore this check is unnecessary here. Reported-by: Dan Carpenter Fixes: 79e7fdec71f2 ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs") Signed-off-by: Felix Kuehling --- drivers/gpu/

[PATCH v4 7/7] drm/amdkfd: Wait update sdma fence before tlb flush

2024-01-15 Thread Philip Yang
If sdma is used to update the GPU page table, the kfd tlb flush does nothing if the vm update fence callback doesn't update vm->tlb_seq. This works now because a retry fault will come, update the page table again, and finally flush the tlb. With the bitmap_map flag, the retry fault recovery will only update GPU page table

[PATCH v4 2/7] drm/amdkfd: Add helper function align range start last

2024-01-15 Thread Philip Yang
Calculate range start, last address aligned to the range granularity size. This removes the duplicate code, and the helper function will be used in the future patch to handle map, unmap to GPU based on range granularity. No functional change. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/am

[PATCH v4 5/7] drm/amdkfd: Change range granularity update bitmap_map

2024-01-15 Thread Philip Yang
When changing the svm range granularity, update the svm range bitmap_map based on new range granularity. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 49 +++- 1 file changed, 48 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd

[PATCH v4 4/7] amd/amdkfd: Unmap range from GPU based on granularity

2024-01-15 Thread Philip Yang
When the MMU notifier invalidates the range, align the start and last address to the range granularity to unmap from the GPU and update the bitmap_map flag. Skip the unmap from the GPU if the range is already unmapped based on the bitmap_map flag. This avoids unmapping 1 page from the GPU and flushing the TLB, and also solves the rocgdb CWSR migrati

[PATCH v4 6/7] drm/amdkfd: Check bitmap_map flag to skip retry fault

2024-01-15 Thread Philip Yang
Remove prange validate_timestamp which is not accurate for multiple GPUs. Use the bitmap_map flag to skip the retry fault from different pages of the same granularity range if the granularity range is already mapped on the specific GPU. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling ---

[PATCH v4 3/7] drm/amdkfd: Add granularity size based bitmap map flag

2024-01-15 Thread Philip Yang
Replace prange->mapped_to_gpu with prange->bitmap_map[], which is a per-GPU flag and uses bitmap bits based on prange granularity. Align the map to GPU or unmap from GPU range size to the granularity size and update the corresponding bitmap_map flag bits. This will optimize multiple GPU map, unmap and retry

[PATCH v4 1/7] drm/amdkfd: Add helper function svm_range_need_access_gpus

2024-01-15 Thread Philip Yang
Add a helper function to get the bitmap of all GPUs that need access to the svm range. This helper will be used in the following patch to check if prange is mapped to all gpus. Refactor svm_range_validate_and_map to use the helper function, no functional change. Signed-off-by: Philip Yang Reviewed-by: F

[pull] amdgpu, amdkfd drm-fixes-6.8

2024-01-15 Thread Alex Deucher
Hi Dave, Sima, Same PR as Friday, but with the new clang warning fixed. The following changes since commit e54478fbdad20f2c58d0a4f99d01299ed8e7fe9c: Merge tag 'amd-drm-next-6.8-2024-01-05' of https://gitlab.freedesktop.org/agd5f/linux into drm-next (2024-01-09 09:07:50 +1000) are available

[PATCH v4] drm/amdkfd: Set correct svm range actual loc after spliting

2024-01-15 Thread Philip Yang
While partially migrating an svm range to system memory, clear the dma_addr vram domain flag, otherwise a future split will get incorrect vram_pages and actual loc. After range splitting, set new range and old range actual_loc: new range actual_loc is 0 if new->vram_pages is 0. old range actual_loc is 0 i

[PATCH] drm/amdkfd: Correct partial migration virtual addr

2024-01-15 Thread Philip Yang
Partial migration to system memory should use migrate.addr, not prange->start as virtual address to allocate system memory page. Fixes: 18eb61bd5a6a ("drm/amdkfd: Use partial migrations/mapping for GPU/CPU page faults in SVM") Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_migrate

Re: [PATCH] drm/amd/display: Fix a switch statement in populate_dml_output_cfg_from_stream_state()

2024-01-15 Thread Hamza Mahfooz
On 1/13/24 09:58, Christophe JAILLET wrote: It is likely that the statement related to 'dml_edp' is misplaced. So move it into the correct "case SIGNAL_TYPE_EDP". Fixes: 7966f319c66d ("drm/amd/display: Introduce DML2") Signed-off-by: Christophe JAILLET Nice catch! Applied, thanks! --- drive

Re: [PATCH] drm/amd/display: Drop 'acrtc' and add 'new_crtc_state' NULL check for writeback requests.

2024-01-15 Thread Alex Hung
Thanks for catching this. Reviewed-by: Alex Hung On 2024-01-13 02:11, Srinivasan Shanmugam wrote: Return value of 'to_amdgpu_crtc' which is container_of(...) can't be null, so its null check on 'acrtc' is dropped. Fixing the below: drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:930

Re: Documentation for RGB strip on RX 7900 XTX (Reference)

2024-01-15 Thread Harry Wentland
On 2024-01-09 03:31, Christian König wrote: > Am 09.01.24 um 09:23 schrieb Alexander Koskovich: >> Thanks for the info, will take a look. >> >> Also just to clarify, this is the first party AMD 7900 XTX, not a third >> party AIB (e.g. Sapphire, ASRock, etc). > > Yeah, but that doesn't matter.

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Joshua Ashton
On 1/15/24 19:57, Christian König wrote: Am 15.01.24 um 20:30 schrieb Joshua Ashton: On 1/15/24 19:19, Christian König wrote: Am 15.01.24 um 20:13 schrieb Joshua Ashton: On 1/15/24 18:53, Christian König wrote: Am 15.01.24 um 19:35 schrieb Joshua Ashton: On 1/15/24 18:30, Bas Nieuwenhuize

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Christian König
Am 15.01.24 um 20:30 schrieb Joshua Ashton: On 1/15/24 19:19, Christian König wrote: Am 15.01.24 um 20:13 schrieb Joshua Ashton: On 1/15/24 18:53, Christian König wrote: Am 15.01.24 um 19:35 schrieb Joshua Ashton: On 1/15/24 18:30, Bas Nieuwenhuizen wrote: On Mon, Jan 15, 2024 at 7:14 PM Frie

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Joshua Ashton
On 1/15/24 19:19, Christian König wrote: Am 15.01.24 um 20:13 schrieb Joshua Ashton: On 1/15/24 18:53, Christian König wrote: Am 15.01.24 um 19:35 schrieb Joshua Ashton: On 1/15/24 18:30, Bas Nieuwenhuizen wrote: On Mon, Jan 15, 2024 at 7:14 PM Friedrich Vock

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Christian König
Am 15.01.24 um 20:13 schrieb Joshua Ashton: On 1/15/24 18:53, Christian König wrote: Am 15.01.24 um 19:35 schrieb Joshua Ashton: On 1/15/24 18:30, Bas Nieuwenhuizen wrote: On Mon, Jan 15, 2024 at 7:14 PM Friedrich Vock wrote: Re-sending as plaintext, sorry

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Joshua Ashton
On 1/15/24 18:53, Christian König wrote: Am 15.01.24 um 19:35 schrieb Joshua Ashton: On 1/15/24 18:30, Bas Nieuwenhuizen wrote: On Mon, Jan 15, 2024 at 7:14 PM Friedrich Vock wrote: Re-sending as plaintext, sorry about that On 15.01.24 18:54, Michel

[PATCH] drm/amdkfd: Use S_ENDPGM_SAVED in trap handler

2024-01-15 Thread Jay Cornwall
This instruction has no functional difference to S_ENDPGM but allows performance counters to track save events correctly. Signed-off-by: Jay Cornwall --- drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h | 14 +++--- .../gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm | 2 +- .../gpu/dr

Re: Proposal to add CRIU support to DRM render nodes

2024-01-15 Thread Felix Kuehling
I haven't seen any replies to this proposal. Either it got lost in the pre-holiday noise, or there is genuinely no interest in this. If it's the latter, I would look for an AMDGPU driver-specific solution with minimally invasive changes in DRM and DMABuf code, if needed. Maybe it could be gene

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Christian König
Am 15.01.24 um 19:35 schrieb Joshua Ashton: On 1/15/24 18:30, Bas Nieuwenhuizen wrote: On Mon, Jan 15, 2024 at 7:14 PM Friedrich Vock wrote: Re-sending as plaintext, sorry about that On 15.01.24 18:54, Michel Dänzer wrote: > On 2024-01-15 18:26, Fri

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Joshua Ashton
On 1/15/24 18:30, Bas Nieuwenhuizen wrote: On Mon, Jan 15, 2024 at 7:14 PM Friedrich Vock wrote: Re-sending as plaintext, sorry about that On 15.01.24 18:54, Michel Dänzer wrote: > On 2024-01-15 18:26, Friedrich Vock wrote: >> [snip]

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Bas Nieuwenhuizen
On Mon, Jan 15, 2024 at 7:14 PM Friedrich Vock wrote: > Re-sending as plaintext, sorry about that > > On 15.01.24 18:54, Michel Dänzer wrote: > > On 2024-01-15 18:26, Friedrich Vock wrote: > >> [snip] > >> The fundamental problem here is that not telling applications that > >> something went wron

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Friedrich Vock
Re-sending as plaintext, sorry about that On 15.01.24 18:54, Michel Dänzer wrote: On 2024-01-15 18:26, Friedrich Vock wrote: [snip] The fundamental problem here is that not telling applications that something went wrong when you just canceled their work midway is an out-of-spec hack. When there

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Friedrich Vock
On 15.01.24 18:54, Michel Dänzer wrote: On 2024-01-15 18:26, Friedrich Vock wrote: [snip] The fundamental problem here is that not telling applications that something went wrong when you just canceled their work midway is an out-of-spec hack. When there is a report of real-world apps breaking be

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Joshua Ashton
On 1/15/24 17:09, Michel Dänzer wrote: On 2024-01-15 17:46, Joshua Ashton wrote: On 1/15/24 16:34, Michel Dänzer wrote: On 2024-01-15 17:19, Friedrich Vock wrote: On 15.01.24 16:43, Joshua Ashton wrote: On 1/15/24 15:25, Michel Dänzer wrote: On 2024-01-15 14:17, Christian König wrote: Am

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Michel Dänzer
On 2024-01-15 18:26, Friedrich Vock wrote: > On 15.01.24 18:09, Michel Dänzer wrote: >> On 2024-01-15 17:46, Joshua Ashton wrote: >>> On 1/15/24 16:34, Michel Dänzer wrote: On 2024-01-15 17:19, Friedrich Vock wrote: > On 15.01.24 16:43, Joshua Ashton wrote: >> On 1/15/24 15:25, Michel

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Friedrich Vock
On 15.01.24 18:09, Michel Dänzer wrote: On 2024-01-15 17:46, Joshua Ashton wrote: On 1/15/24 16:34, Michel Dänzer wrote: On 2024-01-15 17:19, Friedrich Vock wrote: On 15.01.24 16:43, Joshua Ashton wrote: On 1/15/24 15:25, Michel Dänzer wrote: On 2024-01-15 14:17, Christian König wrote: Am 1

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Michel Dänzer
On 2024-01-15 17:46, Joshua Ashton wrote: > On 1/15/24 16:34, Michel Dänzer wrote: >> On 2024-01-15 17:19, Friedrich Vock wrote: >>> On 15.01.24 16:43, Joshua Ashton wrote: On 1/15/24 15:25, Michel Dänzer wrote: > On 2024-01-15 14:17, Christian König wrote: >> Am 15.01.24 um 12:37 schr

Re: [PATCH] drm/amd/display: Avoid enum conversion warning

2024-01-15 Thread Alex Deucher
Applied. Thanks! On Wed, Jan 10, 2024 at 3:56 PM Nathan Chancellor wrote: > > Clang warns (or errors with CONFIG_WERROR=y) when performing arithmetic > with different enumerated types, which is usually a bug: > > > drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_dpia_bw.c:54

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Joshua Ashton
On 1/15/24 16:34, Michel Dänzer wrote: On 2024-01-15 17:19, Friedrich Vock wrote: On 15.01.24 16:43, Joshua Ashton wrote: On 1/15/24 15:25, Michel Dänzer wrote: On 2024-01-15 14:17, Christian König wrote: Am 15.01.24 um 12:37 schrieb Joshua Ashton: On 1/15/24 09:40, Christian König wrote:

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Michel Dänzer
On 2024-01-15 17:19, Friedrich Vock wrote: > On 15.01.24 16:43, Joshua Ashton wrote: >> On 1/15/24 15:25, Michel Dänzer wrote: >>> On 2024-01-15 14:17, Christian König wrote: Am 15.01.24 um 12:37 schrieb Joshua Ashton: > On 1/15/24 09:40, Christian König wrote: >> Am 13.01.24 um 15:02

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Friedrich Vock
On 15.01.24 16:43, Joshua Ashton wrote: On 1/15/24 15:25, Michel Dänzer wrote: On 2024-01-15 14:17, Christian König wrote: Am 15.01.24 um 12:37 schrieb Joshua Ashton: On 1/15/24 09:40, Christian König wrote: Am 13.01.24 um 15:02 schrieb Joshua Ashton: Without this feedback, the applicat

[PATCH v2 4/4] drm/i915/display: Add handling for new "force color format" property

2024-01-15 Thread Andri Yngvason
From: Werner Sembach This commit implements the "force color format" drm property for the Intel GPU driver. Signed-off-by: Werner Sembach Co-Developed-by: Andri Yngvason Signed-off-by: Andri Yngvason Tested-by: Andri Yngvason --- Changes in v2: - Renamed to "force color format" from "prefe

[PATCH v2 0/4] New DRM properties for output color format

2024-01-15 Thread Andri Yngvason
After some discussion, we decided to drop the "active color format" property and rename the "preferred color format" property to "force color format". The user can probe available color formats in combination with other properties using TEST_ONLY commits. v1: https://lore.kernel.org/dri-devel/2

[PATCH v2 3/4] drm/amd/display: Add handling for new "force color format" property

2024-01-15 Thread Andri Yngvason
From: Werner Sembach This commit implements the "force color format" drm property for the AMD GPU driver. Signed-off-by: Werner Sembach Co-Developed-by: Andri Yngvason Signed-off-by: Andri Yngvason Tested-by: Andri Yngvason --- Changes in v2: - Renamed to "force color format" from "preferr

[PATCH v2 2/4] drm/uAPI: Add "force color format" drm property as setting for userspace

2024-01-15 Thread Andri Yngvason
From: Werner Sembach Add a new general drm property "force color format" which can be used by userspace to tell the graphics driver which color format to use. Possible options are: - auto (default/current behaviour) - rgb - ycbcr444 - ycbcr422 (supported by neither amdgpu nor i915

[PATCH v2 1/4] drm/amd/display: Remove unnecessary SIGNAL_TYPE_HDMI_TYPE_A check

2024-01-15 Thread Andri Yngvason
From: Werner Sembach Remove unnecessary SIGNAL_TYPE_HDMI_TYPE_A check that was performed in the drm_mode_is_420_only() case, but not in the drm_mode_is_420_also() && force_yuv420_output case. Without further knowledge if YCbCr 4:2:0 is supported outside of HDMI, there is no reason to use RGB whe

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Joshua Ashton
On 1/15/24 15:25, Michel Dänzer wrote: On 2024-01-15 14:17, Christian König wrote: Am 15.01.24 um 12:37 schrieb Joshua Ashton: On 1/15/24 09:40, Christian König wrote: Am 13.01.24 um 15:02 schrieb Joshua Ashton: Without this feedback, the application may keep pushing through the soft reco

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Michel Dänzer
On 2024-01-15 14:17, Christian König wrote: > Am 15.01.24 um 12:37 schrieb Joshua Ashton: >> On 1/15/24 09:40, Christian König wrote: >>> Am 13.01.24 um 15:02 schrieb Joshua Ashton: >>> Without this feedback, the application may keep pushing through the soft recoveries, continually hangin

Re: [PATCH] drm/amdkfd: init drm_client with funcs hook

2024-01-15 Thread Felix Kuehling
On 2024-01-12 3:05, Flora Cui wrote: otherwise drm_client_dev_unregister() would try to kfree(&adev->kfd.client). Signed-off-by: Flora Cui Thank you for finding and fixing this bug. You can add: Fixes: 1819200166ce ("drm/amdkfd: Export DMABufs from KFD using GEM handles") Reviewed-by: Feli

Re: [PATCH 2/7] drm/uAPI: Add "active color format" drm property as feedback for userspace

2024-01-15 Thread Sebastian Wick
On Thu, Jan 11, 2024 at 05:17:46PM +, Andri Yngvason wrote: > mið., 10. jan. 2024 kl. 13:26 skrifaði Daniel Stone : > > > > > > This thing here works entirely differently, and I think we need somewhat > > > new semantics for this: > > > > > > - I agree it should be read-only for userspace, so i

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Joshua Ashton
On 1/15/24 13:17, Christian König wrote: Am 15.01.24 um 12:37 schrieb Joshua Ashton: On 1/15/24 09:40, Christian König wrote: Am 13.01.24 um 15:02 schrieb Joshua Ashton: We need to bump the karma of the drm_sched job in order for the context that we just recovered to get correct feedback th

Re: Failed to create a rescuer kthread for the amdgpu-reset-dev workqueue

2024-01-15 Thread Thomas Perrot
Hello Christian, On Fri, 2024-01-12 at 09:17 +0100, Christian König wrote: > Well the driver load is interrupted for some reason. > > Have you set any timeout for modprobe? > We don't set a modprobe timeout. Kind regards, Thomas > Regards, > Christian. > > Am 12.01.24 um 09:11 schrieb Thomas

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Joshua Ashton
On 1/15/24 13:19, Christian König wrote: Am 15.01.24 um 12:54 schrieb Joshua Ashton: [SNIP] The question here is really if we should handle soft recovered errors as fatal or not. Marek is in favor of that, Michel is against it. Figure out what you want in userspace and I'm happy to impleme

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Christian König
Am 15.01.24 um 12:54 schrieb Joshua Ashton: [SNIP] The question here is really if we should handle soft recovered errors as fatal or not. Marek is in favor of that, Michel is against it. Figure out what you want in userspace and I'm happy to implement it :) (That being said, without my patc

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Christian König
Am 15.01.24 um 12:37 schrieb Joshua Ashton: On 1/15/24 09:40, Christian König wrote: Am 13.01.24 um 15:02 schrieb Joshua Ashton: We need to bump the karma of the drm_sched job in order for the context that we just recovered to get correct feedback that it is guilty of hanging. Big NAK to that

RE: [PATCH] drm/amdgpu: fix sdma ecc irq unbalanced issue

2024-01-15 Thread Kamal, Asad
[AMD Official Use Only - General] Reviewed-by: Asad Kamal Thanks & Regards Asad -Original Message- From: amd-gfx On Behalf Of Yang Wang Sent: Monday, January 15, 2024 5:33 PM To: amd-gfx@lists.freedesktop.org Cc: Wang, Yang(Kevin); Zhang, Hawking Subject: [PATCH] drm/amdgpu: fix sdm

[PATCH] drm/amdgpu: fix sdma ecc irq unbalanced issue

2024-01-15 Thread Yang Wang
Fix sdma ecc irq unbalanced issue when doing mode2 reset. Fixes: 90b87f67124a ("drm/amdgpu: add sdma v4.4.2 ACA support") Signed-off-by: Yang Wang --- drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 15 +++ 1 file changed, 3 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/am

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Joshua Ashton
On 1/15/24 09:47, Christian König wrote: Am 13.01.24 um 23:55 schrieb Joshua Ashton: +Marek On 1/13/24 21:35, André Almeida wrote: Hi Joshua, Em 13/01/2024 11:02, Joshua Ashton escreveu: We need to bump the karma of the drm_sched job in order for the context that we just recovered to get

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Joshua Ashton
On 1/15/24 09:40, Christian König wrote: Am 13.01.24 um 15:02 schrieb Joshua Ashton: We need to bump the karma of the drm_sched job in order for the context that we just recovered to get correct feedback that it is guilty of hanging. Big NAK to that approach, the karma handling is completel

Re: [PATCH 2/2] drm/amdgpu: Process fences on IH overflow

2024-01-15 Thread Friedrich Vock
On 15.01.24 11:26, Christian König wrote: Am 14.01.24 um 14:00 schrieb Friedrich Vock: If the IH ring buffer overflows, it's possible that fence signal events were lost. Check each ring for progress to prevent job timeouts/GPU hangs due to the fences staying unsignaled despite the work being don

Re: [PATCH 1/2] drm/amdgpu: Reset IH OVERFLOW_CLEAR bit after writing rptr

2024-01-15 Thread Friedrich Vock
Adding the original Ccs from the thread since they seemed to be missing in the reply. On 15.01.24 11:55, Christian König wrote: Am 14.01.24 um 14:00 schrieb Friedrich Vock: Allows us to detect subsequent IH ring buffer overflows as well. Well that suggested handling here is certainly broken,

Re: [PATCH 2/2] drm/amdgpu: Process fences on IH overflow

2024-01-15 Thread Christian König
Am 14.01.24 um 14:00 schrieb Friedrich Vock: If the IH ring buffer overflows, it's possible that fence signal events were lost. Check each ring for progress to prevent job timeouts/GPU hangs due to the fences staying unsignaled despite the work being done. That's completely unnecessary and in s

Re: Failed to create a rescuer kthread for the amdgpu-reset-dev workqueue

2024-01-15 Thread Christian König
Am 15.01.24 um 11:17 schrieb Thomas Perrot: Hello Christian, On Fri, 2024-01-12 at 09:17 +0100, Christian König wrote: Well the driver load is interrupted for some reason. Have you set any timeout for modprobe? We don't set a modprobe timeout. Well you somehow abort probing the driver. Th

Re: [PATCH] drm/amdgpu: Remove usage of the deprecated ida_simple_xx() API

2024-01-15 Thread Christian König
Am 14.01.24 um 16:14 schrieb Christophe JAILLET: ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). Note that the upper limit of ida_simple_get() is exclusive, but the one of ida_alloc_range() is inclusive. So a -1 has been added when neede

RE: [PATCH v3] drm/amd/amdgpu: Update RLC_SPM_MC_CNT by ring wreg

2024-01-15 Thread YuanShang Mao (River)
[AMD Official Use Only - General] Ping... -Original Message- From: YuanShang Mao (River) Sent: Saturday, January 13, 2024 2:58 PM To: amd-gfx@lists.freedesktop.org Cc: YuanShang Mao (River); YuanShang Mao (River) Subject: [PATCH v3] drm/amd/amdgpu: Update RLC_SPM_MC_CNT by ring wreg

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Christian König
Am 13.01.24 um 23:55 schrieb Joshua Ashton: +Marek On 1/13/24 21:35, André Almeida wrote: Hi Joshua, Em 13/01/2024 11:02, Joshua Ashton escreveu: We need to bump the karma of the drm_sched job in order for the context that we just recovered to get correct feedback that it is guilty of hanging

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Christian König
Am 13.01.24 um 15:02 schrieb Joshua Ashton: We need to bump the karma of the drm_sched job in order for the context that we just recovered to get correct feedback that it is guilty of hanging. Big NAK to that approach, the karma handling is completely deprecated. When you want to signal execut

[PATCH] drm/amd/display: remove kernel-doc misuses in dmub_replay.c

2024-01-15 Thread Randy Dunlap
Change non-kernel-doc comments from "/**" to common "/*" to prevent kernel-doc warnings: dmub_replay.c:262: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Set REPLAY power optimization flags and coasting vtotal. dmub_replay

[PATCH] drm/amd/display: Fix a switch statement in populate_dml_output_cfg_from_stream_state()

2024-01-15 Thread Christophe JAILLET
It is likely that the statement related to 'dml_edp' is misplaced. So move it into the correct "case SIGNAL_TYPE_EDP". Fixes: 7966f319c66d ("drm/amd/display: Introduce DML2") Signed-off-by: Christophe JAILLET --- drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c | 2 +- 1 file changed,

[PATCH] drm/amdgpu: Remove usage of the deprecated ida_simple_xx() API

2024-01-15 Thread Christophe JAILLET
ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). Note that the upper limit of ida_simple_get() is exclusive, but the one of ida_alloc_range() is inclusive. So a -1 has been added when needed. Signed-off-by: Christophe JAILLET --- drivers