Re: [PATCH] amdgpu/dc: Use DRM new-style object iterators.

2017-10-11 Thread Maarten Lankhorst
On 11-10-17 at 22:40, Harry Wentland wrote: > On 2017-10-11 03:46 PM, Maarten Lankhorst wrote: >> On 11-10-17 at 20:55, Leo wrote: >>> >>> On 2017-10-11 10:30 AM, Maarten Lankhorst wrote: On 11-10-17 at 16:24, sunpeng...@amd.com wrote: > From: "Leo (Sunpeng) Li" > > Use the cor

Re: [PATCH] drm/amd/pp: add new sysfs pp_alloc_mem_for_smu

2017-10-11 Thread Alex Deucher
On Wed, Oct 11, 2017 at 7:28 AM, Rex Zhu wrote: > Change-Id: Ie06f87445e7d6945472d88ac976693c98d96cd43 > Signed-off-by: Rex Zhu Please add a better patch description. Something like: Add a sysfs interface to allocate a smu logging buffer on the fly. Additionally, wouldn't this be better in deb

Re: [PATCH] drm/amdgpu: correct reference clock value on vega10

2017-10-11 Thread Wang, Ken
got it, I will send another patch for reviewing. From: Alex Deucher Sent: Wednesday, October 11, 2017 9:30:01 PM To: Wang, Ken Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander Subject: Re: [PATCH] drm/amdgpu: correct reference clock value on vega10 On Wed, O

RE: [PATCH] drm/amdgpu: correct reference clock value on vega10

2017-10-11 Thread Deucher, Alexander
> -Original Message- > From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf > Of ken.w...@amd.com > Sent: Wednesday, October 11, 2017 10:41 PM > To: amd-gfx@lists.freedesktop.org > Cc: Wang, Ken > Subject: [PATCH] drm/amdgpu: correct reference clock value on vega10 > > Fr

[PATCH] drm/amdgpu: correct reference clock value on vega10

2017-10-11 Thread Ken.Wang
From: Ken Wang Change-Id: I377029075af1e2e002f7cfd793ddd58d8610e474 Signed-off-by: Ken Wang --- drivers/gpu/drm/amd/amdgpu/soc15.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c index 7839677..88d5

[PATCH] drm/amdgpu: Fix extra call to amdgpu_ctx_put.

2017-10-11 Thread Andrey Grodzovsky
In the error-handling path of amdgpu_cs_parser_init(), amdgpu_ctx_put() is called without setting p->ctx to NULL afterwards; amdgpu_cs_parser_fini() later calls amdgpu_ctx_put() again and messes up the reference count. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
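
The double-put described in this patch can be illustrated with a minimal sketch. The types and functions below (struct ctx, struct parser, ctx_put, parser_init_fail, parser_fini) are hypothetical stand-ins, not the real amdgpu code, and clearing the pointer is only one possible fix; the actual patch may instead drop the extra call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures (assumption). */
struct ctx { int refcount; };
struct parser { struct ctx *ctx; };

/* Drop one reference; a NULL pointer is ignored in this sketch. */
static void ctx_put(struct ctx *c)
{
	if (c == NULL)
		return;
	assert(c->refcount > 0);
	c->refcount--;
}

/* Init that takes a reference and then fails on a later step. */
static int parser_init_fail(struct parser *p, struct ctx *c)
{
	c->refcount++;		/* reference taken for the parser */
	p->ctx = c;
	/* ... a later step fails, so clean up ... */
	ctx_put(p->ctx);
	p->ctx = NULL;		/* without this, fini would put a second time */
	return -1;
}

/* Teardown that always releases whatever the parser still holds. */
static void parser_fini(struct parser *p)
{
	ctx_put(p->ctx);
	p->ctx = NULL;
}
```

With the pointer cleared in the error path, calling parser_fini() after a failed init leaves the reference count where it started.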

[PATCH v2 2/2] drm/amdgpu: Add amdgpu_find_mm_node()

2017-10-11 Thread Harish Kasiviswanathan
v2: Use amdgpu_find_mm_node() in amdgpu_ttm_io_mem_pfn() Change-Id: I12231e18bb60152843cd0e0213ddd0d0e04e7497 Signed-off-by: Harish Kasiviswanathan --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 49 ++--- 1 file changed, 27 insertions(+), 22 deletions(-) diff --git a/

Re: [PATCH] amdgpu/dc: Use DRM new-style object iterators.

2017-10-11 Thread Harry Wentland
On 2017-10-11 03:46 PM, Maarten Lankhorst wrote: > On 11-10-17 at 20:55, Leo wrote: >> >> >> On 2017-10-11 10:30 AM, Maarten Lankhorst wrote: >>> On 11-10-17 at 16:24, sunpeng...@amd.com wrote: From: "Leo (Sunpeng) Li" Use the correct for_each_new/old_* iterators instead of for_ea

[PATCH v2 1/2] drm/amdgpu: Refactor amdgpu_move_blit

2017-10-11 Thread Harish Kasiviswanathan
Add more generic function amdgpu_copy_ttm_mem_to_mem() that supports arbitrary copy size, offsets and two BOs (source & dest.). This is useful for KFD Cross Memory Attach feature where data needs to be copied from BOs from different processes v2: Add struct amdgpu_copy_mem and changed amdgpu_copy

Re: [PATCH] amdgpu/dc: Use DRM new-style object iterators.

2017-10-11 Thread Maarten Lankhorst
On 11-10-17 at 20:55, Leo wrote: > > > On 2017-10-11 10:30 AM, Maarten Lankhorst wrote: >> On 11-10-17 at 16:24, sunpeng...@amd.com wrote: >>> From: "Leo (Sunpeng) Li" >>> >>> Use the correct for_each_new/old_* iterators instead of for_each_* >>> >>> List of affected functions: >>> >>> amdgpu_dm

Re: [PATCH] amdgpu/dc: Use DRM new-style object iterators.

2017-10-11 Thread Leo
On 2017-10-11 10:30 AM, Maarten Lankhorst wrote: On 11-10-17 at 16:24, sunpeng...@amd.com wrote: From: "Leo (Sunpeng) Li" Use the correct for_each_new/old_* iterators instead of for_each_* List of affected functions: amdgpu_dm_find_first_crtc_matching_connector: use for_each_new - Ol

Re: [PATCH 2/2] drm/amd/display: drop unused dm_delay_in_microseconds

2017-10-11 Thread Harry Wentland
On 2017-10-11 12:51 PM, Alex Deucher wrote: > No longer used. > > Signed-off-by: Alex Deucher Series is Reviewed-by: Harry Wentland Harry > --- > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_services.c | 7 --- > 1 file changed, 7 deletions(-) > > diff --git a/drivers/gpu/drm/amd/dis

Re: [PATCH] drm/amd/display: Add comment for NaN checks in DCN calcs

2017-10-11 Thread Alex Deucher
On Wed, Oct 11, 2017 at 10:55 AM, Harry Wentland wrote: > This is confusing as-is and really needs a comment. > > Signed-off-by: Harry Wentland Acked-by: Alex Deucher > --- > drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/driv

[PATCH 1/2] drm/amd/display/dc: drop dm_delay_in_microseconds

2017-10-11 Thread Alex Deucher
Use udelay directly. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/dc/core/dc_link_hwss.c | 4 ++-- drivers/gpu/drm/amd/display/dc/dce/dce_dmcu.c | 4 ++-- drivers/gpu/drm/amd/display/dc/dm_services.h | 3 --- 3 files changed, 4 insertions(+), 7 deletions(-) diff --git

[PATCH 2/2] drm/amd/display: drop unused dm_delay_in_microseconds

2017-10-11 Thread Alex Deucher
No longer used. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_services.c | 7 --- 1 file changed, 7 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_services.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_services.c index 2c3c

Re: [PATCH 5/5] drm/amd/sched: signal and free remaining fences in amd_sched_entity_fini

2017-10-11 Thread Michel Dänzer
On 28/09/17 04:55 PM, Nicolai Hähnle wrote: > From: Nicolai Hähnle > > Highly concurrent Piglit runs can trigger a race condition where a pending > SDMA job on a buffer object is never executed because the corresponding > process is killed (perhaps due to a crash). Since the job's fences were > n

Re: [PATCH][drm-next] drm/amdgpu: make function uvd_v6_0_enc_get_destroy_msg static

2017-10-11 Thread Alex Deucher
On Wed, Oct 11, 2017 at 6:41 AM, Christian König wrote: > On 11.10.2017 at 11:21, Colin King wrote: >> >> From: Colin Ian King >> >> The function uvd_v6_0_enc_get_destroy_msg is local to the source and >> does not need to be in global scope, so make it static. >> >> Cleans up sparse warning: >>

RE: [PATCH] amdgpu/dc: Use DRM new-style object iterators.

2017-10-11 Thread Deucher, Alexander
> -Original Message- > From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf > Of Maarten Lankhorst > Sent: Wednesday, October 11, 2017 10:30 AM > To: Li, Sun peng (Leo); airl...@gmail.com; amd-gfx@lists.freedesktop.org > Cc: Wentland, Harry; dri-de...@lists.freedesktop.org

Re: [PATCH 11/11] drm/amd/display: make amdgpu_dm_irq_handler static

2017-10-11 Thread Harry Wentland
1-4, 6-11 are Reviewed-by: Harry Wentland Harry On 2017-10-11 10:06 AM, Alex Deucher wrote: > It's not used outside the file. > > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c | 6 +++--- > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.h | 9 -

[PATCH] amdgpu/dc: Use DRM new-style object iterators.

2017-10-11 Thread sunpeng.li
From: "Leo (Sunpeng) Li" Use the correct for_each_new/old_* iterators instead of for_each_* List of affected functions: amdgpu_dm_find_first_crtc_matching_connector: use for_each_new - Old from_state_var flag was always choosing the new state amdgpu_dm_display_resume: use for_each_new

Re: [PATCH 05/11] drm/amd/display: implement dm_delay_in_microseconds

2017-10-11 Thread Harry Wentland
On 2017-10-11 10:06 AM, Alex Deucher wrote: > dc uses this. Not sure how important it is. > > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_services.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/display/am

[PATCH] drm/amd/display: Add comment for NaN checks in DCN calcs

2017-10-11 Thread Harry Wentland
This is confusing as-is and really needs a comment. Signed-off-by: Harry Wentland --- drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc

Re: [PATCH] amdgpu/dc: Use DRM new-style object iterators.

2017-10-11 Thread Maarten Lankhorst
On 11-10-17 at 16:24, sunpeng...@amd.com wrote: > From: "Leo (Sunpeng) Li" > > Use the correct for_each_new/old_* iterators instead of for_each_* > > List of affected functions: > > amdgpu_dm_find_first_crtc_matching_connector: use for_each_new > - Old from_state_var flag was always choosing

Re: [PATCH 01/11] drm/amd/display: fix typo in function name

2017-10-11 Thread Christian König
On 11.10.2017 at 16:06, Alex Deucher wrote: s/amdgpu_dm_find_first_crct_matching_connector/ amdgpu_dm_find_first_crtc_matching_connector/ And while here, make it static. Signed-off-by: Alex Deucher Acked-by: Christian König --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 12 +++

RE: [PATCH] drm/amdgpu: SR-IOV data exchange between PF&VF

2017-10-11 Thread Deucher, Alexander
> -Original Message- > From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf > Of Horace Chen > Sent: Wednesday, October 11, 2017 3:05 AM > To: amd-gfx@lists.freedesktop.org > Cc: Chen, Horace > Subject: [PATCH] drm/amdgpu: SR-IOV data exchange between PF&VF > > SR-IOV nee

[PATCH 11/11] drm/amd/display: make amdgpu_dm_irq_handler static

2017-10-11 Thread Alex Deucher
It's not used outside the file. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c | 6 +++--- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.h | 9 - 2 files changed, 3 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgp

[PATCH 08/11] drm/amd/display: make log_dpcd static

2017-10-11 Thread Alex Deucher
It's only used in this file. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgp

[PATCH 07/11] drm/amd/display: whitespace cleanup in amdgpu_dm_mst_types.c/h

2017-10-11 Thread Alex Deucher
To match kernel standards. No intended functional change. Signed-off-by: Alex Deucher --- .../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c| 39 +++--- .../amd/display/amdgpu_dm/amdgpu_dm_mst_types.h| 5 ++- 2 files changed, 22 insertions(+), 22 deletions(-) diff --git a

[PATCH 05/11] drm/amd/display: implement dm_delay_in_microseconds

2017-10-11 Thread Alex Deucher
dc uses this. Not sure how important it is. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_services.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_services.c b/drivers/gpu/drm/amd/display/

[PATCH 06/11] drm/amd/display: drop unused functions in amdgpu_dm_services.c

2017-10-11 Thread Alex Deucher
not used. Signed-off-by: Alex Deucher --- .../drm/amd/display/amdgpu_dm/amdgpu_dm_services.c | 53 -- 1 file changed, 53 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_services.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_services.c index 2

[PATCH 01/11] drm/amd/display: fix typo in function name

2017-10-11 Thread Alex Deucher
s/amdgpu_dm_find_first_crct_matching_connector/ amdgpu_dm_find_first_crtc_matching_connector/ And while here, make it static. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 12 ++-- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 6 -- 2 fil

[PATCH 04/11] drm/amd/display: drop unused functions in amdgpu_dm.c

2017-10-11 Thread Alex Deucher
Not used anywhere. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 82 --- 1 file changed, 82 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 1b3cc8d..0

[PATCH 03/11] drm/amd/display: make a bunch of stuff in amdgpu_dm.c static

2017-10-11 Thread Alex Deucher
Not used outside of that file. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 136 +++--- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 59 -- 2 files changed, 91 insertions(+), 104 deletions(-) diff --git a/drivers/gpu/drm/amd

[PATCH 09/11] drm/amd/display: whitespace cleanup in amdgpu_dm_irq.c/h

2017-10-11 Thread Alex Deucher
To match kernel standards. No intended functional change. Signed-off-by: Alex Deucher --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c | 111 + .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.h | 38 +++ 2 files changed, 64 insertions(+), 85 deletions(-) diff --

[PATCH 02/11] drm/amd/display: whitespace cleanup in amdgpu_dm.c/h

2017-10-11 Thread Alex Deucher
To match kernel standards. No intended functional change. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 330 ++ drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 91 +++--- 2 files changed, 192 insertions(+), 229 deletions(-) diff --

[PATCH 10/11] drm/amd/display: remove unused functions in amdgpu_dm_irq.c

2017-10-11 Thread Alex Deucher
Not used. Signed-off-by: Alex Deucher --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c | 62 -- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.h | 5 -- 2 files changed, 67 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c b/drivers/gp

RE: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Liu, Monk
Yeah, I just thought of it, agree that shouldn't keep copy in entity, otherwise too complicated to handle BR Monk -Original Message- From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com] Sent: Wednesday, October 11, 2017 10:04 PM To: Liu, Monk ; Koenig, Christian ; Zhou, Dav

Re: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Christian König
Yeah, that was exactly my thinking as well. Christian. On 11.10.2017 at 15:59, Liu, Monk wrote: But if we keep counter in entity, there is one issue I suddenly thought of: For regular user context, after vram lost UMD will be aware this context is LOST since we have a counter copy in context, s

Re: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Christian König
I remember even the VM update job is with a kernel entity, (no context is true), and if entity can keep a counter copy That won't work. We want to keep the entities associated with VM updates and buffer moves alive, but their jobs canceled. Regards, Christian. On 11.10.2017 at 15:51, L

RE: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Liu, Monk
But if we keep counter in entity, there is one issue I suddenly thought of: For regular user context, after vram lost UMD will be aware this context is LOST since we have a counter copy in context, so user space can close it and re-create one But for kernel entity, since no U/K interface, so it is

RE: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Liu, Monk
> Some jobs don't have a context (VM updates, clears, buffer moves). What? I remember even the VM update job is with a kernel entity, (no context is true), and if entity can keep a counter copy That can solve your concerns -Original Message- From: Koenig, Christian Sent: Wednesday, O

RE: [PATCH 2/4] drm/amdgpu: keep copy of VRAM lost counter in job

2017-10-11 Thread Liu, Monk
Christian, Can you elaborate what benefit it brings that keep a counter copy in JOB ? Because you know when vram lost happens, the thing real matter is : if the BO in BO_LIST of this job modified/created before this vram lost, Not that if the job creating timestamp ... Since kernel side now do

Re: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Christian König
Some jobs don't have a context (VM updates, clears, buffer moves). I would still like to abort those when they were issued before losing VRAM content, but keep the entity usable. So I think we should just keep a copy of the VRAM lost counter in the job. That also removes us from the burden

RE: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Liu, Monk
I think just compare the copy from context/entity with current counter is enough, don't see how it's better to keep another copy in JOB -Original Message- From: Koenig, Christian Sent: Wednesday, October 11, 2017 6:40 PM To: Zhou, David(ChunMing) ; Liu, Monk ; Haehnle, Nicolai ; Olsak,

Re: [PATCH] drm/amdgpu: correct reference clock value on vega10

2017-10-11 Thread Alex Deucher
On Wed, Oct 11, 2017 at 4:48 AM, Wang, Ken wrote: > From: Ken Wang > > Change-Id: I377029075af1e2e002f7cfd793ddd58d8610e474 > Signed-off-by: Ken Wang NAK. We use 10 kHz units for all other ASICs. We already multiply this by 10 in amdgpu_kms.c before sending it to userspace: /* return a
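
The unit convention mentioned in this reply (values kept in 10 kHz units internally and multiplied by 10 before being reported to userspace) can be sketched as follows. The helper name and signature are hypothetical and are not taken from amdgpu_kms.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (assumption): convert a clock value stored in
 * 10 kHz units into kHz, as described in the reply above.
 */
static uint32_t ref_clock_10khz_to_khz(uint32_t clk_10khz)
{
	/* Guard against overflow of the 32-bit result. */
	assert(clk_10khz <= UINT32_MAX / 10);
	return clk_10khz * 10;
}
```

For example, a stored value of 2700 corresponds to 27000 kHz, i.e. 27 MHz. Storing a kHz value directly would make the reported clock ten times too large after the existing multiplication.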

RE: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Liu, Monk
According to the initial summary, this situation is already considered: When vram lost hit, all context marked as guilty, and all jobs in guilty context's KFIFO queue will be kicked out Now if we move the kick out from gpu_reset to run_job, then I think your question can be answered by: in run_

[PATCH][drm-next] drm/amdgpu: make function uvd_v6_0_enc_get_destroy_msg static

2017-10-11 Thread Colin King
From: Colin Ian King The function uvd_v6_0_enc_get_destroy_msg is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: symbol 'uvd_v6_0_enc_get_destroy_msg' was not declared. Should it be static? Signed-off-by: Colin Ian King --- drivers/gp

Re: [PATCH 050/103] drm/amd/display: Restructuring and cleaning up DML

2017-10-11 Thread Harry Wentland
On 2017-10-11 03:44 AM, Dave Airlie wrote: > On 11 October 2017 at 08:40, Harry Wentland wrote: >> From: Dmytro Laktyushkin >> >> Signed-off-by: Dmytro Laktyushkin >> Reviewed-by: Tony Cheng >> Acked-by: Harry Wentland >> --- >> .../gpu/drm/amd/display/dc/calcs/dcn_calc_math.c | 16 + >>

[PATCH] drm/amd/pp: add new sysfs pp_alloc_mem_for_smu

2017-10-11 Thread Rex Zhu
Change-Id: Ie06f87445e7d6945472d88ac976693c98d96cd43 Signed-off-by: Rex Zhu --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 3 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 + drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 134 + 3 files changed, 141 insertions(+

Re: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Christian König
I've already posted a patch for this on the mailing list. Basically we just copy the vram lost counter into the job and when we try to run the job we can mark it as canceled. Regards, Christian. On 11.10.2017 at 12:14, Chunming Zhou wrote: Your summary lacks the below issue: How about the

Re: [PATCH][drm-next] drm/amdgpu: make function uvd_v6_0_enc_get_destroy_msg static

2017-10-11 Thread Christian König
On 11.10.2017 at 11:21, Colin King wrote: From: Colin Ian King The function uvd_v6_0_enc_get_destroy_msg is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: symbol 'uvd_v6_0_enc_get_destroy_msg' was not declared. Should it be static? S

Re: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Chunming Zhou
Your summary lacks the below issue: How about the job already pushed in scheduler queue when vram is lost? Regards, David Zhou On 2017-10-11 17:41, Liu, Monk wrote: Okay, let me summarize our whole idea together and see if it works: 1, For cs_submit, always check vram_lost_counter first and re

RE: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Liu, Monk
Okay, let me summarize our whole idea together and see if it works: 1, For cs_submit, always check vram_lost_counter first and reject the submit (return -ECANCELED to UMD) if ctx->vram_lost_counter != adev->vram_lost_counter. That way the vram lost issue can be handled 2, for cs_submit we still n
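
Item 1 of the summary above (reject a submission when the context's copy of the VRAM-lost counter no longer matches the device's) can be sketched as below. The structures and the function name are hypothetical simplifications for illustration; only the two counter fields and the -ECANCELED return value come from the message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the device and context (assumption). */
struct device { unsigned int vram_lost_counter; };
struct context { unsigned int vram_lost_counter; };

/*
 * Return 0 if the context was created after the most recent VRAM loss,
 * or -ECANCELED if VRAM was lost since the context took its counter copy.
 */
static int cs_submit_check(const struct context *ctx,
			   const struct device *adev)
{
	assert(ctx != NULL && adev != NULL);
	if (ctx->vram_lost_counter != adev->vram_lost_counter)
		return -ECANCELED;
	return 0;
}
```

In this sketch a context snapshots the device counter at creation, and every later VRAM loss increments the device counter, so stale contexts fail the comparison until user space re-creates them.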

RE: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Liu, Monk
ML: KMD mark all contexts as guilty is because that way we can unify our IOCTL behavior: e.g. for IOCTL only block “guilty”context , no need to worry about vram-lost-counter anymore, that’s a implementation style. I don’t think it is related with UMD layer, I don't think that this is a good idea

Re: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Nicolai Hähnle
On 11.10.2017 11:18, Liu, Monk wrote: Let's talk it simple, When vram lost hit, what's the action for amdgpu_ctx_query()/AMDGPU_CTX_OP_QUERY_STATE on other contexts (not the one trigger gpu hang) after vram lost ? do you mean we return -ENODEV to UMD ? It should successfully return AMDGPU_CT

Re: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Christian König
[ML] I think context is better than entity, because for example if you only block entity_0 of context and allow entity_N run, that means the dependency between entities are broken (e.g. page table updates in Sdma entity pass but gfx submit in GFX entity blocked, not make sense to me) We’d be

RE: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Liu, Monk
Let's talk it simple, When vram lost hit, what's the action for amdgpu_ctx_query()/AMDGPU_CTX_OP_QUERY_STATE on other contexts (not the one trigger gpu hang) after vram lost ? do you mean we return -ENODEV to UMD ? In cs_submit, with vram lost hit, if we don't mark all contexts as "guilty", ho

Re: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Nicolai Hähnle
On 11.10.2017 11:02, Christian König wrote: 1.Kick out all jobs in this “guilty” ctx’s KFIFO queue, and set all their fence status to “*ECANCELED*” Setting ECANCELED should be ok. But I think we should do this when we try to run the jobs and not during GPU reset. [ML] without deep thought an

Re: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Nicolai Hähnle
On 11.10.2017 10:48, Liu, Monk wrote: On "guilty": "guilty" is a term that's used by APIs (e.g. OpenGL), so it's reasonable to use it. However, it /does not/ make sense to mark idle contexts as "guilty" just because VRAM is lost. VRAM lost is a perfect example where the driver should report con

RE: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Liu, Monk
On "guilty": "guilty" is a term that's used by APIs (e.g. OpenGL), so it's reasonable to use it. However, it does not make sense to mark idle contexts as "guilty" just because VRAM is lost. VRAM lost is a perfect example where the driver should report context lost to applications with the "inn

[PATCH] drm/amdgpu: correct reference clock value on vega10

2017-10-11 Thread Wang, Ken
From: Ken Wang Change-Id: I377029075af1e2e002f7cfd793ddd58d8610e474 Signed-off-by: Ken Wang --- drivers/gpu/drm/amd/amdgpu/soc15.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c index 7839677..a5

[PATCH] drm/amdgpu: correct reference clock value on vega10

2017-10-11 Thread Ken.Wang
From: Ken Wang Change-Id: I377029075af1e2e002f7cfd793ddd58d8610e474 Signed-off-by: Ken Wang --- drivers/gpu/drm/amd/amdgpu/soc15.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c index 7839677..a5

Re: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Haehnle, Nicolai
>From a Mesa perspective, this almost all sounds reasonable to me. On "guilty": "guilty" is a term that's used by APIs (e.g. OpenGL), so it's reasonable to use it. However, it does not make sense to mark idle contexts as "guilty" just because VRAM is lost. VRAM lost is a perfect example where t

RE: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Liu, Monk
1.Set its fence error status to “ETIME”, No, as I already explained ETIME is for synchronous operation. In other words when we return ETIME from the wait IOCTL it would mean that the waiting has somehow timed out, but not the job we waited for. Please use ECANCELED as well or some other

Re: [PATCH v2 2/2] drm/amdgpu: Move old fence waiting before reservation lock is aquired.

2017-10-11 Thread Christian König
On 10.10.2017 at 22:50, Andrey Grodzovsky wrote: Helps avoiding deadlock during GPU reset. Added mutex to amdgpu_ctx to preserve order of fences on a ring. v2: Put waiting logic in a separate function in amdgpu_ctx.c Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amd

RE: [PATCH v2 2/2] drm/amdgpu: Move old fence waiting before reservation lock is aquired.

2017-10-11 Thread Liu, Monk
No, pthread_mutex cannot be retired, because its protection range is not only the IOCTL; there are other struct fields that need pthread_mutex's protection -Original Message- From: Zhou, David(ChunMing) Sent: Wednesday, October 11, 2017 3:42 PM To: Koenig, Christian ; Liu, Monk ; Grodzovsky, And

Re: [PATCH 050/103] drm/amd/display: Restructuring and cleaning up DML

2017-10-11 Thread Dave Airlie
On 11 October 2017 at 08:40, Harry Wentland wrote: > From: Dmytro Laktyushkin > > Signed-off-by: Dmytro Laktyushkin > Reviewed-by: Tony Cheng > Acked-by: Harry Wentland > --- > .../gpu/drm/amd/display/dc/calcs/dcn_calc_math.c | 16 + > drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c |

Re: [PATCH v2 2/2] drm/amdgpu: Move old fence waiting before reservation lock is aquired.

2017-10-11 Thread Chunming Zhou
After ctx mutex is added, pthread_mutex in libdrm can be removed now. David Zhou On 2017-10-11 15:25, Christian König wrote: Yes, the mutex is mandatory. As I explained before it doesn't matter what userspace is doing, the kernel IOCTL must always be thread safe. Otherwise userspace coul

Re: [PATCH v2 1/2] drm/amdgpu: Refactor amdgpu_cs_ib_vm_chunk and amdgpu_cs_ib_fill.

2017-10-11 Thread Christian König
On 10.10.2017 at 22:50, Andrey Grodzovsky wrote: This enables old fence waiting before reservation lock is acquired which in turn is part of a bigger solution to deadlock happening when gpu reset with VRAM recovery occurs during intensive rendering. Signed-off-by: Andrey Grodzovsky That look

Re: [PATCH v2 2/2] drm/amdgpu: Move old fence waiting before reservation lock is aquired.

2017-10-11 Thread Christian König
Yes, the mutex is mandatory. As I explained before it doesn't matter what userspace is doing, the kernel IOCTL must always be thread safe. Otherwise userspace could force the kernel to run into a BUG_ON() or worse. Additional to that we already use an CS interface upstream which doesn't have

Re: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Christian König
See inline: On 11.10.2017 at 07:33, Liu, Monk wrote: Hi Christian & Nicolai, We need to achieve some agreements on what should MESA/UMD do and what should KMD do, *please give your comments with “okay” or “No” and your idea on below items,* - When a job timed out (set from lockup_timeout k

RE: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Liu, Monk
+ david From: Liu, Monk Sent: Wednesday, October 11, 2017 1:34 PM To: Koenig, Christian ; Haehnle, Nicolai ; Olsak, Marek ; Deucher, Alexander Cc: amd-gfx@lists.freedesktop.org; Ding, Pixel ; Jiang, Jerry (SW) ; Li, Bingley (bingley...@amd.com) ; Ramirez, Alejandro ; Filipas, Mario Subject

[PATCH] drm/amdgpu: SR-IOV data exchange between PF&VF

2017-10-11 Thread Horace Chen
SR-IOV needs to exchange some data between PF&VF through shared VRAM. PF will copy some necessary firmware and information to the shared VRAM. It also requires some information from VF. PF will send a key through mailbox2 to help guest calculate checksum so that it can verify whether the data is cor