On 01.09.21 at 02:46, Monk Liu wrote:
issue:
in cleanup_job, cancel_delayed_work cancels the TO (timeout) timer
even if its corresponding job is still running.
fix:
do not cancel the timer in cleanup_job; instead, cancel it
only when the head job is signaled, and if there is a "next" job
we start_timeout again.
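For illustration, a minimal sketch of how drm_sched_get_cleanup_job() could behave under this scheme, touching the timer only once the head job's fence has signaled (field and helper names such as pending_list, work_tdr and drm_sched_start_timeout() follow sched_main.c; the real patch's locking and list handling may differ):

static struct drm_sched_job *
drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
{
	struct drm_sched_job *job;

	spin_lock(&sched->job_list_lock);

	job = list_first_entry_or_null(&sched->pending_list,
				       struct drm_sched_job, list);

	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
		/* head job is finished: take it off the list for freeing */
		list_del_init(&job->list);

		/* only now is it safe to cancel its timeout timer */
		cancel_delayed_work(&sched->work_tdr);

		/* if another job is already pending, re-arm the timer for it */
		if (!list_empty(&sched->pending_list))
			drm_sched_start_timeout(sched);
	} else {
		/* head job still running: leave the timer alone */
		job = NULL;
	}

	spin_unlock(&sched->job_list_lock);

	return job;
}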
On 2021-09-01 12:40 a.m., Jingwen Chen wrote:
On Wed Sep 01, 2021 at 12:28:59AM -0400, Andrey Grodzovsky wrote:
On 2021-09-01 12:25 a.m., Jingwen Chen wrote:
On Wed Sep 01, 2021 at 12:04:47AM -0400, Andrey Grodzovsky wrote:
I will answer everything here -
On 2021-08-31 9:58 p.m., Liu, Monk wrote:
On Wed Sep 01, 2021 at 12:28:59AM -0400, Andrey Grodzovsky wrote:
>
> On 2021-09-01 12:25 a.m., Jingwen Chen wrote:
> > On Wed Sep 01, 2021 at 12:04:47AM -0400, Andrey Grodzovsky wrote:
> > > I will answer everything here -
> > >
> > > On 2021-08-31 9:58 p.m., Liu, Monk wrote:
On 2021-09-01 12:25 a.m., Jingwen Chen wrote:
On Wed Sep 01, 2021 at 12:04:47AM -0400, Andrey Grodzovsky wrote:
I will answer everything here -
On 2021-08-31 9:58 p.m., Liu, Monk wrote:
[AMD Official Use Only]
In the previous discussion, you guys stated that we should dr
On Wed Sep 01, 2021 at 12:04:47AM -0400, Andrey Grodzovsky wrote:
> I will answer everything here -
>
> On 2021-08-31 9:58 p.m., Liu, Monk wrote:
>
>
> [AMD Official Use Only]
>
>
>
> In the previous discussion, you guys stated that we should drop the
> “kthread_should_park”
I will answer everything here -
On 2021-08-31 9:58 p.m., Liu, Monk wrote:
[AMD Official Use Only]
In the previous discussion, you guys stated that we should drop the
“kthread_should_park” in cleanup_job.
@@ -676,15 +676,6 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
{
During svm restore pages interrupt handler, kfd_process ref count was
never dropped when xnack was disabled. Therefore, the object was never
released.
Signed-off-by: Alex Sierra
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers
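For illustration only, a sketch of the kind of change this describes in the svm restore-pages handler (kfd_svm.c), assuming the process reference taken via kfd_lookup_process_by_pasid() must be released on a common exit path; the label name and error value here are assumptions, not taken from the actual hunk:

	p = kfd_lookup_process_by_pasid(pasid);	/* takes a kfd_process reference */
	if (!p)
		return -ESRCH;

	if (!p->xnack_enabled) {
		pr_debug("XNACK not enabled for pasid 0x%x\n", pasid);
		r = -EFAULT;
		goto out;	/* previously an early return, which skipped the unref below */
	}

	/* ... fault handling ... */

out:
	kfd_unref_process(p);	/* drop the reference on every path */
	return r;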
[AMD Official Use Only]
In the previous discussion, you guys stated that we should drop the
"kthread_should_park" in cleanup_job.
@@ -676,15 +676,6 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
{
struct drm_sched_job *job, *next;
- /*
- * Don't destroy jobs
[AMD Official Use Only]
That really matters in practice: when two jobs from different processes are scheduled
to the ring close to each other, if we don't discriminate A from B then B will
be considered a bad job due to A's timeout, which will force B's process (e.g. the
X server) to exit
Thanks
[AMD Official Use Only]
Okay, I will prepare this patch again
Thanks
--
Monk Liu | Cloud-GPU Core team
--
-Original Message-
From: Daniel Vetter
Sent: Tuesday, August 31, 2021 9:02 PM
To: Liu, Monk
Cc: amd-g
[AMD Official Use Only]
Hi Daniel/Christian/Andrey
It looks to me like the feedback from you three is spread across these email floods; the
feature we are working on (the diagnostic TDR scheme) has been pending for more
than 6 months (we started it in Feb 2021).
Honestly speaking the email ways that we ar
[AMD Official Use Only]
>> Also, why don't we reuse the function drivers already have to stop a
>> scheduler thread? We seem to have two kthread_park calls now, that's probably one
>> too many.
Are you referring to drm_sched_stop?
That's different; we don't need the logic from it, see that it go thr
[AMD Official Use Only]
>> This is a __ function, i.e. considered internal, and it's lockless atomic,
>> i.e. unordered. And you're not explaining why this works.
From what I can see it is not the traditional habit to put the explanation in the code, but
we can do that in the mails,
We want to park the schedul
tested-by: jingwen chen
Signed-off-by: Monk Liu
Signed-off-by: jingwen chen
---
drivers/gpu/drm/scheduler/sched_main.c | 24
1 file changed, 4 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
i
issue:
in cleanup_job, cancel_delayed_work cancels the TO (timeout) timer
even if its corresponding job is still running.
fix:
do not cancel the timer in cleanup_job; instead, cancel it
only when the head job is signaled, and if there is a "next" job
we start_timeout again.
v2:
further clea
What about removing (kthread_should_park())? We decided it's useless as far as I remember.
Andrey
From: amd-gfx on behalf of Liu, Monk
Sent: 31 August 2021 20:24
To: Liu, Monk ; amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Subject: RE:
[AMD Official Use Only]
Ping Christian, Andrey
Can we merge this patch first? It is a standalone patch for the timer
Thanks
--
Monk Liu | Cloud-GPU Core team
--
-Original Message-
From: Monk Liu
Sent
On 2021-08-31 6:09 p.m., Zeng, Oak wrote:
A nit-pick inline. Otherwise this patch is Reviewed-by: Oak Zeng
Regards,
Oak
On 2021-08-31, 5:57 PM, "amd-gfx on behalf of Felix Kuehling"
wrote:
On some GPUs the PCIe atomic requirement for KFD depends on the MEC
firmware version.
A nit-pick inline. Otherwise this patch is Reviewed-by: Oak Zeng
Regards,
Oak
On 2021-08-31, 5:57 PM, "amd-gfx on behalf of Felix Kuehling"
wrote:
On some GPUs the PCIe atomic requirement for KFD depends on the MEC
firmware version. Add a firmware version check for this. The mi
On some GPUs the PCIe atomic requirement for KFD depends on the MEC
firmware version. Add a firmware version check for this. The minimum
firmware version that works without atomics can be updated in the
device_info structure for each GPU type.
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/am
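A rough sketch of how such a check might look in the KFD device init path; no_atomic_fw_version is a hypothetical per-ASIC field here, and mec_fw_version is assumed to be already populated on the kfd device (both are illustrative, not copied from the patch):

	/* Hypothetical field: lowest MEC firmware known to cope without PCIe atomics. */
	bool atomics_needed = device_info->needs_pci_atomics &&
		(!device_info->no_atomic_fw_version ||
		 kfd->mec_fw_version < device_info->no_atomic_fw_version);

	if (atomics_needed &&
	    pci_enable_atomic_ops_to_root(pdev,
			PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
			PCI_EXP_DEVCAP2_ATOMIC_COMP64) < 0) {
		dev_info(&pdev->dev,
			 "KFD disabled: PCIe atomics not supported and MEC firmware too old\n");
		return NULL;
	}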
On 2021-08-31 16:56, Andrey Grodzovsky wrote:
> On 2021-08-31 12:01 p.m., Luben Tuikov wrote:
>> On 2021-08-31 11:23, Andrey Grodzovsky wrote:
>>> On 2021-08-31 10:38 a.m., Daniel Vetter wrote:
On Tue, Aug 31, 2021 at 10:20:40AM -0400, Andrey Grodzovsky wrote:
> On 2021-08-31 10:03 a.m., D
On 2021-08-31 12:01 p.m., Luben Tuikov wrote:
On 2021-08-31 11:23, Andrey Grodzovsky wrote:
On 2021-08-31 10:38 a.m., Daniel Vetter wrote:
On Tue, Aug 31, 2021 at 10:20:40AM -0400, Andrey Grodzovsky wrote:
On 2021-08-31 10:03 a.m., Daniel Vetter wrote:
On Tue, Aug 31, 2021 at 09:53:36AM -04
With DRM_USE_DYNAMIC_DEBUG, each callsite record requires 56 bytes.
We can combine 12 into one here and save ~620 bytes.
Signed-off-by: Jim Cromie
---
drivers/gpu/drm/nouveau/nouveau_drm.c | 36 +--
1 file changed, 23 insertions(+), 13 deletions(-)
diff --git a/drivers/g
drm's debug system writes 10 distinct categories of messages to syslog
using a small API[1]: drm_dbg*(10 names), DRM_DEV_DEBUG*(3 names),
DRM_DEBUG*(8 names). There are thousands of these callsites, each
categorized in this systematized way.
These callsites can be enabled at runtime by their cate
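For context, a typical categorized callsite looks like the following (a generic illustration of the existing drm_dbg API, not code from this series; dev and crtc_id stand in for a driver's drm_device pointer and data); the category bit is selected at runtime via the drm.debug module parameter:

	/* Emitted only when the DRM_UT_KMS bit is set in drm.debug,
	 * e.g. "echo 0x04 > /sys/module/drm/parameters/debug". */
	drm_dbg_kms(dev, "enabling CRTC %u\n", crtc_id);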
There are blocks of DRM_DEBUG calls, consolidate their args into
single calls. With dynamic-debug in use, each callsite consumes 56
bytes of callsite data, and this patch removes about 65 calls, so
it saves ~3.5kb.
no functional changes.
RFC: this creates multi-line log messages, does that break
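A generic illustration of the consolidation being described (not a hunk from the patch): several adjacent one-line calls become a single call, so only one callsite record is emitted under dynamic debug:

	/* before: three separate callsites, ~56 bytes of dyndbg data each */
	DRM_DEBUG_DRIVER("vbios version: %d\n", vbios_ver);
	DRM_DEBUG_DRIVER("vram size: %llu\n", vram_size);
	DRM_DEBUG_DRIVER("gart size: %llu\n", gart_size);

	/* after: one callsite, one multi-line message */
	DRM_DEBUG_DRIVER("vbios version: %d\n"
			 "vram size: %llu\n"
			 "gart size: %llu\n",
			 vbios_ver, vram_size, gart_size);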
Duplicate drm_debug_enabled() code into both "basic" and "dyndbg"
ifdef branches. Then add a pr_debug("todo: ...") into the "dyndbg"
branch.
Then convert the "dyndbg" branch's code to a macro, so that its
pr_debug() get its callsite info from the invoking function, instead
of from drm_debug_enabl
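A sketch of what the two branches could end up looking like, assuming the bitmap variable is __drm_debug as in drm_print; the config symbol and the placeholder pr_debug() text are illustrative:

#if defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
/* Macro form: the pr_debug() inherits the *calling* function's
 * module/file/function callsite info rather than reporting
 * drm_debug_enabled() itself. */
#define drm_debug_enabled(category) \
({ \
	pr_debug("todo: is this frequent enough to optimize?\n"); \
	unlikely(__drm_debug & (category)); \
})
#else
static inline bool drm_debug_enabled(enum drm_debug_category category)
{
	return unlikely(__drm_debug & category);
}
#endif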
The gvt component of this driver has ~120 pr_debugs, in 9 categories
quite similar to those in DRM. Following the interface model of
drm.debug, add a parameter to map bits to these categorizations.
DEFINE_DYNAMIC_DEBUG_CATEGORIES(debug_gvt, __gvt_debug,
"dyndbg bitmap desc",
{ "gv
logger_types.h defines many DC_LOG_*() categorized debug wrappers.
Most of these use the DRM debug API, so they are controllable using drm.debug,
but others use bare pr_debug("$prefix: .."), each with a different
class-prefix matching "^\[\w+\]:"
Use DEFINE_DYNAMIC_DEBUG_CATEGORIES to create a /sys debug_d
Taking embedded spaces out of existing prefixes makes them better
class-prefixes, simplifying the extra quoting needed otherwise:
$> echo format "^gvt: core:" +p >control
Dropping the internal spaces means any trailing space in a query will
more clearly terminate the prefix being searched for.
DEFINE_DYNAMIC_DEBUG_CATEGORIES(name, var, bitmap_desc, @bit_descs)
allows users to define a drm.debug style (bitmap) sysfs interface, and
to specify the desired mapping from bits[0-N] to the format-prefix'd
pr_debug()s to be controlled.
DEFINE_DYNAMIC_DEBUG_CATEGORIES(debug_gvt, __gvt_debug,
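The truncated invocation above presumably continues along these lines; the per-bit prefix strings and descriptions below are guesses based on the gvt categories mentioned earlier, and the exact element syntax of the proposed macro may differ:

DEFINE_DYNAMIC_DEBUG_CATEGORIES(debug_gvt, __gvt_debug,
	"dyndbg bitmap desc",
	{ "gvt: cmd: ",  "command processing" },
	{ "gvt: core: ", "core logic" },
	{ "gvt: dpy: ",  "display handling" },
	{ "gvt: el: ",   "execlist support" });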
Hi Jason, DRM folks,
In DRM-debug currently, drm_debug_enabled() is called a lot to decide
whether or not to write debug messages. Each test is cheap, but the cost
accumulates with uptime. DYNAMIC_DEBUG "dyndbg", when built with
JUMP_LABEL, replaces each of those tests with a patchable NOOP, for
"zero
On 2021-08-31 9:11 a.m., Daniel Vetter wrote:
On Thu, Aug 26, 2021 at 11:04:14AM +0200, Daniel Vetter wrote:
On Thu, Aug 19, 2021 at 11:25:09AM -0400, Andrey Grodzovsky wrote:
On 2021-08-19 5:30 a.m., Daniel Vetter wrote:
On Wed, Aug 18, 2021 at 10:51:00AM -0400, Andrey Grodzovsky wrote:
On
On 2021-08-31 11:23, Andrey Grodzovsky wrote:
> On 2021-08-31 10:38 a.m., Daniel Vetter wrote:
>> On Tue, Aug 31, 2021 at 10:20:40AM -0400, Andrey Grodzovsky wrote:
>>> On 2021-08-31 10:03 a.m., Daniel Vetter wrote:
On Tue, Aug 31, 2021 at 09:53:36AM -0400, Andrey Grodzovsky wrote:
> It's
On 2021-08-31 10:38 a.m., Daniel Vetter wrote:
On Tue, Aug 31, 2021 at 10:20:40AM -0400, Andrey Grodzovsky wrote:
On 2021-08-31 10:03 a.m., Daniel Vetter wrote:
On Tue, Aug 31, 2021 at 09:53:36AM -0400, Andrey Grodzovsky wrote:
It says patch [2/2] but I can't find patch 1
On 2021-08-31 6:
On 2021-08-31 08:59, Daniel Vetter wrote:
> Can we please have some actual commit message here, with detailed
> explanation of the race/bug/whatever, how you fix it and why this is the
> best option?
I agree with Daniel--a narrative form of a commit message is so much easier
for humans to digest.
On Tue, Aug 31, 2021 at 10:20:40AM -0400, Andrey Grodzovsky wrote:
>
> On 2021-08-31 10:03 a.m., Daniel Vetter wrote:
> > On Tue, Aug 31, 2021 at 09:53:36AM -0400, Andrey Grodzovsky wrote:
> > > It says patch [2/2] but I can't find patch 1
> > >
> > > On 2021-08-31 6:35 a.m., Monk Liu wrote:
>
On 2021-08-31 10:03 a.m., Daniel Vetter wrote:
On Tue, Aug 31, 2021 at 09:53:36AM -0400, Andrey Grodzovsky wrote:
It says patch [2/2] but I can't find patch 1
On 2021-08-31 6:35 a.m., Monk Liu wrote:
tested-by: jingwen chen
Signed-off-by: Monk Liu
Signed-off-by: jingwen chen
---
driv
On Mon, Aug 30, 2021 at 2:24 AM Guchun Chen wrote:
>
> This guarantees no more work on the ring can be submitted
> to hardware in the suspend/resume case, otherwise a potential
> race will occur and the ring will get no chance to stay
> empty before suspend.
>
> v2: Call drm_sched_resubmit_job before d
On Tue, Aug 31, 2021 at 09:53:36AM -0400, Andrey Grodzovsky wrote:
> It says patch [2/2] but I can't find patch 1
>
> On 2021-08-31 6:35 a.m., Monk Liu wrote:
> > tested-by: jingwen chen
> > Signed-off-by: Monk Liu
> > Signed-off-by: jingwen chen
> > ---
> > drivers/gpu/drm/scheduler/sched_
It says patch [2/2] but I can't find patch 1
On 2021-08-31 6:35 a.m., Monk Liu wrote:
tested-by: jingwen chen
Signed-off-by: Monk Liu
Signed-off-by: jingwen chen
---
drivers/gpu/drm/scheduler/sched_main.c | 24
1 file changed, 4 insertions(+), 20 deletions(-)
di
On Thu, Aug 26, 2021 at 11:04:14AM +0200, Daniel Vetter wrote:
> On Thu, Aug 19, 2021 at 11:25:09AM -0400, Andrey Grodzovsky wrote:
> >
> > On 2021-08-19 5:30 a.m., Daniel Vetter wrote:
> > > On Wed, Aug 18, 2021 at 10:51:00AM -0400, Andrey Grodzovsky wrote:
> > > > On 2021-08-18 10:42 a.m., Danie
On Fri, Aug 27, 2021 at 08:30:32PM +0200, Christian König wrote:
> Yeah, that's what I meant: when exactly a job starts processing is a bit
> swampily defined.
>
> Jobs overlap, but we simply don't have another good indicator that a job
> started except that the previous one completed.
>
> It's
On Mon, Aug 30, 2021 at 07:15:29PM +0300, Skyler Mäntysaari wrote:
> I have tried kernel 5.13.13 without any difference, and I haven't
> tried an older kernel, as this hardware is new enough that I have
> very little faith that anything below a 5.x kernel would even have support for
> the needed GPU.
Ye
On Tue, Aug 31, 2021 at 02:59:02PM +0200, Daniel Vetter wrote:
> Can we please have some actual commit message here, with detailed
> explanation of the race/bug/whatever, how you fix it and why this is the
> best option?
>
> On Tue, Aug 31, 2021 at 06:35:39PM +0800, Monk Liu wrote:
> > tested-by:
Can we please have some actual commit message here, with detailed
explanation of the race/bug/whatever, how you fix it and why this is the
best option?
On Tue, Aug 31, 2021 at 06:35:39PM +0800, Monk Liu wrote:
> tested-by: jingwen chen
> Signed-off-by: Monk Liu
> Signed-off-by: jingwen chen
> -
tested-by: jingwen chen
Signed-off-by: Monk Liu
Signed-off-by: jingwen chen
---
drivers/gpu/drm/scheduler/sched_main.c | 24
1 file changed, 4 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
i
issue:
in cleanup_job, cancel_delayed_work cancels the TO (timeout) timer
even if its corresponding job is still running.
fix:
do not cancel the timer in cleanup_job; instead, cancel it
only when the head job is signaled, and if there is a "next" job
we start_timeout again.
v2:
further clea
[AMD Official Use Only]
Submitting a patch to resolve incorrect register addresses on Aldebaran affecting
RAS interrupt handling
0001-drm-amdgpu-Clear-RAS-interrupt-status-on-aldebaran.patch
On 31.08.21 at 09:08, Pan, Xinhui wrote:
Fall through to handle the error instead of returning.
Fixes: f8aab60422c37 ("drm/amdgpu: Initialise drm_gem_object_funcs for
imported BOs")
Cc: sta...@vger.kernel.org
Signed-off-by: xinhui pan
Reviewed-by: Christian König
---
drivers/gpu/drm/amd/am
Fall through to handle the error instead of returning.
Fixes: f8aab60422c37 ("drm/amdgpu: Initialise drm_gem_object_funcs for
imported BOs")
Cc: sta...@vger.kernel.org
Signed-off-by: xinhui pan
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 23 ++-
1 file changed, 10 insertions(+
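As a generic sketch of the pattern the commit message describes (not the actual amdgpu_gem.c hunk; setup_object() and release_object() are hypothetical helpers): an early return in the middle of the creation path skips the shared error handling, while falling through lets the existing cleanup run:

	r = setup_object(obj);		/* hypothetical helper */
	if (r)
		goto err_put;		/* was: return r; which skipped the cleanup below */

	return 0;

err_put:
	release_object(obj);		/* hypothetical helper: shared cleanup path */
	return r;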