On 2020-08-28 at 8:38 p.m., Mukul Joshi wrote:
> Replace spaces with Tabs to fix indentation in kfd_smi_event
> enum.
>
> Signed-off-by: Mukul Joshi
Reviewed-by: Felix Kuehling
> ---
> include/uapi/linux/kfd_ioctl.h | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --g
On 2020-08-26 10:46, Andrey Grodzovsky wrote:
> DPC recovery involves ASIC reset just as normal GPU recovery so block
> SW GPU schedulers and wait on all concurrent GPU resets.
>
> Signed-off-by: Andrey Grodzovsky
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 57
>
On 2020-08-26 10:46, Andrey Grodzovsky wrote:
> Add DPC handlers with basic recovery functionality.
>
> Signed-off-by: Andrey Grodzovsky
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h| 9 ++
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 181
> -
> drivers/gpu
On 2020-08-26 10:46, Andrey Grodzovsky wrote:
> At this point the ASIC is already post reset by the HW/PSP
> so the HW not in proper state to be configured for suspension,
> some bloks might be even gated and so best is to avoid touching it.
"blocks"
>
> Signed-off-by: Andrey Grodzovsky
> ---
>
Replace spaces with Tabs to fix indentation in kfd_smi_event
enum.
Signed-off-by: Mukul Joshi
---
include/uapi/linux/kfd_ioctl.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 8b7368bfbd84..695b606da
On 2020-08-28 at 1:53 p.m., Mukul Joshi wrote:
> Add support for reporting GPU reset events through SMI. KFD
> would report both pre and post GPU reset events.
>
> Signed-off-by: Mukul Joshi
Minor coding-style nit-picks inline. With those fixed, this patch is
Reviewed-by: Felix Kuehling
And
On 8/28/20 3:29 PM, Alex Deucher wrote:
On Fri, Aug 28, 2020 at 12:06 PM Andrey Grodzovsky
wrote:
Wait for HW/PSP initiated ASIC reset to complete before
starting the recovery operations.
v2: Remove typo
Signed-off-by: Andrey Grodzovsky
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 22
On Fri, Aug 28, 2020 at 3:30 PM Alex Deucher wrote:
>
> On Fri, Aug 28, 2020 at 12:06 PM Andrey Grodzovsky
> wrote:
> >
> > XGMI support is more complicated than single device support as
> > questions synchronization between the device recovering from
> > PCI error and other members of the hive
On Fri, Aug 28, 2020 at 12:06 PM Andrey Grodzovsky
wrote:
>
> XGMI support is more complicated than single device support as
> questions synchronization between the device recovering from
> PCI error and other members of the hive is required.
> Leaving this for next round.
>
> Signed-off-by: Andr
On Fri, Aug 28, 2020 at 12:06 PM Andrey Grodzovsky
wrote:
>
> Reuse existing functions from GPU recovery to avoid code
> duplications.
>
> Signed-off-by: Andrey Grodzovsky
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 75
> ++
> 1 file changed, 14 insertions(
On Fri, Aug 28, 2020 at 12:06 PM Andrey Grodzovsky
wrote:
>
> Wait for HW/PSP initiated ASIC reset to complete before
> starting the recovery operations.
>
> v2: Remove typo
>
> Signed-off-by: Andrey Grodzovsky
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 22 --
> 1 f
On Fri, Aug 28, 2020 at 12:06 PM Andrey Grodzovsky
wrote:
>
> DPC recovery involves ASIC reset just as normal GPU recovery so blosk
Typo: "block"
> SW GPU scedulers and wait on all concurent GPU resets.
Typos: "schedulers" and "concurrent"
>
> Signed-off-by: Andrey Grodzovsky
> ---
> drivers
On Fri, Aug 28, 2020 at 12:06 PM Andrey Grodzovsky
wrote:
>
> At this point the ASIC is already post reset by the HW/PSP
> so the HW not in proper state to be configured for suspension,
> some bloks might be even gated and so best is to avoid touching it.
typo: "blocks"
>
> v2: Rename in_dpc to
On Fri, Aug 28, 2020 at 12:06 PM Andrey Grodzovsky
wrote:
>
> Add DPC handlers with basic recovery functionality.
>
> v2: remove pci_save_state to avoid breaking suspend/resume
>
> Signed-off-by: Andrey Grodzovsky
Reviewed-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h|
On Fri, Aug 28, 2020 at 3:23 PM Alex Deucher wrote:
>
> On Fri, Aug 28, 2020 at 12:06 PM Andrey Grodzovsky
> wrote:
> >
> > Add DPC handlers with basic recovery functionality.
> >
> > v2: remove pci_save_state to avoid breaking suspend/resume
> >
> > Signed-off-by: Andrey Grodzovsky
> > ---
> >
On Fri, Aug 28, 2020 at 12:06 PM Andrey Grodzovsky
wrote:
>
> Add DPC handlers with basic recovery functionality.
>
> v2: remove pci_save_state to avoid breaking suspend/resume
>
> Signed-off-by: Andrey Grodzovsky
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h| 9 ++
> drivers/gpu/drm/amd
On Fri, Aug 28, 2020 at 12:06 PM Andrey Grodzovsky
wrote:
>
> Cache the PCI state on boot and before each case where we might
> lose it.
>
> v2: Add pci_restore_state while caching the PCI state to avoid
> breaking PCI core logic for stuff like suspend/resume.
>
> Signed-off-by: Andrey Grodzovsky
Will be used to fetch the fan speeds when manual fan mode is
set.
v2: squash in a Coverity fix from Colin Ian King
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/pm/inc/smu_v11_0.h| 3 +++
.../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c| 21 +++
2 files changed, 24
No longer needed as we can calculate it based on
the fan's max rpm.
v2: rework code to avoid possible uninitialized
variable use.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h | 1 -
drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 12 +--
.../gpu/drm/amd
grab the value from the pptable.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 10 ++
drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c| 10 ++
.../gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c| 10 ++
3 files changed
Need to read back from registers for manual mode rather than
using the metrics table.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1164
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 11 ---
drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
To fetch the max rpm from pptable.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h | 4
drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 2 ++
drivers/gpu/drm/amd/pm/swsmu/smu_internal.h | 1 +
3 files changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/amd/pm/in
No longer needed as we can calculate it based on
the fan's max rpm.
v2: minor code rework
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h | 1 -
drivers/gpu/drm/amd/pm/inc/smu_v11_0.h| 3 --
drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 9 --
.../g
Add support for reporting GPU reset events through SMI. KFD
would report both pre and post GPU reset events.
Signed-off-by: Mukul Joshi
---
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 5 +++
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 ++
drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 35
DPC recovery involves ASIC reset just as normal GPU recovery so block
SW GPU schedulers and wait on all concurrent GPU resets.
Signed-off-by: Andrey Grodzovsky
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 57 +++---
1 file changed, 53 insertions(+), 4 deletions(-)
diff
Cache the PCI state on boot and before each case where we might
lose it.
v2: Add pci_restore_state while caching the PCI state to avoid
breaking PCI core logic for stuff like suspend/resume.
Signed-off-by: Andrey Grodzovsky
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h| 6 +++
drivers/gpu/dr
Wait for HW/PSP initiated ASIC reset to complete before
starting the recovery operations.
v2: Remove typo
Signed-off-by: Andrey Grodzovsky
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 22 --
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/a
XGMI support is more complicated than single device support as
questions synchronization between the device recovering from
PCI error and other members of the hive is required.
Leaving this for next round.
Signed-off-by: Andrey Grodzovsky
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +
Reuse existing functions from GPU recovery to avoid code
duplications.
Signed-off-by: Andrey Grodzovsky
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 75 ++
1 file changed, 14 insertions(+), 61 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
Many PCI bus controllers are able to detect a variety of hardware PCI errors on
the bus, such as parity errors on the data and address buses. A typical action
taken is to disconnect the affected device, halting all I/O to it. Typically, a
reconnection mechanism is also offered, so that the a
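As a rough illustration of the disconnect-then-reconnect flow described above, the following user-space C mock walks one device through it. Every `demo_` type and function is hypothetical; the callback names only echo the `error_detected`, `slot_reset`, and `resume` members of the kernel's `struct pci_error_handlers`, and nothing here is taken from the amdgpu patches themselves.

```c
#include <stddef.h>

/* Hypothetical result codes, loosely modelled on the kernel's PCI_ERS_RESULT_* values. */
enum demo_ers_result {
	DEMO_ERS_NEED_RESET,
	DEMO_ERS_RECOVERED,
	DEMO_ERS_DISCONNECT,
};

/* Hypothetical device state used only by this mock. */
struct demo_dev {
	int io_blocked;   /* 1 while I/O to the device is halted */
	int resets_seen;  /* number of slot resets handled */
};

/* Hypothetical callback table; the member names mirror struct pci_error_handlers. */
struct demo_err_handlers {
	enum demo_ers_result (*error_detected)(struct demo_dev *dev);
	enum demo_ers_result (*slot_reset)(struct demo_dev *dev);
	void (*resume)(struct demo_dev *dev);
};

/* Error reported: stop touching the hardware and ask for a reset. */
static enum demo_ers_result demo_error_detected(struct demo_dev *dev)
{
	dev->io_blocked = 1;
	return DEMO_ERS_NEED_RESET;
}

/* Slot has been reset: a real driver would re-initialise the device here. */
static enum demo_ers_result demo_slot_reset(struct demo_dev *dev)
{
	dev->resets_seen++;
	return DEMO_ERS_RECOVERED;
}

/* Recovery finished: normal I/O may resume. */
static void demo_resume(struct demo_dev *dev)
{
	dev->io_blocked = 0;
}

static const struct demo_err_handlers demo_handlers = {
	.error_detected = demo_error_detected,
	.slot_reset = demo_slot_reset,
	.resume = demo_resume,
};

/* Mock of the bus-side sequence; returns 1 if the device is usable again. */
static int demo_recover(struct demo_dev *dev, const struct demo_err_handlers *h)
{
	enum demo_ers_result res = h->error_detected(dev);

	if (res == DEMO_ERS_NEED_RESET)
		res = h->slot_reset(dev);
	if (res != DEMO_ERS_RECOVERED)
		return 0;
	h->resume(dev);
	return 1;
}
```

Calling `demo_recover(&dev, &demo_handlers)` on a zero-initialised `struct demo_dev` runs all three callbacks in order and leaves `io_blocked` cleared.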
At this point the ASIC is already post reset by the HW/PSP
so the HW not in proper state to be configured for suspension,
some blocks might be even gated and so best is to avoid touching it.
v2: Rename in_dpc to more meaningful name
Signed-off-by: Andrey Grodzovsky
---
drivers/gpu/drm/amd/amdgpu
Add DPC handlers with basic recovery functionality.
v2: remove pci_save_state to avoid breaking suspend/resume
Signed-off-by: Andrey Grodzovsky
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h| 9 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 169 -
drivers/gpu/dr
On Fri, Aug 28, 2020 at 10:21 AM Andrey Grodzovsky
wrote:
>
>
> On 8/28/20 10:13 AM, Alex Deucher wrote:
> > On Thu, Aug 27, 2020 at 10:54 AM Andrey Grodzovsky
> > wrote:
> >>
> >> On 8/26/20 11:20 AM, Alex Deucher wrote:
> >>> On Wed, Aug 26, 2020 at 10:46 AM Andrey Grodzovsky
> >>> wrote:
> >>
On 28.08.20 at 15:38, Alex Deucher wrote:
On Fri, Aug 28, 2020 at 4:06 AM Christian König
wrote:
On 24.08.20 at 18:15, Alex Deucher wrote:
Nothing to do for this family.
Uff, no. Can't we just make the callback optional?
I guess we could, but all of the other asic callbacks are assumed to
On 8/28/20 10:13 AM, Alex Deucher wrote:
On Thu, Aug 27, 2020 at 10:54 AM Andrey Grodzovsky
wrote:
On 8/26/20 11:20 AM, Alex Deucher wrote:
On Wed, Aug 26, 2020 at 10:46 AM Andrey Grodzovsky
wrote:
DPC recovery after prev. DPC recovery or after prev. MODE1 reset fails
unless you save the c
[AMD Official Use Only - Internal Distribution Only]
This code gets called during suspend and resume and GPU reset as well. Are
those cases properly covered?
Alex
From: amd-gfx on behalf of Christian
König
Sent: Friday, August 28, 2020 4:16 AM
To: Das, Nirmo
On Thu, Aug 27, 2020 at 10:54 AM Andrey Grodzovsky
wrote:
>
>
> On 8/26/20 11:20 AM, Alex Deucher wrote:
> > On Wed, Aug 26, 2020 at 10:46 AM Andrey Grodzovsky
> > wrote:
> >> DPC recovery after prev. DPC recovery or after prev. MODE1 reset fails
> >> unless you save the cache the saved PCI confsp
[AMD Official Use Only - Internal Distribution Only]
>> The patch was to fix the previous inconsistent optimization patch.
If so that's a good reason, I'll take a look, thanks
-Original Message-
From: Das, Nirmoy
Sent: August 28, 2020 15:46
To: Liu, Monk ; Das, Nirmoy ;
amd-gfx@lists.freedesktop.org
On Fri, Aug 28, 2020 at 4:06 AM Christian König
wrote:
>
> On 24.08.20 at 18:15, Alex Deucher wrote:
> > Nothing to do for this family.
>
> Uff, no. Can't we just make the callback optional?
>
I guess we could, but all of the other asic callbacks are assumed to be present.
Alex
> >
> > Signed-
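The "make the callback optional" suggestion in the exchange above could be sketched as follows. This is only a user-space illustration under assumptions: `demo_asic_funcs`, `demo_device`, and the `hook` member are hypothetical stand-ins, not the amdgpu structures or the callback actually under discussion.

```c
#include <stddef.h>

struct demo_device;

/* Hypothetical per-family callback table. */
struct demo_asic_funcs {
	/* Optional: NULL for families that have nothing to do. */
	void (*hook)(struct demo_device *dev);
};

/* Hypothetical device; hook_calls counts how often the hook ran. */
struct demo_device {
	const struct demo_asic_funcs *funcs;
	int hook_calls;
};

/* Example hook for a family that does need the callback. */
static void demo_count_hook(struct demo_device *dev)
{
	dev->hook_calls++;
}

/*
 * Call the hook only when the family provides one.
 * Returns 1 if the hook ran, 0 if it was absent and skipped.
 */
static int demo_call_optional(struct demo_device *dev)
{
	if (dev->funcs && dev->funcs->hook) {
		dev->funcs->hook(dev);
		return 1;
	}
	return 0;
}
```

The trade-off raised in the reply still applies: a NULL check at the call site removes the need for empty stub callbacks, but it makes this one hook behave differently from callbacks that are assumed to be present.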
Thanks, applied!
In future patches can you please add a Signed-off-by line (e.g. use "-s" with
git when forming the commit).
Tom
On Thu, Aug 27, 2020 at 9:43 PM 张二东 wrote:
> Yes, you are right. New patch is in attachment.
>
> thanks.
>
>
>
>
>
>
> On 2020-08-28 01:14:02, "Tom St Denis" wrote:
>
> isn
On Sun, Aug 23, 2020 at 12:45 PM Sam Ravnborg wrote:
> The first patch trims backlight_update_status() so it can be called with a
> NULL
> backlight_device. Then the caller does not need to add this check just to avoid
> a NULL reference.
>
> The backlight drivers use several different patterns w
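A minimal sketch of the NULL-tolerant helper idea from the first patch description above, written as a user-space mock. The `demo_` names are hypothetical, and the return values chosen for the NULL and no-callback cases are assumptions; the real `backlight_update_status()` may behave differently.

```c
#include <stddef.h>

/* Hypothetical stand-in for a backlight device. */
struct demo_backlight {
	int updates;  /* how many times the driver callback ran */
	int (*update_status)(struct demo_backlight *bd);
};

/* Example driver callback that just records the call. */
static int demo_apply(struct demo_backlight *bd)
{
	bd->updates++;
	return 0;
}

/*
 * NULL-tolerant update helper: callers may pass NULL without checking first.
 * Assumption: NULL is treated as "nothing to update" and reported as success,
 * and a device without a callback returns -1.
 */
static int demo_backlight_update(struct demo_backlight *bd)
{
	if (!bd)
		return 0;
	if (!bd->update_status)
		return -1;
	return bd->update_status(bd);
}
```

With the check inside the helper, a caller holding a possibly-NULL pointer can call `demo_backlight_update(bd)` directly instead of wrapping every call in its own `if (bd)` test.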
On 8/28/20 10:23 AM, Christian König wrote:
On 28.08.20 at 10:14, Samuel Pitoiset wrote:
On 8/28/20 9:57 AM, Christian König wrote:
On 25.08.20 at 16:07, Samuel Pitoiset wrote:
A trap handler can be used by userspace to catch shader exceptions
like divide by zero, memory violations etc.
O
On 28.08.20 at 10:14, Samuel Pitoiset wrote:
On 8/28/20 9:57 AM, Christian König wrote:
On 25.08.20 at 16:07, Samuel Pitoiset wrote:
A trap handler can be used by userspace to catch shader exceptions
like divide by zero, memory violations etc.
On GFX6-GFX8, the registers used to configure T
The explanation sounds sane, but since I don't know the affected code at
all the series is only Acked-by: Christian König
Maybe wait for Alex to give you an rb if you are unsure, otherwise feel
free to commit.
Christian.
On 27.08.20 at 08:39, Nirmoy wrote:
Series is Acked-by: Nirmoy Das
On 8/28/20 9:57 AM, Christian König wrote:
On 25.08.20 at 16:07, Samuel Pitoiset wrote:
A trap handler can be used by userspace to catch shader exceptions
like divide by zero, memory violations etc.
On GFX6-GFX8, the registers used to configure TBA/TMA aren't
privileged and can be configured
On 24.08.20 at 18:15, Alex Deucher wrote:
Nothing to do for this family.
Uff, no. Can't we just make the callback optional?
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/si.c | 5 +
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/si.c b/drivers/
On 25.08.20 at 16:07, Samuel Pitoiset wrote:
A trap handler can be used by userspace to catch shader exceptions
like divide by zero, memory violations etc.
On GFX6-GFX8, the registers used to configure TBA/TMA aren't
privileged and can be configured from userspace.
On GFX9+ they are per VMID an
On 8/28/20 9:29 AM, Liu, Monk wrote:
[AMD Official Use Only - Internal Distribution Only]
Please don't change that code unless you have a full stress test and a solid
reason (what bug it fixes or what new feature it introduces)
Otherwise if it's a pure personal refactor or cleanup it will be not
[AMD Official Use Only - Internal Distribution Only]
Per Monk and Emily's suggestion, I will not submit another patch to make
amdgpu_device_ip_reinit_late_sriov()
and amdgpu_device_ip_reinit_early_sriov() consistent.
There's already no logic bug, the little difference in loop layer order does no
Yes, you are right. New patch is in attachment.
thanks.
On 2020-08-28 01:14:02, "Tom St Denis" wrote:
isn't a better fix to simply delete the line? The print seems redundant to me.
Tom
On Thu, Aug 27, 2020 at 9:27 AM 张二东 wrote:
__
[AMD Official Use Only - Internal Distribution Only]
Please don't change that code unless you have a full stress test and a solid
reason (what bug it fixes or what new feature it introduces)
Otherwise if it's a pure personal refactor or cleanup it will be not necessary
_
On 8/28/20 8:58 AM, Gu, JiaWei (Will) wrote:
[AMD Official Use Only - Internal Distribution Only]
Hi Nirmoy,
I also found amdgpu_device_ip_reinit_late_sriov() part is missed.
Will push another patch to make them consistent soon.
Thanks, Jiawei.
Nirmoy
Best regards,
Jiawei
-Origi