Re: [RFC PATCH 1/1] drm/amdgpu: add initial support for pci error handler

2020-08-14 Thread Luben Tuikov
On 2020-08-14 4:10 p.m., Alex Deucher wrote: >> I see DRM as more of a unifying layer (perhaps long term), as opposed >> to a *library* which LLDDs call into. Then LLDDs would provide an interface >> to the hardware. This will help us avoid many of the deadlocks and >> synchronization issues which

Re: [RFC PATCH 1/1] drm/amdgpu: add initial support for pci error handler

2020-08-14 Thread Alex Deucher
On Fri, Aug 14, 2020 at 3:52 PM Luben Tuikov wrote: > > On 2020-08-14 11:23 a.m., Nirmoy wrote: > > > > On 8/13/20 11:17 PM, Luben Tuikov wrote: > >> I support having AER handling. > >> > >> However, I feel it should be offloaded to the DRM layer. > >> The PCI driver gets the AER callback and imme

Re: [RFC PATCH 1/1] drm/amdgpu: add initial support for pci error handler

2020-08-14 Thread Luben Tuikov
On 2020-08-14 11:23 a.m., Nirmoy wrote: > > On 8/13/20 11:17 PM, Luben Tuikov wrote: >> I support having AER handling. >> >> However, I feel it should be offloaded to the DRM layer. >> The PCI driver gets the AER callback and immediately >> offloads into DRM, as "return drm_aer_recover(dev); }". >

Re: [RFC PATCH 1/1] drm/amdgpu: add initial support for pci error handler

2020-08-14 Thread Nirmoy
On 8/13/20 11:17 PM, Luben Tuikov wrote: I support having AER handling. However, I feel it should be offloaded to the DRM layer. The PCI driver gets the AER callback and immediately offloads into DRM, as "return drm_aer_recover(dev); }". The DRM layer does a top-down approach into the error rec

Re: [RFC PATCH 1/1] drm/amdgpu: add initial support for pci error handler

2020-08-13 Thread Luben Tuikov
One more nitpick: > +static pci_ers_result_t amdgpu_pci_err_detected(struct pci_dev *pdev, > + pci_channel_state_t state) > +{ > + struct drm_device *dev = pci_get_drvdata(pdev); That's the name of a state, a state of "pci error detected". I'd rathe

Re: [RFC PATCH 1/1] drm/amdgpu: add initial support for pci error handler

2020-08-13 Thread Luben Tuikov
I support having AER handling. However, I feel it should be offloaded to the DRM layer. The PCI driver gets the AER callback and immediately offloads into DRM, as "return drm_aer_recover(dev); }". The DRM layer does a top-down approach into the error recovery procedure. The PCI device driver prov

Re: [RFC PATCH 1/1] drm/amdgpu: add initial support for pci error handler

2020-08-13 Thread Andrey Grodzovsky
On 8/13/20 11:06 AM, Nirmoy wrote: On 8/13/20 3:38 PM, Andrey Grodzovsky wrote: On 8/13/20 7:09 AM, Nirmoy wrote: On 8/12/20 4:52 PM, Andrey Grodzovsky wrote: On 8/11/20 9:30 AM, Nirmoy Das wrote: This patch will ignore non-fatal errors and try to stop amdgpu's sw stack on fatal errors.

Re: [RFC PATCH 1/1] drm/amdgpu: add initial support for pci error handler

2020-08-13 Thread Nirmoy
On 8/13/20 3:38 PM, Andrey Grodzovsky wrote: On 8/13/20 7:09 AM, Nirmoy wrote: On 8/12/20 4:52 PM, Andrey Grodzovsky wrote: On 8/11/20 9:30 AM, Nirmoy Das wrote: This patch will ignore non-fatal errors and try to stop amdgpu's sw stack on fatal errors. Signed-off-by: Nirmoy Das --- dri

Re: [RFC PATCH 1/1] drm/amdgpu: add initial support for pci error handler

2020-08-13 Thread Andrey Grodzovsky
On 8/13/20 7:09 AM, Nirmoy wrote: On 8/12/20 4:52 PM, Andrey Grodzovsky wrote: On 8/11/20 9:30 AM, Nirmoy Das wrote: This patch will ignore non-fatal errors and try to stop amdgpu's sw stack on fatal errors. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 56 +

Re: [RFC PATCH 1/1] drm/amdgpu: add initial support for pci error handler

2020-08-13 Thread Alex Deucher
On Thu, Aug 13, 2020 at 7:06 AM Nirmoy wrote: > > > On 8/12/20 4:52 PM, Andrey Grodzovsky wrote: > > > > On 8/11/20 9:30 AM, Nirmoy Das wrote: > >> This patch will ignore non-fatal errors and try to > >> stop amdgpu's sw stack on fatal errors. > >> > >> Signed-off-by: Nirmoy Das > >> --- > >> d

Re: [RFC PATCH 1/1] drm/amdgpu: add initial support for pci error handler

2020-08-13 Thread Nirmoy
On 8/12/20 4:52 PM, Andrey Grodzovsky wrote: On 8/11/20 9:30 AM, Nirmoy Das wrote: This patch will ignore non-fatal errors and try to stop amdgpu's sw stack on fatal errors. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 56 - 1 file change

Re: [RFC PATCH 1/1] drm/amdgpu: add initial support for pci error handler

2020-08-12 Thread Andrey Grodzovsky
On 8/11/20 9:30 AM, Nirmoy Das wrote: This patch will ignore non-fatal errors and try to stop amdgpu's sw stack on fatal errors. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 56 - 1 file changed, 54 insertions(+), 2 deletions(-) diff --gi

[RFC PATCH 1/1] drm/amdgpu: add initial support for pci error handler

2020-08-11 Thread Nirmoy Das
This patch will ignore non-fatal errors and try to stop amdgpu's sw stack on fatal errors. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 56 - 1 file changed, 54 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c