Re: [PATCH hmm 00/15] Consolidate the mmu notifier interval_tree and locking

2019-10-17 Thread Yang, Philip
On 2019-10-17 4:54 a.m., Christian König wrote: > On 16.10.19 at 18:04, Jason Gunthorpe wrote: >> On Wed, Oct 16, 2019 at 10:58:02AM +0200, Christian König wrote: >>> On 15.10.19 at 20:12, Jason Gunthorpe wrote: From: Jason Gunthorpe 8 of the mmu_notifier using drivers (i915_gem

Re: [PATCH] drm/amdgpu: disable c-states on xgmi perfmons

2019-10-17 Thread Yang, Philip
I got compiler warnings after update this morning, because the variables are not initialized in df_v3_6_set_df_cstate() return failed path. CC [M] drivers/gpu/drm/amd/amdgpu/gmc_v9_0.o CC [M] drivers/gpu/drm/amd/amdgpu/gfxhub_v1_1.o /home/yangp/git/compute_staging/kernel/drivers/gpu/drm/am

Re: [PATCH] drm/amdgpu: fix compiler warnings for df perfmons

2019-10-17 Thread Yang, Philip
Reviewed-by: Philip Yang On 2019-10-17 1:56 p.m., Kim, Jonathan wrote: > fixing compiler warnings in df v3.6 for c-state toggle and pmc count. > > Change-Id: I74f8f1eafccf523a89d60d005e3549235f75c6b8 > Signed-off-by: Jonathan Kim > --- > drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 4 ++-- > 1 fil

[PATCH] drm/amdkfd: kfd open return failed if device is locked

2019-10-18 Thread Yang, Philip
If device is locked for suspend and resume, kfd open should return failed -EAGAIN without creating process, otherwise the application exit to release the process will hang to wait for resume is done if the suspend and resume is stuck somewhere. This is backtrace: [Thu Oct 17 16:43:37 2019] INFO: t

Re: [PATCH] drm/amdkfd: kfd open return failed if device is locked

2019-10-18 Thread Yang, Philip
On 2019-10-18 11:40 a.m., Kuehling, Felix wrote: > On 2019-10-18 10:27 a.m., Yang, Philip wrote: >> If device is locked for suspend and resume, kfd open should return >> failed -EAGAIN without creating process, otherwise the application exit >> to release the process will ha

[PATCH v2] drm/amdkfd: kfd open return failed if device is locked

2019-10-18 Thread Yang, Philip
If device is locked for suspend and resume, kfd open should return failed -EAGAIN without creating process, otherwise the application exit to release the process will hang to wait for resume is done if the suspend and resume is stuck somewhere. This is backtrace: v2: fix processes that were create

[PATCH] drm/amdkfd: don't use dqm lock during device reset/suspend/resume

2019-10-21 Thread Yang, Philip
If device reset/suspend/resume failed for some reason, dqm lock is hold forever and this causes deadlock. Below is a kernel backtrace when application open kfd after suspend/resume failed. Instead of holding dqm lock in pre_reset and releasing dqm lock in post_reset, add dqm->device_stopped flag w

Re: [PATCH] drm/amdkfd: don't use dqm lock during device reset/suspend/resume

2019-10-22 Thread Yang, Philip
On 2019-10-21 9:03 p.m., Kuehling, Felix wrote: > > On 2019-10-21 5:04 p.m., Yang, Philip wrote: >> If device reset/suspend/resume failed for some reason, dqm lock is >> hold forever and this causes deadlock. Below is a kernel backtrace when >> application open kfd aft

[PATCH v2] drm/amdkfd: don't use dqm lock during device reset/suspend/resume

2019-10-22 Thread Yang, Philip
If device reset/suspend/resume failed for some reason, dqm lock is hold forever and this causes deadlock. Below is a kernel backtrace when application open kfd after suspend/resume failed. Instead of holding dqm lock in pre_reset and releasing dqm lock in post_reset, add dqm->device_stopped flag w

Re: [PATCH v2] drm/amdkfd: don't use dqm lock during device reset/suspend/resume

2019-10-22 Thread Yang, Philip
On 2019-10-22 2:40 p.m., Grodzovsky, Andrey wrote: > > On 10/22/19 2:38 PM, Grodzovsky, Andrey wrote: >> On 10/22/19 2:28 PM, Yang, Philip wrote: >>> If device reset/suspend/resume failed for some reason, dqm lock is >>> hold forever and this causes deadlock. Be

Re: [PATCH v2] drm/amdkfd: don't use dqm lock during device reset/suspend/resume

2019-10-22 Thread Yang, Philip
On 2019-10-22 2:44 p.m., Kuehling, Felix wrote: > On 2019-10-22 14:28, Yang, Philip wrote: >> If device reset/suspend/resume failed for some reason, dqm lock is >> hold forever and this causes deadlock. Below is a kernel backtrace when >> application open kfd after

Re: [PATCH v2] drm/amdkfd: don't use dqm lock during device reset/suspend/resume

2019-10-22 Thread Yang, Philip
On 2019-10-22 3:36 p.m., Grodzovsky, Andrey wrote: > > On 10/22/19 3:19 PM, Yang, Philip wrote: >> >> On 2019-10-22 2:40 p.m., Grodzovsky, Andrey wrote: >>> On 10/22/19 2:38 PM, Grodzovsky, Andrey wrote: >>>> On 10/22/19 2:28 PM, Yang, Philip wrote: >

Re: [PATCH v2 14/15] drm/amdgpu: Use mmu_range_notifier instead of hmm_mirror

2019-10-29 Thread Yang, Philip
Hi Jason, I did quick test after merging amd-staging-drm-next with the mmu_notifier branch, which includes this set changes. The test result has different failures, app stuck intermittently, GUI no display etc. I am understanding the changes and will try to figure out the cause. Regards, Phili

Re: [PATCH v2 14/15] drm/amdgpu: Use mmu_range_notifier instead of hmm_mirror

2019-11-01 Thread Yang, Philip
On 2019-10-29 3:25 p.m., Jason Gunthorpe wrote: > On Tue, Oct 29, 2019 at 07:22:37PM +0000, Yang, Philip wrote: >> Hi Jason, >> >> I did quick test after merging amd-staging-drm-next with the >> mmu_notifier branch, which includes this set changes. The test result >

Re: [PATCH v2 14/15] drm/amdgpu: Use mmu_range_notifier instead of hmm_mirror

2019-11-01 Thread Yang, Philip
On 2019-11-01 11:12 a.m., Jason Gunthorpe wrote: > On Fri, Nov 01, 2019 at 02:44:51PM +0000, Yang, Philip wrote: >> >> >> On 2019-10-29 3:25 p.m., Jason Gunthorpe wrote: >>> On Tue, Oct 29, 2019 at 07:22:37PM +0000, Yang, Philip wrote: >>>> Hi Jason,

Re: [PATCH v2 14/15] drm/amdgpu: Use mmu_range_notifier instead of hmm_mirror

2019-11-01 Thread Yang, Philip
On 2019-11-01 1:42 p.m., Jason Gunthorpe wrote: > On Fri, Nov 01, 2019 at 03:59:26PM +0000, Yang, Philip wrote: >>> This test for range_blockable should be before mutex_lock, I can move >>> it up >>> >> yes, thanks. > > Okay, I wrote it like this:

Re: [PATCH v2 14/15] drm/amdgpu: Use mmu_range_notifier instead of hmm_mirror

2019-11-01 Thread Yang, Philip
Sorry, resending the patch; the one in the previous email missed a couple of lines due to copy/paste. On 2019-11-01 3:45 p.m., Yang, Philip wrote: > > > On 2019-11-01 1:42 p.m., Jason Gunthorpe wrote: >> On Fri, Nov 01, 2019 at 03:59:26PM +0000, Yang, Philip wrote: >>>> This test

[PATCH 1/2] drm/amdgpu: use HMM mirror callback to replace mmu notifier v5

2018-10-16 Thread Yang, Philip
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in DRM_AMDGPU_USERPTR Kconfig. It supports both KFD userptr and gfx userptr paths. This depends on several HMM patchset from Jérôme Glisse queued for upstream.

[PATCH 2/2] drm/amdgpu: replace get_user_pages with HMM address mirror helpers

2018-10-16 Thread Yang, Philip
Use HMM helper function hmm_vma_fault() to get physical pages backing userptr and start CPU page table update track of those pages. Then use hmm_vma_range_done() to check if those pages are updated before amdgpu_cs_submit for gfx or before user queues are resumed for kfd. If userptr pages are upda

Re: [PATCH 2/2] drm/amdgpu: replace get_user_pages with HMM address mirror helpers

2018-10-17 Thread Yang, Philip
On 2018-10-17 04:15 AM, Christian König wrote: > On 17.10.18 at 04:56, Yang, Philip wrote: >> Use HMM helper function hmm_vma_fault() to get physical pages backing >> userptr and start CPU page table update track of those pages. Then use >> hmm_vma_range_done() to check if th

[PATCH 2/2] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v2

2018-10-18 Thread Yang, Philip
Use HMM helper function hmm_vma_fault() to get physical pages backing userptr and start CPU page table update track of those pages. Then use hmm_vma_range_done() to check if those pages are updated before amdgpu_cs_submit for gfx or before user queues are resumed for kfd. If userptr pages are upda

[PATCH] drm/amdgpu: fix sdma v4 ring is disabled accidently

2018-10-19 Thread Yang, Philip
For sdma v4, there is a bug caused by commit d4e869b6b5d6 ("drm/amdgpu: add ring test for page queue"): local variable ring is reused and changed, so amdgpu_ttm_set_buffer_funcs_status(adev, true) is skipped accidentally. As a result, amdgpu_fill_buffer() will fail, kernel message: [drm:amdgpu_fill

[PATCH] drm/amdgpu: use module parameter noretry for gfx and kfd

2018-10-19 Thread Yang, Philip
Currently kfd uses noretry module parameter but gfx uses hardcode setting. Change both to use same noretry module parameter. Set default value to 1, to disable retry for better performance. Export noretry value to kfd userspace runtime through topology. The permission is 0644, means it can be ch

Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

2018-11-07 Thread Yang, Philip
On 2018-11-07 12:53 p.m., Kuehling, Felix wrote: > [+Philip] > > On 2018-11-07 12:25 a.m., Zhang, Jerry(Junwei) wrote: >> On 11/7/18 1:15 PM, Trigger Huang wrote: >>> Currently, SDMA page queue is not used under SR-IOV VF, and this >>> queue will >>> cause ring test failure in amdgpu module reload

[PATCH] drm/amdgpu: fix bug with IH ring setup

2018-11-12 Thread Yang, Philip
The bug limits the IH ring wptr address to 40bit. When the system memory is bigger than 1TB, the bus address is more than 40bit, this causes the interrupt cannot be handled and cleared correctly. Change-Id: I3cd1b8ad046b38945372f2fd1a2d225624893e28 Signed-off-by: Philip Yang --- drivers/gpu/drm/

[PATCH] drm/amdgpu: enable paging queue doorbell support

2018-11-15 Thread Yang, Philip
paging queues doorbell index use existing assignment sDMA_HI_PRI_ENGINE0/1 index, and increase SDMA_DOORBELL_RANGE size from 2 dwords to 4 dwords to enable the new doorbell index. Change-Id: I9adb965f16ee4089d261d9a22231337739184e49 Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/nbio_

Re: [PATCH] drm/amdgpu: enable paging queue doorbell support

2018-11-15 Thread Yang, Philip
On 2018-11-15 11:43 a.m., Alex Deucher wrote: > On Thu, Nov 15, 2018 at 11:08 AM Yang, Philip wrote: >> paging queues doorbell index use existing assignment sDMA_HI_PRI_ENGINE0/1 >> index, and increase SDMA_DOORBELL_RANGE size from 2 dwords to 4 dwords to >> enable t

Re: [PATCH] drm/amdgpu: enable paging queue doorbell support

2018-11-15 Thread Yang, Philip
Felix > > -----Original Message----- > From: amd-gfx On Behalf Of Yang, > Philip > Sent: Thursday, November 15, 2018 12:54 PM > To: Alex Deucher > Cc: amd-gfx list > Subject: Re: [PATCH] drm/amdgpu: enable paging queue doorbell support > > On 2018-11-15 11:43

[PATCH] drm/amdgpu: enable paging queue doorbell support v2

2018-11-15 Thread Yang, Philip
paging queues doorbell index use existing assignment sDMA_HI_PRI_ENGINE0/1 index, and increase SDMA_DOORBELL_RANGE size from 2 dwords to 4 dwords to enable the new doorbell index. v2: disable paging queue doorbell on Vega10 and Vega12 with SRIOV Change-Id: I9adb965f16ee4089d261d9a22231337739184e4

[PATCH] drm/amdgpu: enable paging queue doorbell support v3

2018-11-16 Thread Yang, Philip
Because increase SDMA_DOORBELL_RANGE to add new SDMA doorbell for paging queue will break SRIOV, instead we can reserve and map two doorbell pages for amdgpu, paging queues doorbell index use same index as SDMA gfx queues index but on second page. For Vega20, after we change doorbell layout to

Re: [PATCH] drm/amdgpu: enable paging queue doorbell support v3

2018-11-16 Thread Yang, Philip
On 2018-11-16 4:35 p.m., Alex Deucher wrote: > On Fri, Nov 16, 2018 at 2:08 PM Yang, Philip wrote: >> Because increase SDMA_DOORBELL_RANGE to add new SDMA doorbell for paging >> queue will >> break SRIOV, instead we can reserve and map two doorbell pages for amdgpu, >>

[PATCH 3/3] drm/amdgpu: enable paging queue based on FW version

2018-11-19 Thread Yang, Philip
Based SDMA fw version to enable has_page_queue support. Have to move sdma_v4_0_init_microcode from sw_init to early_init, to load firmware and init fw_version before set_ring/buffer/vm_pte_funcs use it. Change-Id: Ife5d4659d28bc2a7012b48947b27e929749d87c1 Signed-off-by: Philip Yang --- drivers/g

[PATCH 2/3] drm/amdgpu: enable paging queue doorbell support v4

2018-11-19 Thread Yang, Philip
Because increase SDMA_DOORBELL_RANGE to add new SDMA doorbell for paging queue will break SRIOV, instead we can reserve and map two doorbell pages for amdgpu, paging queues doorbell index use same index as SDMA gfx queues index but on second page. For Vega20, after we change doorbell layout to

[PATCH 1/3] drm/amdgpu: fix typo in function sdma_v4_0_page_resume

2018-11-19 Thread Yang, Philip
This looks like copy paste typo Change-Id: Iee3fd3a551650ec9199bc030a7886e92000b02e7 Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgp

Re: [PATCH 3/3] drm/amdgpu: enable paging queue based on FW version

2018-11-19 Thread Yang, Philip
On 2018-11-19 3:57 p.m., Deucher, Alexander wrote: >> -----Original Message----- >> From: amd-gfx On Behalf Of >> Yang, Philip >> Sent: Monday, November 19, 2018 3:20 PM >> To: amd-gfx@lists.freedesktop.org >> Cc: Yang, Philip >> Subject: [PATCH 3/3] d

Re: [PATCH] drm/amdgpu: disable page queue support for Vega12

2018-11-21 Thread Yang, Philip
Reviewed-by: Philip Yang On 2018-11-21 9:54 a.m., Alex Deucher wrote: > Keep it disabled until we confirm it's ready. > > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/a

[PATCH 1/2] drm/amdgpu: use HMM mirror callback to replace mmu notifier v5

2018-12-03 Thread Yang, Philip
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in DRM_AMDGPU_USERPTR Kconfig. It supports both KFD userptr and gfx userptr paths. The dependent HMM patchsets from Jérôme Glisse are all merged into 4.20.0 kern

[PATCH 2/2] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v2

2018-12-03 Thread Yang, Philip
Use HMM helper function hmm_vma_fault() to get physical pages backing userptr and start CPU page table update track of those pages. Then use hmm_vma_range_done() to check if those pages are updated before amdgpu_cs_submit for gfx or before user queues are resumed for kfd. If userptr pages are upda

Re: [PATCH 1/2] drm/amdgpu: use HMM mirror callback to replace mmu notifier v5

2018-12-04 Thread Yang, Philip
On 2018-12-04 4:06 a.m., Christian König wrote: > On 03.12.18 at 21:19, Yang, Philip wrote: >> Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables >> callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in >> DRM_AMDGPU_USERPTR Kconfig. >

Re: [PATCH 2/2] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v2

2018-12-06 Thread Yang, Philip
On 2018-12-03 8:52 p.m., Kuehling, Felix wrote: > See comments inline. I didn't review the amdgpu_cs and amdgpu_gem parts > as I don't know them very well. > > On 2018-12-03 3:19 p.m., Yang, Philip wrote: >> Use HMM helper function hmm_vma_fault() to get physical pages

[PATCH 1/3] drm/amdgpu: use HMM mirror callback to replace mmu notifier v6

2018-12-06 Thread Yang, Philip
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in DRM_AMDGPU_USERPTR Kconfig. It supports both KFD userptr and gfx userptr paths. The dependent HMM patchsets from Jérôme Glisse are all merged into 4.20.0 kerne

[PATCH 2/3] drm/amdkfd: avoid HMM change cause circular lock dependency

2018-12-06 Thread Yang, Philip
There is circular lock between gfx and kfd path with HMM change: lock(dqm) -> bo::reserve -> amdgpu_mn_lock To avoid this, move init/unint_mqd() out of lock(dqm), to remove nested locking between mmap_sem and bo::reserve. The locking order is: bo::reserve -> amdgpu_mn_lock(p->mn) Change-Id: I2ec0

[PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v3

2018-12-06 Thread Yang, Philip
Use HMM helper function hmm_vma_fault() to get physical pages backing userptr and start CPU page table update track of those pages. Then use hmm_vma_range_done() to check if those pages are updated before amdgpu_cs_submit for gfx or before user queues are resumed for kfd. If userptr pages are upda

Re: [PATCH] drm/amdgpu: bypass RLC init under sriov for Tonga

2018-12-10 Thread Yang, Philip
I got this compilation error message after I rebase this morning, do I miss anything? /home/yangp/git/compute_staging/kernel/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c: In function ‘gfx_v8_0_rlc_resume’: /home/yangp/git/compute_staging/kernel/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c:4071:6: error: impl

Re: [PATCH] drm/amdgpu: bypass RLC init under sriov for Tonga

2018-12-10 Thread Yang, Philip
Never mind, just saw the patch to fix the typo. On 2018-12-10 1:07 p.m., Yang, Philip wrote: I got this compilation error message after I rebase this morning, do I miss anything? /home/yangp/git/compute_staging/kernel/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c: In function ‘gfx_v8_0_rlc_resume

Re: [PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v3

2018-12-13 Thread Yang, Philip
On 2018-12-10 7:12 p.m., Kuehling, Felix wrote: > This is a nice improvement from the last version. I still see some > potential problems. See inline ... > > I'm skipping over the CS and GEM parts. I hope Christian can review > those parts. > > On 2018-12-06 4:02 p.m.,

[PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v4

2018-12-13 Thread Yang, Philip
Use HMM helper function hmm_vma_fault() to get physical pages backing userptr and start CPU page table update track of those pages. Then use hmm_vma_range_done() to check if those pages are updated before amdgpu_cs_submit for gfx or before user queues are resumed for kfd. If userptr pages are upda

Re: [PATCH 1/3] drm/amdgpu: use HMM mirror callback to replace mmu notifier v6

2018-12-13 Thread Yang, Philip
c/h to amdgpu_hmm.c/h. > > -David > >> -----Original Message----- >> From: amd-gfx On Behalf Of Yang, >> Philip >> Sent: Friday, December 07, 2018 5:03 AM >> To: amd-gfx@lists.freedesktop.org >> Cc: Yang, Philip >> Subject: [PATCH 1/3] drm/amdgpu:

Re: [PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v4

2018-12-14 Thread Yang, Philip
re robust. > > Christian, would you review the CS and GEM parts? And maybe take a look > you see nothing wrong with the amdgpu_ttm changes either. > > On 2018-12-13 4:01 p.m., Yang, Philip wrote: >> Use HMM helper function hmm_vma_fault() to get physical pages backing >> use

[PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v5

2018-12-14 Thread Yang, Philip
Use HMM helper function hmm_vma_fault() to get physical pages backing userptr and start CPU page table update track of those pages. Then use hmm_vma_range_done() to check if those pages are updated before amdgpu_cs_submit for gfx or before user queues are resumed for kfd. If userptr pages are upda

Re: [PATCH 1/2] drm/amdgpu: Add per device sdma_doorbell_range field

2018-12-18 Thread Yang, Philip
On 2018-12-17 9:12 p.m., Zeng, Oak wrote: > Different ASIC has different sdma doorbell range. Add > a per device sdma_doorbell_range field and initialize > it. > > Change-Id: Idd980db1a72cfb373e24ac23ba3e48bb329ed4ad > Signed-off-by: Oak Zeng > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell

Re: [PATCH 1/2] drm/amdgpu: Add per device sdma_doorbell_range field

2018-12-18 Thread Yang, Philip
irendra > Sent: Tuesday, December 18, 2018 11:34 AM > To: Zeng, Oak ; Yang, Philip ; > amd-gfx@lists.freedesktop.org; Deng, Emily > Subject: RE: [PATCH 1/2] drm/amdgpu: Add per device sdma_doorbell_range field > > Hi Oak, > > Windows will set 4 dwords for both sdma0 and

Re: [PATCH 2/2] drm/amdgpu: Fix sdma doorbell range setting

2018-12-18 Thread Yang, Philip
The series is Reviewed-by: Philip Yang On 2018-12-17 9:12 p.m., Zeng, Oak wrote: > Different ASIC has different SDMA queues so different > SDMA doorbell range. Introduce an extra parameter > to sdma_doorbell_range function and set sdma doorbell > range correctly. > > Change-Id: I9b8d75b04f5a47ef

Re: [PATCH 2/2] drm/amdgpu/nbio7.4: add hw bug workaround for vega20

2018-12-20 Thread Yang, Philip
This series are verified on 4 Vega20 by: Philip Yang On 2018-12-20 10:39 a.m., Kasiviswanathan, Harish wrote: > This patch set Reviewed-by: Harish Kasiviswanathan > > > On 2018-12-19 6:09 p.m., Alex Deucher wrote: >> Configure PCIE_CI_CNTL to work around a hw bug that affects >> some multi-GPU

Re: [PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v5

2019-01-02 Thread Yang, Philip
ies is Reviewed-by: Felix > Kuehling > > Regards, >   Felix > > On 2018-12-14 4:10 p.m., Yang, Philip wrote: >> Use HMM helper function hmm_vma_fault() to get physical pages backing >> userptr and start CPU page table update track of those pages. Then use >>

Re: [PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v5

2019-01-07 Thread Yang, Philip
On 2019-01-07 9:21 a.m., Christian König wrote: > On 14.12.18 at 22:10, Yang, Philip wrote: >> Use HMM helper function hmm_vma_fault() to get physical pages backing >> userptr and start CPU page table update track of those pages. Then use >> hmm_vma_range_done() to che

[PATCH 1/3] drm/amdgpu: use HMM mirror callback to replace mmu notifier v6

2019-01-10 Thread Yang, Philip
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in DRM_AMDGPU_USERPTR Kconfig. It supports both KFD userptr and gfx userptr paths. The dependent HMM patchsets from Jérôme Glisse are all merged into 4.20.0 kerne

[PATCH 2/3] drm/amdkfd: avoid HMM change cause circular lock dependency v2

2019-01-10 Thread Yang, Philip
There is circular lock between gfx and kfd path with HMM change: lock(dqm) -> bo::reserve -> amdgpu_mn_lock To avoid this, move init/unint_mqd() out of lock(dqm), to remove nested locking between mmap_sem and bo::reserve. The locking order is: bo::reserve -> amdgpu_mn_lock(p->mn) Change-Id: I2ec0

[PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v6

2019-01-10 Thread Yang, Philip
Use HMM helper function hmm_vma_fault() to get physical pages backing userptr and start CPU page table update track of those pages. Then use hmm_vma_range_done() to check if those pages are updated before amdgpu_cs_submit for gfx or before user queues are resumed for kfd. If userptr pages are upda

Re: [PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v6

2019-01-14 Thread Yang, Philip
Ping Christian, any comments for the GEM and CS part changes? Thanks. Philip On 2019-01-10 12:02 p.m., Yang, Philip wrote: > Use HMM helper function hmm_vma_fault() to get physical pages backing > userptr and start CPU page table update track of those pages. Then use > hmm_vma_range_

Re: Yet another RX Vega hang with another kernel panic signature. WARNING: inconsistent lock state

2019-01-31 Thread Yang, Philip
I found same issue while debugging, I will submit patch to fix this shortly. Philip On 2019-01-30 10:35 p.m., Mikhail Gavrilov wrote: > Hi folks. > Yet another kernel panic happens while GPU again is hang: > > [ 1469.906798] > [ 1469.906799] WARNING: inconsistent

[PATCH] drm/amdgpu: use spin_lock_irqsave to protect vm_manager.pasid_idr

2019-01-31 Thread Yang, Philip
amdgpu_vm_get_task_info is called from interrupt handler and sched timeout workqueue, so it is needed to use irq version spin_lock to avoid deadlock. Change-Id: Ifedd4b97535bf0b5d3936edd2d9688957020efd4 --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 5 +++-- 1 file changed, 3 insertions(+), 2 delet

[PATCH 0/3] Use HMM to replace get_user_pages

2019-02-04 Thread Yang, Philip
Hi Christian, This patch is rebased to latest HMM. Please review the GEM and CS part changes in patch 3/3. Regards, Philip Yang (3): drm/amdgpu: use HMM mirror callback to replace mmu notifier v6 drm/amdkfd: avoid HMM change cause circular lock dependency v2 drm/amdgpu: replace get_user_p

[PATCH 1/3] drm/amdgpu: use HMM mirror callback to replace mmu notifier v6

2019-02-04 Thread Yang, Philip
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in DRM_AMDGPU_USERPTR Kconfig. It supports both KFD userptr and gfx userptr paths. The dependent HMM patchsets from Jérôme Glisse are all merged into 4.20.0 kerne

[PATCH 2/3] drm/amdkfd: avoid HMM change cause circular lock dependency v2

2019-02-04 Thread Yang, Philip
There is circular lock between gfx and kfd path with HMM change: lock(dqm) -> bo::reserve -> amdgpu_mn_lock To avoid this, move init/unint_mqd() out of lock(dqm), to remove nested locking between mmap_sem and bo::reserve. The locking order is: bo::reserve -> amdgpu_mn_lock(p->mn) Change-Id: I2ec0

[PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v6

2019-02-04 Thread Yang, Philip
Use HMM helper function hmm_vma_fault() to get physical pages backing userptr and start CPU page table update track of those pages. Then use hmm_vma_range_done() to check if those pages are updated before amdgpu_cs_submit for gfx or before user queues are resumed for kfd. If userptr pages are upda

Re: [PATCH 1/3] drm/amdgpu: use HMM mirror callback to replace mmu notifier v6

2019-02-04 Thread Yang, Philip
On 2019-02-04 10:18 a.m., Christian König wrote: > On 04.02.19 at 16:06, Yang, Philip wrote: >> Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables >> callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in >> DRM_AMDGPU_USERPTR Kconfig.

[PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v6

2019-02-04 Thread Yang, Philip
Use HMM helper function hmm_vma_fault() to get physical pages backing userptr and start CPU page table update track of those pages. Then use hmm_vma_range_done() to check if those pages are updated before amdgpu_cs_submit for gfx or before user queues are resumed for kfd. If userptr pages are upda

[PATCH 2/3] drm/amdkfd: avoid HMM change cause circular lock dependency v2

2019-02-04 Thread Yang, Philip
There is circular lock between gfx and kfd path with HMM change: lock(dqm) -> bo::reserve -> amdgpu_mn_lock To avoid this, move init/unint_mqd() out of lock(dqm), to remove nested locking between mmap_sem and bo::reserve. The locking order is: bo::reserve -> amdgpu_mn_lock(p->mn) Change-Id: I2ec0

[PATCH 1/3] drm/amdgpu: use HMM mirror callback to replace mmu notifier v7

2019-02-04 Thread Yang, Philip
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in DRM_AMDGPU_USERPTR Kconfig. It supports both KFD userptr and gfx userptr paths. Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e Signed-off-by: Philip Y

[PATCH 0/3] Use HMM to replace get_user_pages

2019-02-04 Thread Yang, Philip
Hi Christian, This patch is rebased to latest HMM. Please review the GEM and CS part changes in patch 3/3. Thanks, Philip Yang (3): drm/amdgpu: use HMM mirror callback to replace mmu notifier v7 drm/amdkfd: avoid HMM change cause circular lock dependency v2 drm/amdgpu: replace get_user_pa

Re: [PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v6

2019-02-05 Thread Yang, Philip
Hi Christian, My comments are embedded below. I will submit another patch to address those. Thanks, Philip On 2019-02-05 6:52 a.m., Christian König wrote: > On 04.02.19 at 19:23, Yang, Philip wrote: >> Use HMM helper function hmm_vma_fault() to get physical pages backing >> us

Re: [PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v6

2019-02-05 Thread Yang, Philip
Hi Christian, I will submit new patch for review, my comments embedded inline below. Thanks, Philip On 2019-02-05 1:09 p.m., Koenig, Christian wrote: > On 05.02.19 at 18:25, Yang, Philip wrote: >> [SNIP]+ >>>> +    if (r == -ERESTARTSYS) { >

[PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v7

2019-02-05 Thread Yang, Philip
Use HMM helper function hmm_vma_fault() to get physical pages backing userptr and start CPU page table update track of those pages. Then use hmm_vma_range_done() to check if those pages are updated before amdgpu_cs_submit for gfx or before user queues are resumed for kfd. If userptr pages are upda

Re: [PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v7

2019-02-06 Thread Yang, Philip
ries might not be sufficient any more. > Yes, it looks better to handle retry from user space. The extra sys call overhead can be ignored because this does not happen all the time. I will submit new patch for review. Thanks, Philip On 2019-02-06 4:20 a.m., Christian König wrote: > Am 0

[PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v8

2019-02-06 Thread Yang, Philip
Use HMM helper function hmm_vma_fault() to get physical pages backing userptr and start CPU page table update track of those pages. Then use hmm_vma_range_done() to check if those pages are updated before amdgpu_cs_submit for gfx or before user queues are resumed for kfd. If userptr pages are upda

[PATCH 0/3] Use HMM to replace get_user_pages

2019-02-06 Thread Yang, Philip
Hi Christian, Resend patch 1/3, 2/3, added Reviewed-by in comments. Change in patch 3/3, amdgpu_cs_submit, amdgpu_cs_ioctl return -EAGAIN to user space to retry cs_ioctl. Regards, Philip Philip Yang (3): drm/amdgpu: use HMM mirror callback to replace mmu notifier v7 drm/amdkfd: avoid HMM ch

[PATCH 1/3] drm/amdgpu: use HMM mirror callback to replace mmu notifier v7

2019-02-06 Thread Yang, Philip
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in DRM_AMDGPU_USERPTR Kconfig. It supports both KFD userptr and gfx userptr paths. Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e Signed-off-by: Philip Y

[PATCH 2/3] drm/amdkfd: avoid HMM change cause circular lock dependency v2

2019-02-06 Thread Yang, Philip
There is circular lock between gfx and kfd path with HMM change: lock(dqm) -> bo::reserve -> amdgpu_mn_lock To avoid this, move init/unint_mqd() out of lock(dqm), to remove nested locking between mmap_sem and bo::reserve. The locking order is: bo::reserve -> amdgpu_mn_lock(p->mn) Change-Id: I2ec0

[PATCH] drm/amdgpu: select ARCH_HAS_HMM and ZONE_DEVICE option

2019-02-20 Thread Yang, Philip
Those options are needed to support HMM Change-Id: Ieb7bb3bcec07245d79a02793e6728228decc400a Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/Kconfig | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig index 960a

Re: [PATCH] drm/amdgpu: select ARCH_HAS_HMM and ZONE_DEVICE option

2019-02-21 Thread Yang, Philip
Thanks Jerome for the correct HMM config option, only select HMM_MIRROR is not good enough because CONFIG_HMM option may be missing, add depends on ARCH_HAS_HMM will solve the issue. I will submit new patch to fix the compilation error if HMM_MIRROR config is missing and the HMM config depen

[PATCH] drm/amdgpu: fix HMM config dependency issue

2019-02-21 Thread Yang, Philip
Only select HMM_MIRROR will get kernel config dependency warnings if CONFIG_HMM is missing in the config. Add depends on HMM will solve the issue. Add conditional compilation to fix compilation errors if HMM_MIRROR is not enabled as HMM config is not enabled. Change-Id: I1b44a0b5285bbef5e98bfb045

Re: KASAN caught amdgpu / HMM use-after-free

2019-02-27 Thread Yang, Philip
Hi Michel, Yes, I found the same issue and the bug has been fixed by Jerome: 876b462120aa mm/hmm: use reference counting for HMM struct The fix is on the hmm-for-5.1 branch; I cherry-picked it into my local branch to work around the issue. Regards, Philip On 2019-02-27 12:02 p.m., Michel Dänzer wrot

Re: KASAN caught amdgpu / HMM use-after-free

2019-02-27 Thread Yang, Philip
amd-staging-drm-next will rebase to kernel 5.1 to pick up this fix automatically. As a short-term workaround, please cherry-pick this fix into your local repository. Regards, Philip On 2019-02-27 12:33 p.m., Michel Dänzer wrote: > On 2019-02-27 6:14 p.m., Yang, Philip wrote: >>

Re: KASAN caught amdgpu / HMM use-after-free

2019-02-27 Thread Yang, Philip
. > > Alex > > *From:* amd-gfx on behalf of > Yang, Philip > *Sent:* Wednesday, February 27, 2019 1:05 PM > *To:* Michel Dänzer; Jérôme Glisse > *Cc:* linux...@kvack.org; amd-gfx@lists.freedesktop.org > *Subject:* Re: KASAN caught amdgpu / HMM use-after-f

Re: KASAN caught amdgpu / HMM use-after-free

2019-02-28 Thread Yang, Philip
: > > [ Dropping Jérôme and the linux-mm list ] > > On 2019-02-27 7:48 p.m., Yang, Philip wrote: >> Hi Alex, >> >> Pushed, thanks. >> >> mm/hmm: use reference counting for HMM struct > > Thanks, but I'm not seeing it yet. Maybe it needs some sp

[PATCH] drm/amdgpu: handle userptr corner cases with HMM path

2019-03-01 Thread Yang, Philip
Those corner cases were found by kfdtest.KFDIPCTest. A userptr may cross two VMAs if the forked child process (without calling exec after fork) mallocs a buffer, frees it, and then mallocs a larger buffer: the kernel will create a new VMA adjacent to the old VMA which was cloned from the parent process, some pages of use

Re: [PATCH] drm/amdgpu: handle userptr corner cases with HMM path

2019-03-04 Thread Yang, Philip
han one VMA, fail > 2. Loop over all the VMAs in the address range > > Thanks, >Felix > > -----Original Message- > From: amd-gfx On Behalf Of Yang, > Philip > Sent: Friday, March 01, 2019 12:30 PM > To: amd-gfx@lists.freedesktop.org > Cc: Yang, Ph

[PATCH 3/3] drm/amdgpu: more descriptive message if HMM not enabled

2019-03-05 Thread Yang, Philip
If an old kernel config file is used, CONFIG_ZONE_DEVICE is not selected, so CONFIG_HMM and CONFIG_HMM_MIRROR are not enabled, and the current driver error message "Failed to register MMU notifier" is not clear. Inform the user with a more descriptive message on how to fix the missing kernel config option. Bugzil

[PATCH 2/3] drm/amdgpu: support userptr cross VMAs case with HMM

2019-03-05 Thread Yang, Philip
A userptr may cross two VMAs if the forked child process (without calling exec after fork) mallocs a buffer, frees it, and then mallocs a larger buffer: the kernel will create a new VMA adjacent to the old VMA which was cloned from the parent process, so some pages of the userptr are in the first VMA, the remaining pages are in the
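
The cross-VMA case described above amounts to walking every mapping that an address range touches instead of assuming a single one. A minimal user-space sketch of that walk is given below. It is not the amdgpu patch: struct region is a hypothetical, simplified stand-in for a VMA, range_covered is a made-up helper, and the regions are assumed to be sorted by start address and non-overlapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a VMA: half-open range [start, end). */
struct region {
    unsigned long start;
    unsigned long end;
};

/* Return true if [start, end) is fully covered by the given regions, which
 * must be sorted by start address and must not overlap. *count_out (if
 * non-NULL) receives the number of regions that the range touches. */
static bool range_covered(const struct region *regions, size_t n,
                          unsigned long start, unsigned long end,
                          size_t *count_out)
{
    unsigned long addr = start;   /* first address not yet known to be mapped */
    size_t count = 0;

    assert(regions != NULL || n == 0);

    if (count_out)
        *count_out = 0;
    if (start >= end)
        return false;             /* empty or inverted range */

    for (size_t i = 0; i < n && addr < end; i++) {
        if (regions[i].end <= addr)
            continue;             /* region lies entirely below the cursor */
        if (regions[i].start > addr)
            break;                /* hole before this region */
        addr = regions[i].end;    /* advance the cursor past this region */
        count++;
    }
    if (count_out)
        *count_out = count;
    return addr >= end;
}
```

For two adjacent regions [0x1000, 0x3000) and [0x3000, 0x6000), the range [0x2000, 0x5000) is reported as covered and touching two regions, which is the situation the patch description refers to. A gap between the regions makes the check fail.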

[PATCH 1/3] drm/amdkfd: support concurrent userptr update for HMM

2019-03-05 Thread Yang, Philip
Userptr restore may race with a concurrent userptr invalidation after hmm_vma_fault adds the range to the hmm->ranges list. In that case hmm_vma_range_done needs to be called to remove the range from the hmm->ranges list first, and then the restore worker is rescheduled. Otherwise hmm_vma_fault will add the same range to the list, this will

[PATCH 0/3] handle userptr corner cases with HMM path

2019-03-05 Thread Yang, Philip
Those corner cases are found by kfdtest.KFDIPCTest. Philip Yang (3): drm/amdkfd: support concurrent userptr update for HMM drm/amdgpu: support userptr cross VMAs case with HMM drm/amdgpu: more descriptive message if HMM not enabled .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 28 +++-

Re: [PATCH 3/3] drm/amdgpu: more descriptive message if HMM not enabled

2019-03-06 Thread Yang, Philip
On 2019-03-06 4:05 a.m., Michel Dänzer wrote: > On 2019-03-05 7:09 p.m., Yang, Philip wrote: >> If using old kernel config file, CONFIG_ZONE_DEVICE is not selected, >> so CONFIG_HMM and CONFIG_HMM_MIRROR is not enabled, the current driver >> error message "Failed to reg

[PATCH 3/3] drm/amdgpu: more descriptive message if HMM not enabled v2

2019-03-06 Thread Yang, Philip
If an old kernel config file is used, CONFIG_ZONE_DEVICE is not selected, so CONFIG_HMM and CONFIG_HMM_MIRROR are not enabled, and the current driver error message "Failed to register MMU notifier" is not clear. Inform the user with a more descriptive message on how to fix the missing kernel config option. Bugzil

Re: [PATCH 3/3] drm/amdgpu: more descriptive message if HMM not enabled v2

2019-03-06 Thread Yang, Philip
On 2019-03-06 10:04 a.m., Christian König wrote: > Am 06.03.19 um 16:02 schrieb Yang, Philip: >> If using old kernel config file, CONFIG_ZONE_DEVICE is not selected, >> so CONFIG_HMM and CONFIG_HMM_MIRROR is not enabled, the current driver >> error message "Failed to

[PATCH 3/3] drm/amdgpu: more descriptive message if HMM not enabled v3

2019-03-06 Thread Yang, Philip
If an old kernel config file is used, CONFIG_ZONE_DEVICE is not selected, so CONFIG_HMM and CONFIG_HMM_MIRROR are not enabled, and the current driver error message "Failed to register MMU notifier" is not clear. Inform the user with a more descriptive message on how to fix the missing kernel config option. Bugzil

Re: [PATCH 1/3] drm/amdkfd: support concurrent userptr update for HMM

2019-03-06 Thread Yang, Philip
at that point needs to be untracked. > > For now as a quick fix for an urgent bug, this change is Reviewed-by: > Felix Kuehling . But please revisit this and > check if there are similar corner cases as I explained above. > > Regards, >   Felix > > On 3/5/2019 1:09 PM, Yang,

Re: [PATCH 2/3] drm/amdgpu: support userptr cross VMAs case with HMM

2019-03-06 Thread Yang, Philip
I will submit v2 to fix those issues. Some comments inline... On 2019-03-06 3:11 p.m., Kuehling, Felix wrote: > Some comments inline ... > > On 3/5/2019 1:09 PM, Yang, Philip wrote: >> userptr may cross two VMAs if the forked child process (not call exec >> after fork) mal

[PATCH 2/3] drm/amdgpu: support userptr cross VMAs case with HMM v2

2019-03-06 Thread Yang, Philip
A userptr may cross two VMAs if the forked child process (without calling exec after fork) mallocs a buffer, frees it, and then mallocs a larger buffer: the kernel will create a new VMA adjacent to the old VMA which was cloned from the parent process, so some pages of the userptr are in the first VMA, the remaining pages are in the

[PATCH 1/3] drm/amdkfd: support concurrent userptr update for HMM v2

2019-03-06 Thread Yang, Philip
Userptr restore may race with a concurrent userptr invalidation after hmm_vma_fault adds the range to the hmm->ranges list. In that case hmm_vma_range_done needs to be called to remove the range from the hmm->ranges list first, and then the restore worker is rescheduled. Otherwise hmm_vma_fault will add the same range to the list, this will
