On 2019-10-17 4:54 a.m., Christian König wrote:
> Am 16.10.19 um 18:04 schrieb Jason Gunthorpe:
>> On Wed, Oct 16, 2019 at 10:58:02AM +0200, Christian König wrote:
>>> Am 15.10.19 um 20:12 schrieb Jason Gunthorpe:
From: Jason Gunthorpe
8 of the mmu_notifier using drivers (i915_gem
I got compiler warnings after updating this morning, because the variables
are not initialized in the failure return path of df_v3_6_set_df_cstate().
CC [M] drivers/gpu/drm/amd/amdgpu/gmc_v9_0.o
CC [M] drivers/gpu/drm/amd/amdgpu/gfxhub_v1_1.o
/home/yangp/git/compute_staging/kernel/drivers/gpu/drm/am
Reviewed-by: Philip Yang
On 2019-10-17 1:56 p.m., Kim, Jonathan wrote:
> fixing compiler warnings in df v3.6 for c-state toggle and pmc count.
>
> Change-Id: I74f8f1eafccf523a89d60d005e3549235f75c6b8
> Signed-off-by: Jonathan Kim
> ---
> drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 4 ++--
> 1 fil
If the device is locked for suspend and resume, kfd open should fail with
-EAGAIN without creating a process. Otherwise, if suspend and resume is
stuck somewhere, the application exit that releases the process will hang
waiting for resume to finish. This is the backtrace:
[Thu Oct 17 16:43:37 2019] INFO: t
On 2019-10-18 11:40 a.m., Kuehling, Felix wrote:
> On 2019-10-18 10:27 a.m., Yang, Philip wrote:
>> If device is locked for suspend and resume, kfd open should return
>> failed -EAGAIN without creating process, otherwise the application exit
>> to release the process will ha
If the device is locked for suspend and resume, kfd open should fail with
-EAGAIN without creating a process. Otherwise, if suspend and resume is
stuck somewhere, the application exit that releases the process will hang
waiting for resume to finish. This is the backtrace:
v2: fix processes that were create
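
The behaviour described in this patch (fail open with -EAGAIN while the device is locked, and create no process) can be sketched in userspace C as follows. This is only an illustration under assumptions: demo_locked, demo_suspend_begin(), demo_resume_done() and demo_open() are hypothetical names, not the kfd driver's code.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical lock count: non-zero while a suspend/resume is in progress. */
static atomic_int demo_locked;

/* Hypothetical helpers marking the start and the end of suspend/resume. */
static void demo_suspend_begin(void)
{
	atomic_fetch_add(&demo_locked, 1);
}

static void demo_resume_done(void)
{
	atomic_fetch_sub(&demo_locked, 1);
}

/*
 * Hypothetical open handler: bail out before any per-process state is
 * created, so that nothing has to wait for resume at release time.
 */
static int demo_open(int *process_created)
{
	if (atomic_load(&demo_locked) > 0)
		return -EAGAIN;

	*process_created = 1;
	return 0;
}
```

The point of checking first is that a failed open leaves nothing behind to release, so the caller can simply retry later.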
If device reset/suspend/resume fails for some reason, the dqm lock is
held forever and this causes a deadlock. Below is a kernel backtrace from
an application opening kfd after suspend/resume failed.
Instead of holding dqm lock in pre_reset and releasing dqm lock in
post_reset, add dqm->device_stopped flag w
On 2019-10-21 9:03 p.m., Kuehling, Felix wrote:
>
> On 2019-10-21 5:04 p.m., Yang, Philip wrote:
>> If device reset/suspend/resume failed for some reason, dqm lock is
>> hold forever and this causes deadlock. Below is a kernel backtrace when
>> application open kfd aft
If device reset/suspend/resume fails for some reason, the dqm lock is
held forever and this causes a deadlock. Below is a kernel backtrace from
an application opening kfd after suspend/resume failed.
Instead of holding dqm lock in pre_reset and releasing dqm lock in
post_reset, add dqm->device_stopped flag w
On 2019-10-22 2:40 p.m., Grodzovsky, Andrey wrote:
>
> On 10/22/19 2:38 PM, Grodzovsky, Andrey wrote:
>> On 10/22/19 2:28 PM, Yang, Philip wrote:
>>> If device reset/suspend/resume failed for some reason, dqm lock is
>>> hold forever and this causes deadlock. Be
On 2019-10-22 2:44 p.m., Kuehling, Felix wrote:
> On 2019-10-22 14:28, Yang, Philip wrote:
>> If device reset/suspend/resume failed for some reason, dqm lock is
>> hold forever and this causes deadlock. Below is a kernel backtrace when
>> application open kfd after
On 2019-10-22 3:36 p.m., Grodzovsky, Andrey wrote:
>
> On 10/22/19 3:19 PM, Yang, Philip wrote:
>>
>> On 2019-10-22 2:40 p.m., Grodzovsky, Andrey wrote:
>>> On 10/22/19 2:38 PM, Grodzovsky, Andrey wrote:
>>>> On 10/22/19 2:28 PM, Yang, Philip wrote:
>
Hi Jason,
I did a quick test after merging amd-staging-drm-next with the
mmu_notifier branch, which includes this set of changes. The test shows
several different failures: the app gets stuck intermittently, the GUI has
no display, etc. I am still working through the changes and will try to
figure out the cause.
Regards,
Phili
On 2019-10-29 3:25 p.m., Jason Gunthorpe wrote:
> On Tue, Oct 29, 2019 at 07:22:37PM +0000, Yang, Philip wrote:
>> Hi Jason,
>>
>> I did quick test after merging amd-staging-drm-next with the
>> mmu_notifier branch, which includes this set changes. The test result
>
On 2019-11-01 11:12 a.m., Jason Gunthorpe wrote:
> On Fri, Nov 01, 2019 at 02:44:51PM +0000, Yang, Philip wrote:
>>
>>
>> On 2019-10-29 3:25 p.m., Jason Gunthorpe wrote:
>>> On Tue, Oct 29, 2019 at 07:22:37PM +0000, Yang, Philip wrote:
>>>> Hi Jason,
>
On 2019-11-01 1:42 p.m., Jason Gunthorpe wrote:
> On Fri, Nov 01, 2019 at 03:59:26PM +0000, Yang, Philip wrote:
>>> This test for range_blockable should be before mutex_lock, I can move
>>> it up
>>>
>> yes, thanks.
>
> Okay, I wrote it like this:
>
Sorry, resending the patch; the one in the previous email was missing a
couple of lines due to copy/paste.
On 2019-11-01 3:45 p.m., Yang, Philip wrote:
>
>
> On 2019-11-01 1:42 p.m., Jason Gunthorpe wrote:
>> On Fri, Nov 01, 2019 at 03:59:26PM +0000, Yang, Philip wrote:
>>>> This test
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.
It supports both KFD userptr and gfx userptr paths.
This depends on several HMM patchsets from Jérôme Glisse queued for
upstream.
Use the HMM helper function hmm_vma_fault() to get the physical pages
backing a userptr and start tracking CPU page table updates for those
pages. Then use hmm_vma_range_done() to check whether those pages were
updated before amdgpu_cs_submit for gfx or before user queues are resumed
for kfd.
If userptr pages are upda
On 2018-10-17 04:15 AM, Christian König wrote:
> Am 17.10.18 um 04:56 schrieb Yang, Philip:
>> Use HMM helper function hmm_vma_fault() to get physical pages backing
>> userptr and start CPU page table update track of those pages. Then use
>> hmm_vma_range_done() to check if th
Use the HMM helper function hmm_vma_fault() to get the physical pages
backing a userptr and start tracking CPU page table updates for those
pages. Then use hmm_vma_range_done() to check whether those pages were
updated before amdgpu_cs_submit for gfx or before user queues are resumed
for kfd.
If userptr pages are upda
For sdma v4, there is a bug caused by
commit d4e869b6b5d6 ("drm/amdgpu: add ring test for page queue"):
the local variable ring is reused and changed, so
amdgpu_ttm_set_buffer_funcs_status(adev, true)
is skipped accidentally. As a result, amdgpu_fill_buffer() fails with the
kernel message:
[drm:amdgpu_fill
Currently kfd uses the noretry module parameter but gfx uses a hardcoded
setting. Change both to use the same noretry module parameter.
Set the default value to 1 to disable retry for better performance.
Export the noretry value to the kfd userspace runtime through topology.
The permission is 0644, means it can be ch
On 2018-11-07 12:53 p.m., Kuehling, Felix wrote:
> [+Philip]
>
> On 2018-11-07 12:25 a.m., Zhang, Jerry(Junwei) wrote:
>> On 11/7/18 1:15 PM, Trigger Huang wrote:
>>> Currently, SDMA page queue is not used under SR-IOV VF, and this
>>> queue will
>>> cause ring test failure in amdgpu module reload
The bug limits the IH ring wptr address to 40 bits. When system memory
is bigger than 1TB, the bus address needs more than 40 bits, so the
interrupt cannot be handled and cleared correctly.
Change-Id: I3cd1b8ad046b38945372f2fd1a2d225624893e28
Signed-off-by: Philip Yang
---
drivers/gpu/drm/
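
To make the 40-bit limit in the patch description above concrete: 1TB is 2^40 bytes, so any address at or above 1TB needs at least 41 bits, and masking it to 40 bits silently yields a different address. The following userspace C sketch shows only that arithmetic. ADDR_MASK_40BIT and truncate_to_40bit() are hypothetical names for illustration and are not taken from the driver.

```c
#include <stdint.h>

/* Hypothetical mask that keeps only the low 40 bits of an address. */
#define ADDR_MASK_40BIT ((UINT64_C(1) << 40) - 1)

/* Hypothetical model of an address field that is limited to 40 bits. */
static uint64_t truncate_to_40bit(uint64_t bus_addr)
{
	return bus_addr & ADDR_MASK_40BIT;
}

/* Returns 1 when the address survives the 40-bit limit unchanged. */
static int fits_in_40bit(uint64_t bus_addr)
{
	return truncate_to_40bit(bus_addr) == bus_addr;
}
```

For example, an address of 1TB + 4KB (0x10000001000) is truncated to 0x1000, which points at unrelated memory below 1TB.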
The paging queue doorbell index uses the existing sDMA_HI_PRI_ENGINE0/1
index assignment. Increase the SDMA_DOORBELL_RANGE size from 2 dwords to
4 dwords to enable the new doorbell index.
Change-Id: I9adb965f16ee4089d261d9a22231337739184e49
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdgpu/nbio_
On 2018-11-15 11:43 a.m., Alex Deucher wrote:
> On Thu, Nov 15, 2018 at 11:08 AM Yang, Philip wrote:
>> paging queues doorbell index use existing assignment sDMA_HI_PRI_ENGINE0/1
>> index, and increase SDMA_DOORBELL_RANGE size from 2 dwords to 4 dwords to
>> enable t
>Felix
>
> -----Original Message-----
> From: amd-gfx On Behalf Of Yang,
> Philip
> Sent: Thursday, November 15, 2018 12:54 PM
> To: Alex Deucher
> Cc: amd-gfx list
> Subject: Re: [PATCH] drm/amdgpu: enable paging queue doorbell support
>
> On 2018-11-15 11:43
The paging queue doorbell index uses the existing sDMA_HI_PRI_ENGINE0/1
index assignment. Increase the SDMA_DOORBELL_RANGE size from 2 dwords to
4 dwords to enable the new doorbell index.
v2: disable paging queue doorbell on Vega10 and Vega12 with SRIOV
Change-Id: I9adb965f16ee4089d261d9a22231337739184e4
Because increasing SDMA_DOORBELL_RANGE to add a new SDMA doorbell for the
paging queue will break SRIOV, we can instead reserve and map two doorbell
pages for amdgpu. The paging queue doorbell index uses the same index as
the SDMA gfx queue index, but on the second page.
For Vega20, after we change doorbell layout to
On 2018-11-16 4:35 p.m., Alex Deucher wrote:
> On Fri, Nov 16, 2018 at 2:08 PM Yang, Philip wrote:
>> Because increase SDMA_DOORBELL_RANGE to add new SDMA doorbell for paging
>> queue will
>> break SRIOV, instead we can reserve and map two doorbell pages for amdgpu,
>>
Enable has_page_queue support based on the SDMA fw version.
sdma_v4_0_init_microcode has to move from sw_init to early_init so that
the firmware is loaded and fw_version is initialized before
set_ring/buffer/vm_pte_funcs use it.
Change-Id: Ife5d4659d28bc2a7012b48947b27e929749d87c1
Signed-off-by: Philip Yang
---
drivers/g
Because increasing SDMA_DOORBELL_RANGE to add a new SDMA doorbell for the
paging queue will break SRIOV, we can instead reserve and map two doorbell
pages for amdgpu. The paging queue doorbell index uses the same index as
the SDMA gfx queue index, but on the second page.
For Vega20, after we change doorbell layout to
This looks like a copy/paste typo.
Change-Id: Iee3fd3a551650ec9199bc030a7886e92000b02e7
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
b/drivers/gpu/drm/amd/amdgp
On 2018-11-19 3:57 p.m., Deucher, Alexander wrote:
>> -----Original Message-----
>> From: amd-gfx On Behalf Of
>> Yang, Philip
>> Sent: Monday, November 19, 2018 3:20 PM
>> To: amd-gfx@lists.freedesktop.org
>> Cc: Yang, Philip
>> Subject: [PATCH 3/3] d
Reviewed-by: Philip Yang
On 2018-11-21 9:54 a.m., Alex Deucher wrote:
> Keep it disabled until we confirm it's ready.
>
> Signed-off-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/a
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.
It supports both KFD userptr and gfx userptr paths.
The dependent HMM patchsets from Jérôme Glisse are all merged into 4.20.0
kern
Use the HMM helper function hmm_vma_fault() to get the physical pages
backing a userptr and start tracking CPU page table updates for those
pages. Then use hmm_vma_range_done() to check whether those pages were
updated before amdgpu_cs_submit for gfx or before user queues are resumed
for kfd.
If userptr pages are upda
On 2018-12-04 4:06 a.m., Christian König wrote:
> Am 03.12.18 um 21:19 schrieb Yang, Philip:
>> Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
>> callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
>> DRM_AMDGPU_USERPTR Kconfig.
>>
On 2018-12-03 8:52 p.m., Kuehling, Felix wrote:
> See comments inline. I didn't review the amdgpu_cs and amdgpu_gem parts
> as I don't know them very well.
>
> On 2018-12-03 3:19 p.m., Yang, Philip wrote:
>> Use HMM helper function hmm_vma_fault() to get physical pages
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.
It supports both KFD userptr and gfx userptr paths.
The dependent HMM patchsets from Jérôme Glisse are all merged into 4.20.0
kerne
There is a circular lock dependency between the gfx and kfd paths with the
HMM change:
lock(dqm) -> bo::reserve -> amdgpu_mn_lock
To avoid this, move init/uninit_mqd() out of lock(dqm), to remove nested
locking between mmap_sem and bo::reserve. The locking order
is: bo::reserve -> amdgpu_mn_lock(p->mn)
Change-Id: I2ec0
Use the HMM helper function hmm_vma_fault() to get the physical pages
backing a userptr and start tracking CPU page table updates for those
pages. Then use hmm_vma_range_done() to check whether those pages were
updated before amdgpu_cs_submit for gfx or before user queues are resumed
for kfd.
If userptr pages are upda
I got this compilation error after rebasing this morning. Am I missing
anything?
/home/yangp/git/compute_staging/kernel/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c:
In function ‘gfx_v8_0_rlc_resume’:
/home/yangp/git/compute_staging/kernel/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c:4071:6:
error: impl
Never mind, just saw the patch to fix the typo.
On 2018-12-10 1:07 p.m., Yang, Philip wrote:
I got this compilation error after rebasing this morning. Am I missing
anything?
/home/yangp/git/compute_staging/kernel/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c:
In function ‘gfx_v8_0_rlc_resume
On 2018-12-10 7:12 p.m., Kuehling, Felix wrote:
> This is a nice improvement from the last version. I still see some
> potential problems. See inline ...
>
> I'm skipping over the CS and GEM parts. I hope Christian can review
> those parts.
>
> On 2018-12-06 4:02 p.m.,
Use the HMM helper function hmm_vma_fault() to get the physical pages
backing a userptr and start tracking CPU page table updates for those
pages. Then use hmm_vma_range_done() to check whether those pages were
updated before amdgpu_cs_submit for gfx or before user queues are resumed
for kfd.
If userptr pages are upda
c/h to amdgpu_hmm.c/h.
>
> -David
>
>> -----Original Message-----
>> From: amd-gfx On Behalf Of Yang,
>> Philip
>> Sent: Friday, December 07, 2018 5:03 AM
>> To: amd-gfx@lists.freedesktop.org
>> Cc: Yang, Philip
>> Subject: [PATCH 1/3] drm/amdgpu:
re robust.
>
> Christian, would you review the CS and GEM parts? And maybe take a look
> you see nothing wrong with the amdgpu_ttm changes either.
>
> On 2018-12-13 4:01 p.m., Yang, Philip wrote:
>> Use HMM helper function hmm_vma_fault() to get physical pages backing
>> use
Use the HMM helper function hmm_vma_fault() to get the physical pages
backing a userptr and start tracking CPU page table updates for those
pages. Then use hmm_vma_range_done() to check whether those pages were
updated before amdgpu_cs_submit for gfx or before user queues are resumed
for kfd.
If userptr pages are upda
On 2018-12-17 9:12 p.m., Zeng, Oak wrote:
> Different ASIC has different sdma doorbell range. Add
> a per device sdma_doorbell_range field and initialize
> it.
>
> Change-Id: Idd980db1a72cfb373e24ac23ba3e48bb329ed4ad
> Signed-off-by: Oak Zeng
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell
irendra
> Sent: Tuesday, December 18, 2018 11:34 AM
> To: Zeng, Oak ; Yang, Philip ;
> amd-gfx@lists.freedesktop.org; Deng, Emily
> Subject: RE: [PATCH 1/2] drm/amdgpu: Add per device sdma_doorbell_range field
>
> Hi Oak,
>
> Windows will set 4 dwords for both sdma0 and
The series is Reviewed-by: Philip Yang
On 2018-12-17 9:12 p.m., Zeng, Oak wrote:
> Different ASIC has different SDMA queues so different
> SDMA doorbell range. Introduce an extra parameter
> to sdma_doorbell_range function and set sdma doorbell
> range correctly.
>
> Change-Id: I9b8d75b04f5a47ef
This series is verified on 4 Vega20 by: Philip Yang
On 2018-12-20 10:39 a.m., Kasiviswanathan, Harish wrote:
> This patch set Reviewed-by: Harish Kasiviswanathan
>
>
> On 2018-12-19 6:09 p.m., Alex Deucher wrote:
>> Configure PCIE_CI_CNTL to work around a hw bug that affects
>> some multi-GPU
ies is Reviewed-by: Felix
> Kuehling
>
> Regards,
> Felix
>
> On 2018-12-14 4:10 p.m., Yang, Philip wrote:
>> Use HMM helper function hmm_vma_fault() to get physical pages backing
>> userptr and start CPU page table update track of those pages. Then use
>>
On 2019-01-07 9:21 a.m., Christian König wrote:
> Am 14.12.18 um 22:10 schrieb Yang, Philip:
>> Use HMM helper function hmm_vma_fault() to get physical pages backing
>> userptr and start CPU page table update track of those pages. Then use
>> hmm_vma_range_done() to che
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.
It supports both KFD userptr and gfx userptr paths.
The dependent HMM patchsets from Jérôme Glisse are all merged into 4.20.0
kerne
There is a circular lock dependency between the gfx and kfd paths with the
HMM change:
lock(dqm) -> bo::reserve -> amdgpu_mn_lock
To avoid this, move init/uninit_mqd() out of lock(dqm), to remove nested
locking between mmap_sem and bo::reserve. The locking order
is: bo::reserve -> amdgpu_mn_lock(p->mn)
Change-Id: I2ec0
Use the HMM helper function hmm_vma_fault() to get the physical pages
backing a userptr and start tracking CPU page table updates for those
pages. Then use hmm_vma_range_done() to check whether those pages were
updated before amdgpu_cs_submit for gfx or before user queues are resumed
for kfd.
If userptr pages are upda
Ping Christian, any comments on the GEM and CS part changes?
Thanks. Philip
On 2019-01-10 12:02 p.m., Yang, Philip wrote:
> Use HMM helper function hmm_vma_fault() to get physical pages backing
> userptr and start CPU page table update track of those pages. Then use
> hmm_vma_range_
I found the same issue while debugging. I will submit a patch to fix this shortly.
Philip
On 2019-01-30 10:35 p.m., Mikhail Gavrilov wrote:
> Hi folks.
> Yet another kernel panic happens while GPU again is hang:
>
> [ 1469.906798]
> [ 1469.906799] WARNING: inconsistent
amdgpu_vm_get_task_info is called from the interrupt handler and the sched
timeout workqueue, so the irq version of spin_lock is needed to avoid deadlock.
Change-Id: Ifedd4b97535bf0b5d3936edd2d9688957020efd4
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 5 +++--
1 file changed, 3 insertions(+), 2 delet
Hi Christian,
This patch is rebased to the latest HMM. Please review the GEM and CS part changes
in patch 3/3.
Regards,
Philip Yang (3):
drm/amdgpu: use HMM mirror callback to replace mmu notifier v6
drm/amdkfd: avoid HMM change cause circular lock dependency v2
drm/amdgpu: replace get_user_p
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.
It supports both KFD userptr and gfx userptr paths.
The dependent HMM patchsets from Jérôme Glisse are all merged into 4.20.0
kerne
There is a circular lock dependency between the gfx and kfd paths with the
HMM change:
lock(dqm) -> bo::reserve -> amdgpu_mn_lock
To avoid this, move init/uninit_mqd() out of lock(dqm), to remove nested
locking between mmap_sem and bo::reserve. The locking order
is: bo::reserve -> amdgpu_mn_lock(p->mn)
Change-Id: I2ec0
Use the HMM helper function hmm_vma_fault() to get the physical pages
backing a userptr and start tracking CPU page table updates for those
pages. Then use hmm_vma_range_done() to check whether those pages were
updated before amdgpu_cs_submit for gfx or before user queues are resumed
for kfd.
If userptr pages are upda
On 2019-02-04 10:18 a.m., Christian König wrote:
> Am 04.02.19 um 16:06 schrieb Yang, Philip:
>> Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
>> callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
>> DRM_AMDGPU_USERPTR Kconfig.
>
Use the HMM helper function hmm_vma_fault() to get the physical pages
backing a userptr and start tracking CPU page table updates for those
pages. Then use hmm_vma_range_done() to check whether those pages were
updated before amdgpu_cs_submit for gfx or before user queues are resumed
for kfd.
If userptr pages are upda
There is a circular lock dependency between the gfx and kfd paths with the
HMM change:
lock(dqm) -> bo::reserve -> amdgpu_mn_lock
To avoid this, move init/uninit_mqd() out of lock(dqm), to remove nested
locking between mmap_sem and bo::reserve. The locking order
is: bo::reserve -> amdgpu_mn_lock(p->mn)
Change-Id: I2ec0
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.
It supports both KFD userptr and gfx userptr paths.
Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e
Signed-off-by: Philip Y
Hi Christian,
This patch is rebased to the latest HMM. Please review the GEM and CS part changes
in patch 3/3.
Thanks,
Philip Yang (3):
drm/amdgpu: use HMM mirror callback to replace mmu notifier v7
drm/amdkfd: avoid HMM change cause circular lock dependency v2
drm/amdgpu: replace get_user_pa
Hi Christian,
My comments are embedded below. I will submit another patch to address
those.
Thanks,
Philip
On 2019-02-05 6:52 a.m., Christian König wrote:
> Am 04.02.19 um 19:23 schrieb Yang, Philip:
>> Use HMM helper function hmm_vma_fault() to get physical pages backing
>> us
Hi Christian,
I will submit a new patch for review; my comments are embedded inline below.
Thanks,
Philip
On 2019-02-05 1:09 p.m., Koenig, Christian wrote:
> Am 05.02.19 um 18:25 schrieb Yang, Philip:
>> [SNIP]+
>>>> + if (r == -ERESTARTSYS) {
>>
Use the HMM helper function hmm_vma_fault() to get the physical pages
backing a userptr and start tracking CPU page table updates for those
pages. Then use hmm_vma_range_done() to check whether those pages were
updated before amdgpu_cs_submit for gfx or before user queues are resumed
for kfd.
If userptr pages are upda
ries might not be sufficient any more.
>
Yes, it looks better to handle the retry from user space. The extra syscall
overhead can be ignored because this does not happen all the time. I
will submit a new patch for review.
Thanks,
Philip
On 2019-02-06 4:20 a.m., Christian König wrote:
> Am 0
Use the HMM helper function hmm_vma_fault() to get the physical pages
backing a userptr and start tracking CPU page table updates for those
pages. Then use hmm_vma_range_done() to check whether those pages were
updated before amdgpu_cs_submit for gfx or before user queues are resumed
for kfd.
If userptr pages are upda
Hi Christian,
Resending patches 1/3 and 2/3 with Reviewed-by added in the comments.
Patch 3/3 changed: amdgpu_cs_submit and amdgpu_cs_ioctl return -EAGAIN
to user space so that it retries cs_ioctl.
Regards,
Philip
Philip Yang (3):
drm/amdgpu: use HMM mirror callback to replace mmu notifier v7
drm/amdkfd: avoid HMM ch
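
The user-space side of the -EAGAIN retry mentioned in this cover letter might look roughly like the sketch below. It is an assumption-laden illustration, not the actual user-space code: submit_fn, submit_with_retry() and demo_submit() are hypothetical names, and the retry bound is an arbitrary choice of this sketch.

```c
#include <errno.h>

/* Hypothetical submit callback: returns 0 on success or a negative errno. */
typedef int (*submit_fn)(void *ctx);

/* Call submit() again while it reports -EAGAIN, at most max_tries times. */
static int submit_with_retry(submit_fn submit, void *ctx, int max_tries)
{
	int ret = -EINVAL;	/* returned unchanged if max_tries <= 0 */
	int i;

	for (i = 0; i < max_tries; i++) {
		ret = submit(ctx);
		if (ret != -EAGAIN)
			break;
	}
	return ret;
}

/* Demo stand-in for a submission: fails with -EAGAIN while *ctx > 0. */
static int demo_submit(void *ctx)
{
	int *remaining_failures = ctx;

	if (*remaining_failures > 0) {
		(*remaining_failures)--;
		return -EAGAIN;
	}
	return 0;
}
```

Bounding the loop keeps a caller from spinning forever if the invalidation keeps recurring; any other error is passed straight back.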
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.
It supports both KFD userptr and gfx userptr paths.
Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e
Signed-off-by: Philip Y
There is a circular lock dependency between the gfx and kfd paths with the
HMM change:
lock(dqm) -> bo::reserve -> amdgpu_mn_lock
To avoid this, move init/uninit_mqd() out of lock(dqm), to remove nested
locking between mmap_sem and bo::reserve. The locking order
is: bo::reserve -> amdgpu_mn_lock(p->mn)
Change-Id: I2ec0
Those options are needed to support HMM
Change-Id: Ieb7bb3bcec07245d79a02793e6728228decc400a
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdgpu/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig
b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 960a
Thanks Jerome for the correct HMM config option. Selecting only
HMM_MIRROR is not good enough because the CONFIG_HMM option may be missing;
adding depends on ARCH_HAS_HMM will solve the issue.
I will submit new patch to fix the compilation error if HMM_MIRROR
config is missing and the HMM config depen
Selecting only HMM_MIRROR causes kernel config dependency warnings
if CONFIG_HMM is missing from the config. Adding depends on HMM
solves the issue.
Add conditional compilation to fix the compilation errors when HMM_MIRROR
is not enabled because the HMM config is not enabled.
Change-Id: I1b44a0b5285bbef5e98bfb045
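
The conditional-compilation approach described above is commonly done with a stub that is built when the option is off. The sketch below shows that general pattern in plain C, under assumptions: CONFIG_HMM_MIRROR_DEMO, struct demo_dev and demo_mirror_register() are hypothetical stand-ins, and the -ENODEV return value is a choice made for this sketch rather than something stated in the patch.

```c
#include <errno.h>

/* Hypothetical device state, for illustration only. */
struct demo_dev {
	int registered;
};

#if defined(CONFIG_HMM_MIRROR_DEMO)
/* Real implementation, built only when the (hypothetical) option is set. */
static int demo_mirror_register(struct demo_dev *dev)
{
	dev->registered = 1;
	return 0;
}
#else
/* Stub keeps callers compiling when the option is off and reports failure. */
static inline int demo_mirror_register(struct demo_dev *dev)
{
	(void)dev;
	return -ENODEV;
}
#endif
```

With the stub in place, callers need no #ifdef of their own; they only have to handle the error return.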
Hi Michel,
Yes, I found the same issue and the bug has been fixed by Jerome:
876b462120aa mm/hmm: use reference counting for HMM struct
The fix is on the hmm-for-5.1 branch. I cherry-picked it into my local branch
to work around the issue.
Regards,
Philip
On 2019-02-27 12:02 p.m., Michel Dänzer wrot
amd-staging-drm-next will rebase to kernel 5.1 and pick up this fix
automatically. As a short-term workaround, please cherry-pick this fix
into your local repository.
Regards,
Philip
On 2019-02-27 12:33 p.m., Michel Dänzer wrote:
> On 2019-02-27 6:14 p.m., Yang, Philip wrote:
>>
.
>
> Alex
>
> *From:* amd-gfx on behalf of
> Yang, Philip
> *Sent:* Wednesday, February 27, 2019 1:05 PM
> *To:* Michel Dänzer; Jérôme Glisse
> *Cc:* linux...@kvack.org; amd-gfx@lists.freedesktop.org
> *Subject:* Re: KASAN caught amdgpu / HMM use-after-f
:
>
> [ Dropping Jérôme and the linux-mm list ]
>
> On 2019-02-27 7:48 p.m., Yang, Philip wrote:
>> Hi Alex,
>>
>> Pushed, thanks.
>>
>> mm/hmm: use reference counting for HMM struct
>
> Thanks, but I'm not seeing it yet. Maybe it needs some sp
Those corner cases are found by kfdtest.KFDIPCTest.
A userptr may cross two VMAs if the forked child process (which does not
call exec after fork) mallocs a buffer, frees it, and then mallocs a larger
buffer: the kernel creates a new VMA adjacent to the old VMA, which was cloned
from parent process, some pages of use
han one VMA, fail
> 2. Loop over all the VMAs in the address range
>
> Thanks,
>Felix
>
> -----Original Message-----
> From: amd-gfx On Behalf Of Yang,
> Philip
> Sent: Friday, March 01, 2019 12:30 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Yang, Ph
With an old kernel config file, CONFIG_ZONE_DEVICE is not selected, so
CONFIG_HMM and CONFIG_HMM_MIRROR are not enabled, and the current driver
error message "Failed to register MMU notifier" is not clear. Inform the
user with a more descriptive message on how to fix the missing kernel
config option.
Bugzil
A userptr may cross two VMAs if the forked child process (which does not
call exec after fork) mallocs a buffer, frees it, and then mallocs a larger
buffer: the kernel creates a new VMA adjacent to the old VMA, which was cloned
from parent process, some pages of userptr are in the first VMA, the
rest pages are in the
Userptr restore may race with a concurrent userptr invalidation after
hmm_vma_fault adds the range to the hmm->ranges list. In that case, call
hmm_vma_range_done to remove the range from the hmm->ranges list first,
then reschedule the restore worker. Otherwise hmm_vma_fault will add the
same range to the list, this will
Those corner cases are found by kfdtest.KFDIPCTest.
Philip Yang (3):
drm/amdkfd: support concurrent userptr update for HMM
drm/amdgpu: support userptr cross VMAs case with HMM
drm/amdgpu: more descriptive message if HMM not enabled
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 28 +++-
On 2019-03-06 4:05 a.m., Michel Dänzer wrote:
> On 2019-03-05 7:09 p.m., Yang, Philip wrote:
>> If using old kernel config file, CONFIG_ZONE_DEVICE is not selected,
>> so CONFIG_HMM and CONFIG_HMM_MIRROR is not enabled, the current driver
>> error message "Failed to reg
With an old kernel config file, CONFIG_ZONE_DEVICE is not selected, so
CONFIG_HMM and CONFIG_HMM_MIRROR are not enabled, and the current driver
error message "Failed to register MMU notifier" is not clear. Inform the
user with a more descriptive message on how to fix the missing kernel
config option.
Bugzil
On 2019-03-06 10:04 a.m., Christian König wrote:
> Am 06.03.19 um 16:02 schrieb Yang, Philip:
>> If using old kernel config file, CONFIG_ZONE_DEVICE is not selected,
>> so CONFIG_HMM and CONFIG_HMM_MIRROR is not enabled, the current driver
>> error message "Failed to
With an old kernel config file, CONFIG_ZONE_DEVICE is not selected, so
CONFIG_HMM and CONFIG_HMM_MIRROR are not enabled, and the current driver
error message "Failed to register MMU notifier" is not clear. Inform the
user with a more descriptive message on how to fix the missing kernel
config option.
Bugzil
at that point needs to be untracked.
>
> For now as a quick fix for an urgent bug, this change is Reviewed-by:
> Felix Kuehling . But please revisit this and
> check if there are similar corner cases as I explained above.
>
> Regards,
> Felix
>
> On 3/5/2019 1:09 PM, Yang,
I will submit v2 to fix those issues. Some comments inline...
On 2019-03-06 3:11 p.m., Kuehling, Felix wrote:
> Some comments inline ...
>
> On 3/5/2019 1:09 PM, Yang, Philip wrote:
>> userptr may cross two VMAs if the forked child process (not call exec
>> after fork) mal
A userptr may cross two VMAs if the forked child process (which does not
call exec after fork) mallocs a buffer, frees it, and then mallocs a larger
buffer: the kernel creates a new VMA adjacent to the old VMA, which was cloned
from parent process, some pages of userptr are in the first VMA, the
rest pages are in the
Userptr restore may race with a concurrent userptr invalidation after
hmm_vma_fault adds the range to the hmm->ranges list. In that case, call
hmm_vma_range_done to remove the range from the hmm->ranges list first,
then reschedule the restore worker. Otherwise hmm_vma_fault will add the
same range to the list, this will