Sorry, resending the patch; the one in the previous email was missing a couple
of lines due to copy/paste.
On 2019-11-01 3:45 p.m., Yang, Philip wrote:
>
>
> On 2019-11-01 1:42 p.m., Jason Gunthorpe wrote:
>> On Fri, Nov 01, 2019 at 03:59:26PM +0000, Yang, Philip wrote:
>>>> This test
On 2019-11-01 1:42 p.m., Jason Gunthorpe wrote:
> On Fri, Nov 01, 2019 at 03:59:26PM +0000, Yang, Philip wrote:
>>> This test for range_blockable should be before mutex_lock, I can move
>>> it up
>>>
>> yes, thanks.
>
> Okay, I wrote it like this:
>
On 2019-11-01 11:12 a.m., Jason Gunthorpe wrote:
> On Fri, Nov 01, 2019 at 02:44:51PM +0000, Yang, Philip wrote:
>>
>>
>> On 2019-10-29 3:25 p.m., Jason Gunthorpe wrote:
>>> On Tue, Oct 29, 2019 at 07:22:37PM +0000, Yang, Philip wrote:
>>>> Hi Jason,
>
On 2019-10-29 3:25 p.m., Jason Gunthorpe wrote:
> On Tue, Oct 29, 2019 at 07:22:37PM +0000, Yang, Philip wrote:
>> Hi Jason,
>>
>> I did quick test after merging amd-staging-drm-next with the
>> mmu_notifier branch, which includes this set changes. The test result
>
Hi Jason,
I did a quick test after merging amd-staging-drm-next with the
mmu_notifier branch, which includes this set of changes. The test shows
several different failures: the app gets stuck intermittently, the GUI has
no display, etc. I am working through the changes and will try to figure out the cause.
Regards,
Phili
On 2019-10-22 3:36 p.m., Grodzovsky, Andrey wrote:
>
> On 10/22/19 3:19 PM, Yang, Philip wrote:
>>
>> On 2019-10-22 2:40 p.m., Grodzovsky, Andrey wrote:
>>> On 10/22/19 2:38 PM, Grodzovsky, Andrey wrote:
>>>> On 10/22/19 2:28 PM, Yang, Philip wrote:
>
On 2019-10-22 2:44 p.m., Kuehling, Felix wrote:
> On 2019-10-22 14:28, Yang, Philip wrote:
>> If device reset/suspend/resume failed for some reason, dqm lock is
>> hold forever and this causes deadlock. Below is a kernel backtrace when
>> application open kfd after
On 2019-10-22 2:40 p.m., Grodzovsky, Andrey wrote:
>
> On 10/22/19 2:38 PM, Grodzovsky, Andrey wrote:
>> On 10/22/19 2:28 PM, Yang, Philip wrote:
>>> If device reset/suspend/resume failed for some reason, dqm lock is
>>> hold forever and this causes deadlock. Be
If device reset/suspend/resume fails for some reason, the dqm lock is
held forever, which causes a deadlock. Below is a kernel backtrace from when
an application opens kfd after suspend/resume failed.
Instead of holding dqm lock in pre_reset and releasing dqm lock in
post_reset, add dqm->device_stopped flag w
On 2019-10-21 9:03 p.m., Kuehling, Felix wrote:
>
> On 2019-10-21 5:04 p.m., Yang, Philip wrote:
>> If device reset/suspend/resume failed for some reason, dqm lock is
>> hold forever and this causes deadlock. Below is a kernel backtrace when
>> application open kfd aft
If device reset/suspend/resume fails for some reason, the dqm lock is
held forever, which causes a deadlock. Below is a kernel backtrace from when
an application opens kfd after suspend/resume failed.
Instead of holding dqm lock in pre_reset and releasing dqm lock in
post_reset, add dqm->device_stopped flag w
If the device is locked for suspend and resume, kfd open should fail with
-EAGAIN without creating a process. Otherwise, when the application exits,
releasing the process will hang waiting for resume to finish if suspend
and resume are stuck somewhere. This is the backtrace:
v2: fix processes that were create
On 2019-10-18 11:40 a.m., Kuehling, Felix wrote:
> On 2019-10-18 10:27 a.m., Yang, Philip wrote:
>> If device is locked for suspend and resume, kfd open should return
>> failed -EAGAIN without creating process, otherwise the application exit
>> to release the process will ha
If the device is locked for suspend and resume, kfd open should fail with
-EAGAIN without creating a process. Otherwise, when the application exits,
releasing the process will hang waiting for resume to finish if suspend
and resume are stuck somewhere. This is the backtrace:
[Thu Oct 17 16:43:37 2019] INFO: t
Reviewed-by: Philip Yang
On 2019-10-17 1:56 p.m., Kim, Jonathan wrote:
> fixing compiler warnings in df v3.6 for c-state toggle and pmc count.
>
> Change-Id: I74f8f1eafccf523a89d60d005e3549235f75c6b8
> Signed-off-by: Jonathan Kim
> ---
> drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 4 ++--
> 1 fil
I got compiler warnings after updating this morning, because the variables
are not initialized on the failure-return path of df_v3_6_set_df_cstate().
CC [M] drivers/gpu/drm/amd/amdgpu/gmc_v9_0.o
CC [M] drivers/gpu/drm/amd/amdgpu/gfxhub_v1_1.o
/home/yangp/git/compute_staging/kernel/drivers/gpu/drm/am
On 2019-10-17 4:54 a.m., Christian König wrote:
> Am 16.10.19 um 18:04 schrieb Jason Gunthorpe:
>> On Wed, Oct 16, 2019 at 10:58:02AM +0200, Christian König wrote:
>>> Am 15.10.19 um 20:12 schrieb Jason Gunthorpe:
From: Jason Gunthorpe
8 of the mmu_notifier using drivers (i915_gem
On 2019-10-11 1:33 p.m., Kuehling, Felix wrote:
> On 2019-10-11 10:36 a.m., Yang, Philip wrote:
>> user_pages array should always be freed after validation regardless if
>> user pages are changed after bo is created because with HMM change parse
>> bo always allocate user pa
The user_pages array should always be freed after validation, regardless of
whether the user pages changed after the bo was created, because with the HMM
change, parsing a bo always allocates a user pages array to get user pages for a userptr bo.
v2: remove unused local variable and amend commit
v3: add back get user pages in ge
On 2019-10-11 4:40 a.m., Christian König wrote:
> Am 03.10.19 um 21:44 schrieb Yang, Philip:
>> user_pages array should always be freed after validation regardless if
>> user pages are changed after bo is created because with HMM change parse
>> bo always allocate user pa
invalidated when amdgpu_cs_submit. I didn't find any issue in an
overnight test, but I'm not sure if there is a potential side effect.
Thanks,
Philip
On 2019-10-03 3:44 p.m., Yang, Philip wrote:
> user_pages array should always be freed after validation regardless if
> user pages are changed after bo
The user_pages array should always be freed after validation, regardless of
whether the user pages changed after the bo was created, because with the HMM
change, parsing a bo always allocates a user pages array to get user pages for a userptr bo.
There is no need to get user pages while creating a userptr bo because user pages
will only b
The user_pages array should be freed regardless of whether the user pages are
invalidated after the bo is created, because the HMM change always allocates a
user pages array to get user pages while parsing a user page bo.
There is no need to get user pages while creating a bo because user pages
will only be used after parsing us
On 2019-09-09 8:03 a.m., Christian König wrote:
> Am 04.09.19 um 22:12 schrieb Yang, Philip:
>> This series looks nice and clear for me, two questions embedded below.
>>
>> Are we going to use dedicated sdma page queue for direct VM update path
>> during a fault?
>>
To avoid NULL function pointer access. This happens on VG10: the reboot
command hangs and the machine has to be powered off/on to reboot. This is
the serial console log:
[ OK ] Reached target Unmount All Filesystems.
[ OK ] Reached target Final Step.
Starting Reboot...
[ 305.696271] systemd-shut
The VMID0 init path was missed when enabling the amdgpu_noretry option. Good
catch and fix.
Reviewed-by: Philip Yang
On 2019-09-04 7:31 p.m., Kuehling, Felix wrote:
> There is no point retrying page faults in VMID0. Those faults are
> always fatal.
>
> Signed-off-by: Felix Kuehling
> ---
> drivers/
This series looks nice and clear to me; two questions are embedded below.
Are we going to use a dedicated sdma page queue for the direct VM update path
during a fault?
Thanks,
Philip
On 2019-09-04 11:02 a.m., Christian König wrote:
> Next step towards HMM support. For now just silence the retry fault an
On 2019-08-15 8:54 p.m., Jason Gunthorpe wrote:
> On Thu, Aug 15, 2019 at 08:52:56PM +0000, Yang, Philip wrote:
>> hmm_range_fault may return NULL pages because some of pfns are equal to
>> HMM_PFN_NONE. This happens randomly under memory pressure. The reason is
>> for swap
hmm_range_fault may return NULL pages because some of the pfns are equal to
HMM_PFN_NONE. This happens randomly under memory pressure. The reason is that
on the swapped-out page pte path, hmm_vma_handle_pte doesn't update the fault
variable from cpu_flags, so it fails to call hmm_vma_do_fault to swap
the page in.
On 2019-07-04 12:02 p.m., Kuehling, Felix wrote:
> On 2019-07-03 6:19 p.m., Yang, Philip wrote:
>> amdgpu_noretry default value is 0, this will generate VM fault storm
>> because the vm fault is not recovered. It may slow down the machine and
>> need reboot after applic
The amdgpu_noretry default value is 0, which will generate a VM fault storm
because the VM fault is not recovered. It may slow down the machine and
require a reboot after an application VM fault. Maybe change the default value to 1?
Other than that, this is reviewed by Philip Yang
On 2019-07-02 3:05 p.m., Kuehli
Under memory pressure, hmm_range_fault may return error code -ENOMEM
or -EBUSY. Change pr_info to pr_debug to remove the unnecessary kernel log
message, because we will retry the restore.
Calling get_user_pages_done after TTM get user pages has failed produces a
WARN_ONCE kernel call stack dump in the log.
Change-
I just figured out that the previous patch has an issue. The new patch is simple and
looks good to me.
This series is Reviewed-by: Philip.Yang
On 2019-06-14 9:27 p.m., Zeng, Oak wrote:
> This reverts commit 0a7c7281bdaae8cf63d77be26a4b46128114bdec.
> This fix is not proper. allocate_mqd can't be moved before
Hi Emily,
I am not familiar with the vbios and driver init part, so this is just based on my
experience: the patch doesn't modify amdgpu_get_bios but moves
amdgpu_get_bios from amdgpu_device_init to amdgpu_device_ip_early_init,
so amdgpu_get_bios is executed earlier. The kernel error message "BUG:
kernel NULL p
On 2019-06-13 4:54 a.m., Koenig, Christian wrote:
> Am 12.06.19 um 23:13 schrieb Yang, Philip:
>> On 2019-06-12 3:28 p.m., Christian König wrote:
>>> Am 12.06.19 um 17:13 schrieb Yang, Philip:
>>>> TTM create two zones, kernel zone and dma32 zone for system memory.
Rebased to the https://github.com/jgunthorpe/linux.git hmm branch; some
changes were needed because the hmm_range_register interface changed. Then I ran a quick
amdgpu_test. The test finished and the result is OK, but there is the kernel
BUG message below; it seems hmm_free_rcu calls down_write.
[ 1171.919921] BUG: sleep
On 2019-06-12 3:28 p.m., Christian König wrote:
> Am 12.06.19 um 17:13 schrieb Yang, Philip:
>> TTM create two zones, kernel zone and dma32 zone for system memory. If
>> system memory address allocated is below 4GB, this account to dma32 zone
>> and will exhaust dma32 zone a
e_flags. Is that chain broken
> somewhere? Overriding glob->mem_glob->num_zones from amdgpu seems to be
> a bit of a hack.
>
> Regards,
> Felix
>
> On 2019-06-12 8:13, Yang, Philip wrote:
>> TTM create two zones, kernel zone and dma32 zone for system memory. If
>
TTM creates two zones, a kernel zone and a dma32 zone, for system memory. If
the allocated system memory address is below 4GB, it is accounted to the dma32 zone,
which will exhaust the dma32 zone and trigger unnecessary TTM eviction.
Patch "drm/ttm: Account for kernel allocations in kernel zone only" only
handle the alloc
HMM provides new APIs and helpers in kernel 5.2-rc1 to simplify the driver
path. The old HMM APIs are deprecated and will be removed in the future.
Below are changes in driver:
1. Change hmm_vma_fault to hmm_range_register and hmm_range_fault which
supports range with multiple vmas, remove the multiple vma
On 2019-06-03 5:02 p.m., Kuehling, Felix wrote:
> On 2019-06-03 2:44 p.m., Yang, Philip wrote:
>> HMM provides new APIs and helps in kernel 5.2-rc1 to simplify driver
>> path. The old hmm APIs are deprecated and will be removed in future.
>>
>> Below are changes
HMM provides new APIs and helpers in kernel 5.2-rc1 to simplify the driver
path. The old HMM APIs are deprecated and will be removed in the future.
Below are changes in driver:
1. Change hmm_vma_fault to hmm_range_register and hmm_range_fault which
supports range with multiple vmas, remove the multiple vma
On 2019-06-03 7:23 a.m., Christian König wrote:
> Am 03.06.19 um 12:17 schrieb Christian König:
>> Am 01.06.19 um 00:01 schrieb Kuehling, Felix:
>>> On 2019-05-31 5:32 p.m., Yang, Philip wrote:
>>>> On 2019-05-31 3:42 p.m., Kuehling, Felix wrote:
>>>>>
On 2019-05-31 3:42 p.m., Kuehling, Felix wrote:
> On 2019-05-31 1:28 p.m., Yang, Philip wrote:
>>
>> On 2019-05-30 6:36 p.m., Kuehling, Felix wrote:
>>>>
>>>> #if IS_ENABLED(CONFIG_DRM_AMDGPU_USERPTR)
>>>> - if (gtt->ranges &&
HMM provides new APIs and helpers in kernel 5.2-rc1 to simplify the driver
path. The old HMM APIs are deprecated and will be removed in the future.
Below are changes in driver:
1. Change hmm_vma_fault to hmm_range_register and hmm_range_fault which
supports range with multiple vmas, remove the multiple vma
On 2019-05-30 6:36 p.m., Kuehling, Felix wrote:
>>
>>#if IS_ENABLED(CONFIG_DRM_AMDGPU_USERPTR)
>> -if (gtt->ranges &&
>> -ttm->pages[0] == hmm_pfn_to_page(&gtt->ranges[0],
>> - gtt->ranges[0].pfns[0]))
>> +if (gtt->range &&
>> +
HMM provides new APIs and helpers in kernel 5.2-rc1 to simplify the driver
path. The old HMM APIs are deprecated and will be removed in the future.
Below are changes in driver:
1. Change hmm_vma_fault to hmm_range_register and hmm_range_fault which
supports range with multiple vmas, remove the multiple vma
On 2019-05-07 5:52 p.m., Kuehling, Felix wrote:
> Use unsigned long for number of pages.
>
> Check that pfns are valid after hmm_vma_fault. If they are not,
> return an error instead of continuing with invalid page pointers and
> PTEs.
>
> Signed-off-by: Felix Kuehling
Reviewed-by: Philip Yang
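The validity check described in the quoted patch can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: SKETCH_PFN_VALID and check_pfns_valid are hypothetical names, and the real HMM pfn encoding is not reproduced here. The sketch only shows the idea of walking the pfn array with an unsigned long page count and returning an error on the first invalid entry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "valid" bit; the real HMM flag layout is not shown here. */
#define SKETCH_PFN_VALID (1ULL << 0)

/*
 * Return 0 if every entry has the valid bit set, -EFAULT otherwise.
 * npages is unsigned long, following the patch description.
 */
static int check_pfns_valid(const uint64_t *pfns, unsigned long npages)
{
	unsigned long i;

	/* A NULL array is only acceptable for an empty range. */
	assert(pfns != NULL || npages == 0);

	for (i = 0; i < npages; i++) {
		if (!(pfns[i] & SKETCH_PFN_VALID))
			return -EFAULT; /* do not continue with invalid pages */
	}
	return 0;
}
```

A caller would run this right after the fault helper returns and bail out on a non-zero result before building any PTEs.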
After patch "drm: Use the same mmap-range offset and size for GEM and
TTM", applications failed to create a bo in system memory because the drm
mmap_range size decreased from the original 1TB to 64GB. This is not big
enough for applications. Increase the drm mmap_range size to 1TB.
Change-Id: Id482af261f56f32
Hi Marek,
I guess you are using an old kernel config with a 5.x kernel, and the kernel
config option CONFIG_HMM is missing because the dependency option
CONFIG_ZONE_DEVICE is missing in the old config file. Please update your
kernel config file to enable option CONFIG_ZONE_DEVICE.
You should have this
Hi Tom,
Yes, we are missing some HMM fixes/changes from 5.1, but the crash log
seems not related to those fixes/changes in 5.1.
I did see the similar crash log in __mmu_notifier_release path that
should be fixed by the patch "use reference counting for HMM struct" as
Alex mentioned. Since you
Hi Felix,
Submitted v3 to fix the potential problems with invalid userptr.
Philip
On 2019-03-12 3:30 p.m., Kuehling, Felix wrote:
> See one comment inline. There are still some potential problems that
> you're not catching.
>
> On 2019-03-06 9:42 p.m., Yang, Philip wrote:
>
userptr may cross two VMAs if the forked child process (not call exec
after fork) malloc buffer, then free it, and then malloc larger size
buf, the kernel will create a new VMA adjacent to the old VMA which was cloned
from parent process, some pages of userptr are in the first VMA, the
rest pages are in the
The VM fault happens in about 1 of 10 runs of KFDCWSRTest.BasicTest for me. I am using
SDMA for page table updates. I haven't tried CPU page table updates.
Philip
On 2019-03-12 11:12 a.m., Russell, Kent wrote:
> Peculiar, I hit it immediately when I ran it . Can you try use
> --gtest_filter=KFDCWSRTest.BasicTest
userptr may cross two VMAs if the forked child process (not call exec
after fork) malloc buffer, then free it, and then malloc larger size
buf, the kernel will create a new VMA adjacent to the old VMA which was cloned
from parent process, some pages of userptr are in the first VMA, the
rest pages are in the
Userptr restore may race with a concurrent userptr invalidation after
hmm_vma_fault adds the range to the hmm->ranges list. It needs to call
hmm_vma_range_done to remove the range from the hmm->ranges list first,
and then reschedule the restore worker. Otherwise hmm_vma_fault will add
same range to the list, this will
I will submit v2 to fix those issues. Some comments inline...
On 2019-03-06 3:11 p.m., Kuehling, Felix wrote:
> Some comments inline ...
>
> On 3/5/2019 1:09 PM, Yang, Philip wrote:
>> userptr may cross two VMAs if the forked child process (not call exec
>> after fork) mal
at that point needs to be untracked.
>
> For now as a quick fix for an urgent bug, this change is Reviewed-by:
> Felix Kuehling . But please revisit this and
> check if there are similar corner cases as I explained above.
>
> Regards,
> Felix
>
> On 3/5/2019 1:09 PM, Yang,
If an old kernel config file is used, CONFIG_ZONE_DEVICE is not selected,
so CONFIG_HMM and CONFIG_HMM_MIRROR are not enabled. The current driver
error message "Failed to register MMU notifier" is not clear. Inform the
user with a more descriptive message on how to fix the missing kernel
config option.
Bugzil
On 2019-03-06 10:04 a.m., Christian König wrote:
> Am 06.03.19 um 16:02 schrieb Yang, Philip:
>> If using old kernel config file, CONFIG_ZONE_DEVICE is not selected,
>> so CONFIG_HMM and CONFIG_HMM_MIRROR is not enabled, the current driver
>> error message "Failed to
If an old kernel config file is used, CONFIG_ZONE_DEVICE is not selected,
so CONFIG_HMM and CONFIG_HMM_MIRROR are not enabled. The current driver
error message "Failed to register MMU notifier" is not clear. Inform the
user with a more descriptive message on how to fix the missing kernel
config option.
Bugzil
On 2019-03-06 4:05 a.m., Michel Dänzer wrote:
> On 2019-03-05 7:09 p.m., Yang, Philip wrote:
>> If using old kernel config file, CONFIG_ZONE_DEVICE is not selected,
>> so CONFIG_HMM and CONFIG_HMM_MIRROR is not enabled, the current driver
>> error message "Failed to reg
userptr may cross two VMAs if the forked child process (not call exec
after fork) malloc buffer, then free it, and then malloc larger size
buf, the kernel will create a new VMA adjacent to the old VMA which was cloned
from parent process, some pages of userptr are in the first VMA, the
rest pages are in the
Userptr restore may race with a concurrent userptr invalidation after
hmm_vma_fault adds the range to the hmm->ranges list. It needs to call
hmm_vma_range_done to remove the range from the hmm->ranges list first,
and then reschedule the restore worker. Otherwise hmm_vma_fault will add
same range to the list, this will
Those corner cases are found by kfdtest.KFDIPCTest.
Philip Yang (3):
drm/amdkfd: support concurrent userptr update for HMM
drm/amdgpu: support userptr cross VMAs case with HMM
drm/amdgpu: more descriptive message if HMM not enabled
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 28 +++-
If an old kernel config file is used, CONFIG_ZONE_DEVICE is not selected,
so CONFIG_HMM and CONFIG_HMM_MIRROR are not enabled. The current driver
error message "Failed to register MMU notifier" is not clear. Inform the
user with a more descriptive message on how to fix the missing kernel
config option.
Bugzil
han one VMA, fail
> 2. Loop over all the VMAs in the address range
>
> Thanks,
>Felix
>
> -----Original Message-----
> From: amd-gfx On Behalf Of Yang,
> Philip
> Sent: Friday, March 01, 2019 12:30 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Yang, Ph
Those corner cases are found by kfdtest.KFDIPCTest.
userptr may cross two vmas if the forked child process (not call exec
after fork) malloc buffer, then free it, and then malloc larger size
buf, the kernel will create a new vma adjacent to the old vma which was cloned
from parent process, some pages of use
:
>
> [ Dropping Jérôme and the linux-mm list ]
>
> On 2019-02-27 7:48 p.m., Yang, Philip wrote:
>> Hi Alex,
>>
>> Pushed, thanks.
>>
>> mm/hmm: use reference counting for HMM struct
>
> Thanks, but I'm not seeing it yet. Maybe it needs some sp
.
>
> Alex
>
> *From:* amd-gfx on behalf of
> Yang, Philip
> *Sent:* Wednesday, February 27, 2019 1:05 PM
> *To:* Michel Dänzer; Jérôme Glisse
> *Cc:* linux...@kvack.org; amd-gfx@lists.freedesktop.org
> *Subject:* Re: KASAN caught amdgpu / HMM use-after-f
amd-staging-drm-next will rebase to kernel 5.1 and pick up this fix
automatically. As a short-term workaround, please cherry-pick this fix
into your local repository.
Regards,
Philip
On 2019-02-27 12:33 p.m., Michel Dänzer wrote:
> On 2019-02-27 6:14 p.m., Yang, Philip wrote:
>>
Hi Michel,
Yes, I found the same issue and the bug has been fixed by Jerome:
876b462120aa mm/hmm: use reference counting for HMM struct
The fix is on the hmm-for-5.1 branch. I cherry-picked it into my local branch
to work around the issue.
Regards,
Philip
On 2019-02-27 12:02 p.m., Michel Dänzer wrot
Selecting only HMM_MIRROR produces kernel config dependency warnings
if CONFIG_HMM is missing from the config. Adding "depends on HMM"
solves the issue.
Add conditional compilation to fix compilation errors when HMM_MIRROR
is not enabled because the HMM config is not enabled.
Change-Id: I1b44a0b5285bbef5e98bfb045
Thanks Jerome for the correct HMM config option. Selecting only
HMM_MIRROR is not good enough because the CONFIG_HMM option may be missing;
adding "depends on ARCH_HAS_HMM" will solve the issue.
I will submit new patch to fix the compilation error if HMM_MIRROR
config is missing and the HMM config depen
Those options are needed to support HMM
Change-Id: Ieb7bb3bcec07245d79a02793e6728228decc400a
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdgpu/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig
b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 960a
There is circular lock between gfx and kfd path with HMM change:
lock(dqm) -> bo::reserve -> amdgpu_mn_lock
To avoid this, move init/uninit_mqd() out of lock(dqm), to remove nested
locking between mmap_sem and bo::reserve. The locking order
is: bo::reserve -> amdgpu_mn_lock(p->mn)
Change-Id: I2ec0
Hi Christian,
Resend patch 1/3, 2/3, added Reviewed-by in comments.
Change in patch 3/3, amdgpu_cs_submit, amdgpu_cs_ioctl return -EAGAIN
to user space to retry cs_ioctl.
Regards,
Philip
Philip Yang (3):
drm/amdgpu: use HMM mirror callback to replace mmu notifier v7
drm/amdkfd: avoid HMM ch
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.
It supports both KFD userptr and gfx userptr paths.
Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e
Signed-off-by: Philip Y
Use HMM helper function hmm_vma_fault() to get physical pages backing
userptr and start tracking CPU page table updates for those pages. Then use
hmm_vma_range_done() to check if those pages are updated before
amdgpu_cs_submit for gfx or before user queues are resumed for kfd.
If userptr pages are upda
ries might not be sufficient any more.
>
Yes, it looks better to handle retry from user space. The extra sys call
overhead can be ignored because this does not happen all the time. I
will submit new patch for review.
Thanks,
Philip
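A minimal sketch of the user-space retry discussed above, assuming the submit call follows the usual ioctl() convention (it returns -1 with errno set to EAGAIN when the caller should retry). The names submit_fn, submit_with_retry and fake_submit are hypothetical and are not part of amdgpu or libdrm; the retry bound is also an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical submit callback: returns 0 on success, or -1 with errno
 * set, mirroring the usual ioctl() convention.
 */
typedef int (*submit_fn)(void *ctx);

/* Call submit() again while it fails with EAGAIN, up to max_retries retries. */
static int submit_with_retry(submit_fn submit, void *ctx, int max_retries)
{
	int tries;

	assert(submit != NULL);

	for (tries = 0; tries <= max_retries; tries++) {
		int r = submit(ctx);

		/* Success, or an error other than EAGAIN: hand it back as-is. */
		if (r == 0 || errno != EAGAIN)
			return r;
	}

	/* Still EAGAIN after the initial call plus max_retries retries. */
	errno = EAGAIN;
	return -1;
}

/*
 * Fake submit used only for illustration: fails with EAGAIN until the
 * counter pointed to by ctx reaches zero, then succeeds.
 */
static int fake_submit(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		errno = EAGAIN;
		return -1;
	}
	return 0;
}
```

The extra calls only happen on the retry path, which matches the point above that the added system call overhead is not paid on every submission.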
On 2019-02-06 4:20 a.m., Christian König wrote:
> Am 0
Use HMM helper function hmm_vma_fault() to get physical pages backing
userptr and start tracking CPU page table updates for those pages. Then use
hmm_vma_range_done() to check if those pages are updated before
amdgpu_cs_submit for gfx or before user queues are resumed for kfd.
If userptr pages are upda
Hi Christian,
I will submit new patch for review, my comments embedded inline below.
Thanks,
Philip
On 2019-02-05 1:09 p.m., Koenig, Christian wrote:
> Am 05.02.19 um 18:25 schrieb Yang, Philip:
>> [SNIP]+
>>>> + if (r == -ERESTARTSYS) {
>>
Hi Christian,
My comments are embedded below. I will submit another patch to address
those.
Thanks,
Philip
On 2019-02-05 6:52 a.m., Christian König wrote:
> Am 04.02.19 um 19:23 schrieb Yang, Philip:
>> Use HMM helper function hmm_vma_fault() to get physical pages backing
>> us
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.
It supports both KFD userptr and gfx userptr paths.
Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e
Signed-off-by: Philip Y
Hi Christian,
This patch is rebased to the latest HMM. Please review the GEM and CS part changes
in patch 3/3.
Thanks,
Philip Yang (3):
drm/amdgpu: use HMM mirror callback to replace mmu notifier v7
drm/amdkfd: avoid HMM change cause circular lock dependency v2
drm/amdgpu: replace get_user_pa
Use HMM helper function hmm_vma_fault() to get physical pages backing
userptr and start tracking CPU page table updates for those pages. Then use
hmm_vma_range_done() to check if those pages are updated before
amdgpu_cs_submit for gfx or before user queues are resumed for kfd.
If userptr pages are upda
There is circular lock between gfx and kfd path with HMM change:
lock(dqm) -> bo::reserve -> amdgpu_mn_lock
To avoid this, move init/uninit_mqd() out of lock(dqm), to remove nested
locking between mmap_sem and bo::reserve. The locking order
is: bo::reserve -> amdgpu_mn_lock(p->mn)
Change-Id: I2ec0
On 2019-02-04 10:18 a.m., Christian König wrote:
> Am 04.02.19 um 16:06 schrieb Yang, Philip:
>> Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
>> callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
>> DRM_AMDGPU_USERPTR Kconfig.
>
There is circular lock between gfx and kfd path with HMM change:
lock(dqm) -> bo::reserve -> amdgpu_mn_lock
To avoid this, move init/uninit_mqd() out of lock(dqm), to remove nested
locking between mmap_sem and bo::reserve. The locking order
is: bo::reserve -> amdgpu_mn_lock(p->mn)
Change-Id: I2ec0
Use HMM helper function hmm_vma_fault() to get physical pages backing
userptr and start tracking CPU page table updates for those pages. Then use
hmm_vma_range_done() to check if those pages are updated before
amdgpu_cs_submit for gfx or before user queues are resumed for kfd.
If userptr pages are upda
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.
It supports both KFD userptr and gfx userptr paths.
The dependent HMM patchset from Jérôme Glisse are all merged into 4.20.0
kerne
Hi Christian,
This patch is rebased to the latest HMM. Please review the GEM and CS part changes
in patch 3/3.
Regards,
Philip Yang (3):
drm/amdgpu: use HMM mirror callback to replace mmu notifier v6
drm/amdkfd: avoid HMM change cause circular lock dependency v2
drm/amdgpu: replace get_user_p
amdgpu_vm_get_task_info is called from the interrupt handler and the sched timeout
workqueue, so it needs to use the irq version of spin_lock to avoid deadlock.
Change-Id: Ifedd4b97535bf0b5d3936edd2d9688957020efd4
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 5 +++--
1 file changed, 3 insertions(+), 2 delet
I found the same issue while debugging; I will submit a patch to fix this shortly.
Philip
On 2019-01-30 10:35 p.m., Mikhail Gavrilov wrote:
> Hi folks.
> Yet another kernel panic happens while GPU again is hang:
>
> [ 1469.906798]
> [ 1469.906799] WARNING: inconsistent
Ping Christian, any comments for the GEM and CS part changes?
Thanks. Philip
On 2019-01-10 12:02 p.m., Yang, Philip wrote:
> Use HMM helper function hmm_vma_fault() to get physical pages backing
> userptr and start CPU page table update track of those pages. Then use
> hmm_vma_range_
Use HMM helper function hmm_vma_fault() to get physical pages backing
userptr and start tracking CPU page table updates for those pages. Then use
hmm_vma_range_done() to check if those pages are updated before
amdgpu_cs_submit for gfx or before user queues are resumed for kfd.
If userptr pages are upda
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.
It supports both KFD userptr and gfx userptr paths.
The dependent HMM patchset from Jérôme Glisse are all merged into 4.20.0
kerne
There is circular lock between gfx and kfd path with HMM change:
lock(dqm) -> bo::reserve -> amdgpu_mn_lock
To avoid this, move init/uninit_mqd() out of lock(dqm), to remove nested
locking between mmap_sem and bo::reserve. The locking order
is: bo::reserve -> amdgpu_mn_lock(p->mn)
Change-Id: I2ec0
On 2019-01-07 9:21 a.m., Christian König wrote:
> Am 14.12.18 um 22:10 schrieb Yang, Philip:
>> Use HMM helper function hmm_vma_fault() to get physical pages backing
>> userptr and start CPU page table update track of those pages. Then use
>> hmm_vma_range_done() to che
ies is Reviewed-by: Felix
> Kuehling
>
> Regards,
> Felix
>
> On 2018-12-14 4:10 p.m., Yang, Philip wrote:
>> Use HMM helper function hmm_vma_fault() to get physical pages backing
>> userptr and start CPU page table update track of those pages. Then use
>>