Re: [PATCH v2 13/15] drm/amdgpu: Use mmu_range_insert instead of hmm_mirror

2019-10-29 Thread Kuehling, Felix
On 2019-10-28 4:10 p.m., Jason Gunthorpe wrote: > From: Jason Gunthorpe > > Remove the interval tree in the driver and rely on the tree maintained by > the mmu_notifier for delivering mmu_notifier invalidation callbacks. > > For some reason amdgpu has a very complicated arrangement where it tries

Re: [PATCH v2 02/15] mm/mmu_notifier: add an interval tree notifier

2019-10-29 Thread Kuehling, Felix
I haven't had enough time to fully understand the deferred logic in this change. I spotted one problem, see comments inline. On 2019-10-28 4:10 p.m., Jason Gunthorpe wrote: > From: Jason Gunthorpe > > Of the 13 users of mmu_notifiers, 8 of them use only > invalidate_range_start/end() and immedia

Re: [PATCH v2 12/15] drm/amdgpu: Call find_vma under mmap_sem

2019-10-29 Thread Kuehling, Felix
On 2019-10-28 4:10 p.m., Jason Gunthorpe wrote: > From: Jason Gunthorpe > > find_vma() must be called under the mmap_sem, reorganize this code to > do the vma check after entering the lock. > > Further, fix the unlocked use of struct task_struct's mm, instead use > the mm from hmm_mirror which has

Re: [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource

2019-10-09 Thread Kuehling, Felix
On 2019-10-09 11:34, Daniel Vetter wrote: > On Wed, Oct 09, 2019 at 03:25:22PM +0000, Kuehling, Felix wrote: >> On 2019-10-09 6:31, Daniel Vetter wrote: >>> On Tue, Oct 08, 2019 at 06:53:18PM +0000, Kuehling, Felix wrote: >>>> The description sounds reasonable to me a

Re: [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource

2019-10-09 Thread Kuehling, Felix
On 2019-10-09 6:31, Daniel Vetter wrote: > On Tue, Oct 08, 2019 at 06:53:18PM +0000, Kuehling, Felix wrote: >> >> The description sounds reasonable to me and maps well to the CU masking >> feature in our GPUs. >> >> It would also allow us to do more coar

Re: [PATCH RFC v4 16/16] drm/amdgpu: Integrate with DRM cgroup

2019-10-08 Thread Kuehling, Felix
On 2019-08-29 2:05 a.m., Kenny Ho wrote: > The number of logical gpus (lgpu) is defined to be the number of compute > units (CUs) for a device. The lgpu allocation limit only applies to > compute workloads for the moment (enforced via kfd queue creation). Any > cu_mask update is validated against the

Re: [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource

2019-10-08 Thread Kuehling, Felix
On 2019-08-29 2:05 a.m., Kenny Ho wrote: > drm.lgpu > A read-write nested-keyed file which exists on all cgroups. > Each entry is keyed by the DRM device's major:minor. > > lgpu stands for logical GPU, it is an abstraction used to > subdivide a physical DRM devic

Re: [PATCH] drm/amdkfd: add missing void argument to function kgd2kfd_init

2019-10-07 Thread Kuehling, Felix
On 2019-10-07 12:08 p.m., Alex Deucher wrote: > On Sat, Oct 5, 2019 at 1:58 PM Colin King wrote: >> From: Colin Ian King >> >> Function kgd2kfd_init is missing a void argument, add it >> to clean up the non-ANSI function declaration. >> >> Signed-off-by: Colin Ian King > Applied. thanks! Thank

Re: [PATCH] drm/amdkfd: fix a potential NULL pointer dereference

2019-09-19 Thread Kuehling, Felix
On 2019-09-18 12:30 p.m., Allen Pais wrote: > alloc_workqueue is not checked for errors and as a result, > a potential NULL dereference could occur. > > Signed-off-by: Allen Pais > --- > drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/dri
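A minimal sketch of the check being added, assuming standard kernel workqueue semantics (the kfd->ih_wq field name here is an assumption, not taken from the patch):

    #include <linux/workqueue.h>

    static int kfd_interrupt_init_sketch(struct kfd_dev *kfd)
    {
        kfd->ih_wq = alloc_workqueue("KFD IH", WQ_HIGHPRI, 1);
        if (unlikely(!kfd->ih_wq))
            return -ENOMEM;   /* alloc_workqueue() returns NULL on failure */
        return 0;
    }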

Re: linux-next: Tree for Aug 19 (amdgpu)

2019-08-21 Thread Kuehling, Felix
On 2019-08-20 8:36 a.m., Jason Gunthorpe wrote: > On Tue, Aug 20, 2019 at 11:45:54AM +1000, Stephen Rothwell wrote: >> Hi all, >> >> On Mon, 19 Aug 2019 18:34:41 -0700 Randy Dunlap >> wrote: >>> On 8/19/19 2:18 AM, Stephen Rothwell wrote: Hi all, Changes since 20190816: >

Re: [PATCH v3 hmm 10/11] drm/amdkfd: use mmu_notifier_put

2019-08-06 Thread Kuehling, Felix
On 2019-08-06 19:15, Jason Gunthorpe wrote: > From: Jason Gunthorpe > > The sequence of mmu_notifier_unregister_no_release(), > mmu_notifier_call_srcu() is identical to mmu_notifier_put() with the > free_notifier callback. > > As this is the last user of those APIs, converting it means we can drop
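The conversion Jason describes, sketched for the amdkfd case (the kfd_process field and callback names below are illustrative assumptions):

    #include <linux/mmu_notifier.h>

    /* old pattern: explicit two-step teardown */
    static void kfd_process_teardown_old(struct kfd_process *p)
    {
        mmu_notifier_unregister_no_release(&p->mmu_notifier, p->mm);
        mmu_notifier_call_srcu(&p->rcu, kfd_process_destroy_rcu);
    }

    /* new pattern: mmu_notifier_put() defers the free to ops->free_notifier,
     * which runs only after the required SRCU grace period */
    static const struct mmu_notifier_ops kfd_process_mmu_notifier_ops = {
        .release       = kfd_process_notifier_release,
        .free_notifier = kfd_process_free_notifier,
    };

    static void kfd_process_teardown_new(struct kfd_process *p)
    {
        mmu_notifier_put(&p->mmu_notifier);
    }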

Re: [PATCH 15/15] amdgpu: remove CONFIG_DRM_AMDGPU_USERPTR

2019-08-06 Thread Kuehling, Felix
On 2019-08-06 13:44, Jason Gunthorpe wrote: > On Tue, Aug 06, 2019 at 07:05:53PM +0300, Christoph Hellwig wrote: >> The option is just used to select HMM mirror support and has a very >> confusing help text. Just pull in the HMM mirror code by default >> instead. >> >> Signed-off-by: Christoph Hel

Re: [PATCH hmm] drm/amdkfd: fix a use after free race with mmu_notififer unregister

2019-08-04 Thread Kuehling, Felix
On 2019-08-02 16:07, Jason Gunthorpe wrote: > When using mmu_notifier_unregister_no_release() the caller must ensure > there is an SRCU synchronize before the mn memory is freed, otherwise > use-after-free races are possible, for instance: > > CPU0 CPU1 >

Re: [PATCH 1/8] drm/amdgpu: drop drmP.h in amdgpu_amdkfd_arcturus.c

2019-07-31 Thread Kuehling, Felix
On 2019-07-31 11:52 a.m., Alex Deucher wrote: > Unused. > > Signed-off-by: Alex Deucher The series is Reviewed-by: Felix Kuehling > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arct

Re: [PATCH 07/13] mm: remove the page_shift member from struct hmm_range

2019-07-31 Thread Kuehling, Felix
On 2019-07-30 1:51 a.m., Christoph Hellwig wrote: > All users pass PAGE_SIZE here, and if we wanted to support single > entries for huge pages we should really just add a HMM_FAULT_HUGEPAGE > flag instead that uses the huge page size instead of having the > caller calculate that size once, just for

Re: [PATCH 06/13] mm: remove superflous arguments from hmm_range_register

2019-07-31 Thread Kuehling, Felix
On 2019-07-30 1:51 a.m., Christoph Hellwig wrote: > The start, end and page_shift values are all saved in the range > structure, so we might as well use that for argument passing. > > Signed-off-by: Christoph Hellwig Reviewed-by: Felix Kuehling > --- > Documentation/vm/hmm.rst

Re: [PATCH 02/13] amdgpu: don't initialize range->list in amdgpu_hmm_init_range

2019-07-31 Thread Kuehling, Felix
On 2019-07-30 1:51 a.m., Christoph Hellwig wrote: > The list is used to add the range to another list as an entry in the > core hmm code, so there is no need to initialize it in a driver. I've seen code that uses list_empty to check whether a list head has been added to a list or not. For that to
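The idiom Felix is referring to, sketched with the generic <linux/list.h> API (not the actual hmm/amdgpu code):

    #include <linux/list.h>

    static LIST_HEAD(active_ranges);

    struct my_range {
        struct list_head list;  /* entry linking the range into a core list */
    };

    static void list_empty_idiom(struct my_range *range)
    {
        /* list_empty() on an entry is only meaningful if the entry was
         * initialized while unlinked; skipping the init gives garbage. */
        INIT_LIST_HEAD(&range->list);            /* list_empty() == true  */
        list_add(&range->list, &active_ranges);  /* list_empty() == false */
        list_del_init(&range->list);             /* true again; plain list_del()
                                                  * would not restore "empty" */
    }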

Re: [PATCH 01/13] amdgpu: remove -EAGAIN handling for hmm_range_fault

2019-07-31 Thread Kuehling, Felix
On 2019-07-30 1:51 a.m., Christoph Hellwig wrote: > hmm_range_fault can only return -EAGAIN if called with the block > argument set to false, so remove the special handling for it. The block argument no longer exists. You replaced it with HMM_FAULT_ALLOW_RETRY, which has the opposite logic. So this s

[PATCH 1/4] drm/amdgpu: Add flag to wipe VRAM on release

2019-07-19 Thread Kuehling, Felix
This memory allocation flag will be used to indicate BOs containing sensitive data that should not be leaked to other processes. Signed-off-by: Felix Kuehling --- include/uapi/drm/amdgpu_drm.h | 4 1 file changed, 4 insertions(+) diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/dr
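A sketch of what the four-line uapi addition could look like; the flag name follows the feature description, but the exact name and bit position are assumptions:

    /* include/uapi/drm/amdgpu_drm.h (sketch) */
    /* Wipe the BO's backing memory when the BO is released, so sensitive
     * data cannot leak to the next user of that memory. */
    #define AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE  (1 << 9)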

[PATCH 3/4] drm/ttm: Add release_notify callback to ttm_bo_driver

2019-07-19 Thread Kuehling, Felix
This notifies the driver that a BO is about to be released. Releasing a BO also invokes the move_notify callback from ttm_bo_cleanup_memtype_use, but that happens too late for anything that would add fences to the BO and require a delayed delete. Signed-off-by: Felix Kuehling --- drivers/gpu/dr
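The shape of the callback being added, sketched from the description (the exact call-site placement is an assumption):

    struct ttm_bo_driver_sketch {
        /* ... existing callbacks ... */

        /* Called when a BO is about to be released, early enough that the
         * driver can still attach fences and trigger a delayed delete. */
        void (*release_notify)(struct ttm_buffer_object *bo);
    };

    /* in the release path, before ttm_bo_cleanup_memtype_use() (sketch): */
    static void ttm_bo_release_sketch(struct ttm_buffer_object *bo)
    {
        if (bo->bdev->driver->release_notify)
            bo->bdev->driver->release_notify(bo);
        /* ... existing cleanup ... */
    }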

[PATCH 4/4] drm/amdgpu: Implement VRAM wipe on release

2019-07-19 Thread Kuehling, Felix
Wipe VRAM memory containing sensitive data when moving or releasing BOs. Clearing the memory is pipelined to minimize any impact on subsequent memory allocation latency. Use of a poison value should help debug future use-after-free bugs. When moving BOs, the existing ttm_bo_pipelined_move ensures
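A simplified sketch of the release_notify-driven wipe; the helpers (amdgpu_fill_buffer, amdgpu_bo_fence), poison value, locking and error handling are assumptions and simplifications, not the actual patch:

    static void amdgpu_bo_release_notify_sketch(struct ttm_buffer_object *bo)
    {
        struct amdgpu_bo *abo = ttm_to_amdgpu_bo(bo);
        struct dma_fence *fence = NULL;

        if (bo->mem.mem_type != TTM_PL_VRAM ||
            !(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
            return;

        /* Queue a pipelined GPU fill with a poison value and fence the BO,
         * so the memory cannot be reused before the clear completes. */
        if (!amdgpu_fill_buffer(abo, 0xDEADBEEF /* poison */, bo->resv, &fence)) {
            amdgpu_bo_fence(abo, fence, false);
            dma_fence_put(fence);
        }
    }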

[PATCH 2/4] drm/amdgpu: Mark KFD VRAM allocations for wipe on release

2019-07-19 Thread Kuehling, Felix
Memory used by KFD applications can contain sensitive information that should not be leaked to other processes. The current approach to prevent leaks is to clear VRAM at allocation time. This is not effective because memory can be reused in other ways without being cleared. Synchronously clearing m

Re: [PATCH 3/5] drm/ttm: Add release_notify callback to ttm_bo_driver

2019-07-10 Thread Kuehling, Felix
[adding dri-devel] On 2019-07-09 11:59 p.m., Kuehling, Felix wrote: > This notifies the driver that a BO is about to be released. > > Releasing a BO also invokes the move_notify callback from > ttm_bo_cleanup_memtype_use, but that happens too late for anything > that would add f

Re: [PATCH 1/1] drm/amdgpu: adopt to hmm_range_register API change

2019-07-08 Thread Kuehling, Felix
On 2019-07-07 7:30 p.m., Stephen Rothwell wrote: > Hi all, > > On Wed, 3 Jul 2019 17:09:16 -0400 Alex Deucher wrote: >> On Wed, Jul 3, 2019 at 5:03 PM Kuehling, Felix >> wrote: >>> On 2019-07-03 10:10 a.m., Jason Gunthorpe wrote: >>>> On Wed, Jul 03,

Re: [PATCH v2] MAINTAINERS: update amdkfd maintainer

2019-07-04 Thread Kuehling, Felix
On 2019-07-04 2:32 a.m., Oded Gabbay wrote: > I'm leaving the role of amdkfd maintainer. Therefore, update the relevant > entry in the MAINTAINERS file with the name of the new maintainer. > > Good Luck! Thank you Oded! Thanks for being the maintainer even after leaving AMD and helping me transit

Re: [PATCH 1/1] drm/amdgpu: adopt to hmm_range_register API change

2019-07-03 Thread Kuehling, Felix
On 2019-07-03 10:10 a.m., Jason Gunthorpe wrote: > On Wed, Jul 03, 2019 at 01:55:08AM +0000, Kuehling, Felix wrote: >> From: Philip Yang >> >> In order to pass mirror instead of mm to hmm_range_register, we need >> pass bo instead of ttm to amdgpu_ttm_tt_get_user_page

Re: [RFC] mm/hmm: pass mmu_notifier_range to sync_cpu_device_pagetables

2019-07-02 Thread Kuehling, Felix
On 2019-07-02 6:59 p.m., Jason Gunthorpe wrote: > On Wed, Jul 03, 2019 at 12:49:12AM +0200, Christoph Hellwig wrote: >> On Tue, Jul 02, 2019 at 07:53:23PM +, Jason Gunthorpe wrote: I'm sending this out now since we are updating many of the HMM APIs and I think it will be useful. >>> T

[PATCH 1/1] drm/amdgpu: adopt to hmm_range_register API change

2019-07-02 Thread Kuehling, Felix
From: Philip Yang In order to pass the mirror instead of the mm to hmm_range_register, we need to pass the bo instead of the ttm to amdgpu_ttm_tt_get_user_pages, because the mirror is part of the amdgpu_mn structure, which is accessible from the bo. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Felix
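The interface change being described, sketched as prototypes (parameter lists beyond what the description states are assumptions):

    /* before:
     *   int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages);
     * after: the bo reaches the per-process amdgpu_mn structure that owns the
     * HMM mirror which hmm_range_register() now takes instead of the mm: */
    int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages);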

Re: [PATCH 19/22] mm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot}

2019-07-02 Thread Kuehling, Felix
On 2019-07-01 2:20 a.m., Christoph Hellwig wrote: > We should not have two different error codes for the same condition. In > addition this really complicates the code due to the special handling of > EAGAIN that drops the mmap_sem due to the FAULT_FLAG_ALLOW_RETRY logic > in the core vm. I think

Re: [PATCH] drm/amdkfd: fix potential null pointer dereference on pointer peer_dev

2019-07-02 Thread Kuehling, Felix
I think this could happen if KFD initialization fails for a device. Currently we'd add the device, and then remove it again. That may leave a gap in the proximity domains. Oak just had a fix recently to clean that up by only adding KFD devices to the topology after successful initialization. R

Re: [PATCH 1/1] drm/ttm: return -EBUSY if waiting for busy BO fails

2019-06-26 Thread Kuehling, Felix
On 2019-06-26 2:54 a.m., Koenig, Christian wrote: > Am 26.06.19 um 08:40 schrieb Kuehling, Felix: >> Returning -EAGAIN prevents ttm_bo_mem_space from trying alternate >> placements and can lead to live-locks in amdgpu_cs, retrying >> indefinitely and never succeeding. >

[PATCH 1/1] drm/ttm: return -EBUSY if waiting for busy BO fails

2019-06-25 Thread Kuehling, Felix
Returning -EAGAIN prevents ttm_bo_mem_space from trying alternate placements and can lead to live-locks in amdgpu_cs, retrying indefinitely and never succeeding. Fixes: cfcc52e477e4 ("drm/ttm: fix busy memory to fail other user v10") CC: Christian Koenig Signed-off-by: Felix Kuehling --- driver
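An illustration of the failure mode, as a hypothetical caller-side retry loop (not the actual amdgpu_cs code):

    static int validate_with_retry(struct ttm_buffer_object *bo,
                                   struct ttm_placement *placement,
                                   struct ttm_operation_ctx *ctx)
    {
        int r;

        do {
            r = ttm_bo_validate(bo, placement, ctx);
            /* If waiting on a busy BO surfaces as -EAGAIN, this retries
             * forever; with -EBUSY, ttm_bo_mem_space instead gets to try
             * an alternate placement and the loop can terminate. */
        } while (r == -EAGAIN);

        return r;
    }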

Re: [PATCH 06/10] drm/ttm: fix busy memory to fail other user v10

2019-06-25 Thread Kuehling, Felix
I believe I found a live-lock due to this patch when running our KFD eviction test in a loop. It pretty reliably hangs on the second loop iteration. If I revert this patch, the problem disappears. With some added instrumentation, I see that amdgpu_cs_list_validate in amdgpu_cs_parser_bos returns

Re: [PATCH v3 hmm 11/12] mm/hmm: Remove confusing comment and logic from hmm_release

2019-06-18 Thread Kuehling, Felix
On 2019-06-18 1:37, Christoph Hellwig wrote: > On Mon, Jun 17, 2019 at 09:45:09PM -0300, Jason Gunthorpe wrote: >> Am I looking at the wrong thing? Looks like it calls it through a work >> queue, so it should be OK. > Yes, it calls it through a work queue. I guess that is fine because > it needs

Re: [PATCH v2 hmm 00/11] Various revisions from a locking/code review

2019-06-12 Thread Kuehling, Felix
[+Philip] Hi Jason, I'm out of the office this week. Hi Philip, can you give this a go? Not sure how much you've been following this patch series review. Message or call me on Skype to discuss any questions. Thanks,   Felix On 2019-06-11 12:48, Jason Gunthorpe wrote: > On Thu, Jun 06, 2019

Re: [PATCH 0/2] Two bug-fixes for HMM

2019-06-06 Thread Kuehling, Felix
[resent with correct address for Alex] On 2019-06-06 11:11 a.m., Jason Gunthorpe wrote: > On Fri, May 10, 2019 at 07:53:21PM +0000, Kuehling, Felix wrote: >> These problems were found in AMD-internal testing as we're working on >> adopting HMM. They are rebased against gli

Re: [PATCH 0/2] Two bug-fixes for HMM

2019-06-06 Thread Kuehling, Felix
On 2019-06-06 11:11 a.m., Jason Gunthorpe wrote: > On Fri, May 10, 2019 at 07:53:21PM +0000, Kuehling, Felix wrote: >> These problems were found in AMD-internal testing as we're working on >> adopting HMM. They are rebased against glisse/hmm-5.2-v3. We'd like to get >

Re: [PATCH] drm/ttm: fix ttm_bo_unreserve

2019-06-05 Thread Kuehling, Felix
On 2019-06-05 9:56, Michel Dänzer wrote: > On 2019-06-05 1:24 p.m., Christian König wrote: >> Am 04.06.19 um 21:03 schrieb Zeng, Oak: >>> From: amd-gfx On Behalf Of >>> Kuehling, Felix >>> On 2019-06-04 11:23, Christian König wrote: [snip] >>> --

Re: [PATCH] drm/ttm: fix ttm_bo_unreserve

2019-06-04 Thread Kuehling, Felix
On 2019-06-04 11:23, Christian König wrote: > Since we now keep BOs on the LRU we need to make sure > that they are removed when they are pinned. > > Signed-off-by: Christian König > --- > include/drm/ttm/ttm_bo_driver.h | 14 ++ > 1 file changed, 6 insertions(+), 8 deletions(-) >

Re: [PATCH][next] drm/amdkfd: fix null pointer dereference on dev

2019-05-29 Thread Kuehling, Felix
On 2019-05-29 11:07 a.m., Colin King wrote: > From: Colin Ian King > > The pointer dev is set to null yet it is being dereferenced when > checking dev->dqm->sched_policy. Fix this by performing the check > on dev->dqm->sched_policy after dev has been assigned and null > checked. Also remove the

Re: [PATCH 10/10] drm/amdgpu: stop removing BOs from the LRU v3

2019-05-24 Thread Kuehling, Felix
BOs when there is nothing easier to evict. ROCm applications like to use lots of memory. So it probably makes sense for us to stop removing our BOs from the LRU as well while we mass-validate our BOs in amdgpu_amdkfd_gpuvm_restore_process_bos. Regards,   Felix > > Christian. > > Am 22.05

Re: [PATCH 10/10] drm/amdgpu: stop removing BOs from the LRU v3

2019-05-22 Thread Kuehling, Felix
Can you explain how this avoids OOM situations? When is it safe to leave a reserved BO on the LRU list? Could we do the same thing in amdgpu_amdkfd_gpuvm.c? And if we did, what would be the expected side effects or consequences? Thanks,   Felix On 2019-05-22 8:59 a.m., Christian König wrote:

Re: [PATCH 1/2] mm/hmm: support automatic NUMA balancing

2019-05-14 Thread Kuehling, Felix
On 2019-05-13 5:27 p.m., Andrew Morton wrote: > On Fri, 10 May 2019 19:53:23 + "Kuehling, Felix" > wrote: > >> From: Philip Yang >> >> While the page is migrating by NUMA balancing, HMM failed to detect this

Re: [PATCH 2/2] mm/hmm: Only set FAULT_FLAG_ALLOW_RETRY for non-blocking

2019-05-14 Thread Kuehling, Felix
0ea534db > which did not have a clear line of sight for 5.2 either. When was that? I saw "Use HMM for userptr" in Dave's 5.2-rc1 pull request to Linus. Regards,   Felix > > Alex > ---- > *From:* amd

Re: [PATCH 2/2] mm/hmm: Only set FAULT_FLAG_ALLOW_RETRY for non-blocking

2019-05-13 Thread Kuehling, Felix
[Fixed Alex's email address, sorry for getting it wrong first] On 2019-05-13 3:49 p.m., Jerome Glisse wrote: > Andrew can we get this 2 fixes line up for 5.2 ? > > On Mon, May 13, 2019 at 07:36:44PM +0000, Kuehling, Felix wrote: >> Hi Jerom

Re: [PATCH 2/2] mm/hmm: Only set FAULT_FLAG_ALLOW_RETRY for non-blocking

2019-05-13 Thread Kuehling, Felix
y 10, 2019 at 07:53:24PM +, Kuehling, Felix wrote: >> Don't set this flag by default in hmm_vma_do_fault. It is set >> conditionally just a few lines below. Setting it unconditionally >> can lead to handle_mm_fault doing a non-blocking fault, returning >> -EBUSY and

[PATCH 2/2] mm/hmm: Only set FAULT_FLAG_ALLOW_RETRY for non-blocking

2019-05-10 Thread Kuehling, Felix
Don't set this flag by default in hmm_vma_do_fault. It is set conditionally just a few lines below. Setting it unconditionally can lead to handle_mm_fault doing a non-blocking fault, returning -EBUSY and unlocking mmap_sem unexpectedly. Signed-off-by: Felix Kuehling --- mm/hmm.c | 2 +- 1 file c
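The shape of the fix, paraphrased from the description (simplified, not quoted from mm/hmm.c):

    static vm_fault_t hmm_vma_do_fault_sketch(struct vm_area_struct *vma,
                                              unsigned long addr,
                                              bool write, bool block)
    {
        /* before: flags started as FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_REMOTE,
         * so even blocking callers got a retryable, non-blocking fault */
        unsigned int flags = FAULT_FLAG_REMOTE;

        if (!block)
            flags |= FAULT_FLAG_ALLOW_RETRY;  /* only for non-blocking callers */
        if (write)
            flags |= FAULT_FLAG_WRITE;

        return handle_mm_fault(vma, addr, flags);
    }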

[PATCH 0/2] Two bug-fixes for HMM

2019-05-10 Thread Kuehling, Felix
These problems were found in AMD-internal testing as we're working on adopting HMM. They are rebased against glisse/hmm-5.2-v3. We'd like to get them applied to a mainline Linux kernel as well as drm-next and amd-staging-drm-next sooner rather than later. Currently the HMM in amd-staging-drm-next

[PATCH 1/2] mm/hmm: support automatic NUMA balancing

2019-05-10 Thread Kuehling, Felix
From: Philip Yang While a page is being migrated by NUMA balancing, HMM fails to detect this condition and still returns the old page. The application will use the newly migrated page, but the driver passes the old page's physical address to the GPU, which crashes the application later. Use pte_protnone(pte) to return
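The idea of the fix, paraphrased into a small helper (the helper name and placement are assumptions; the real change sits in the hmm pte walk):

    static bool hmm_pte_usable_sketch(pte_t pte)
    {
        if (!pte_present(pte))
            return false;
        /* A PROT_NONE pte installed by automatic NUMA balancing marks a page
         * that is being migrated; report a fault instead of handing the stale
         * physical address to the GPU. */
        if (pte_protnone(pte))
            return false;
        return true;
    }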

Re: [PATCH v15 11/17] drm/amdgpu, arm64: untag user pointers

2019-05-07 Thread Kuehling, Felix
s patch untag user pointers in > amdgpu_gem_userptr_ioctl() for the GEM case and in amdgpu_amdkfd_gpuvm_ > alloc_memory_of_gpu() for the KFD case. This also makes sure that an > untagged pointer is passed to amdgpu_ttm_tt_get_user_pages(), which uses > it for vma lookups. > > Suggested-by: Kuehling, Fel

Re: [PATCH v15 12/17] drm/radeon, arm64: untag user pointers in radeon_gem_userptr_ioctl

2019-05-07 Thread Kuehling, Felix
On 2019-05-06 12:30 p.m., Andrey Konovalov wrote: > This patch is a part of a series that extends arm64 kernel ABI to allow to > pass tagged user pointers (with the top byte set to something else other > than 0x00) as syscall arguments. > > In radeon_gem_userptr_ioctl(

Re: [PATCH v14 11/17] drm/amdgpu, arm64: untag user pointers

2019-04-30 Thread Kuehling, Felix
; amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu() for the KFD case. > > Suggested-by: Kuehling, Felix > Signed-off-by: Andrey Konovalov > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++ > drivers/gpu/drm/a

Re: [PATCH v14 12/17] drm/radeon, arm64: untag user pointers

2019-04-30 Thread Kuehling, Felix
On 2019-04-30 9:25 a.m., Andrey Konovalov wrote: > This patch is a part of a series that extends arm64 kernel ABI to allow to > pass tagged user pointers (with the top byte set to something else other > than 0x00) as syscall arguments. > > radeon_ttm_tt_pin_userptr() u

Re: [PATCH] drm: increase drm mmap_range size to 1TB

2019-04-17 Thread Kuehling, Felix
Adding dri-devel On 2019-04-17 6:15 p.m., Yang, Philip wrote: > After the patch "drm: Use the same mmap-range offset and size for GEM and > TTM", applications failed to create BOs of system memory because the drm > mmap_range size decreased to 64GB from the original 1TB. This is not big > enough for applications

Re: [radeon-alex:drm-next-5.2-wip 21/42] drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:756:32: error: 'HMM_PFN_VALID' undeclared; did you mean '_PAGE_VALID'?

2019-04-03 Thread Kuehling, Felix
[dropping the robot] I think Philip fixed those issues on amd-staging-drm-next. Either some fixes are missing on drm-next-5.2-wip, or they are there but should be squashed to avoid hitting these errors on intermediate builds. Regards,   Felix On 2019-04-03 2:26 p.m., kbuild test robot wrote:

Re: [PATCH v13 14/20] drm/amdgpu, arm64: untag user pointers in amdgpu_ttm_tt_get_user_pages

2019-04-02 Thread Kuehling, Felix
On 2019-04-02 10:37 a.m., Andrey Konovalov wrote: > On Mon, Mar 25, 2019 at 11:21 PM Kuehling, Felix > wrote: >> On 2019-03-20 10:51 a.m., Andrey Konovalov wrote: >>> This patch is a part of a series that extends arm64 kernel ABI to allow to >>> pass tagged user p

Re: [PATCH RFC tip/core/rcu 3/4] drivers/gpu/drm/amd: Dynamically allocate kfd_processes_srcu

2019-04-02 Thread Kuehling, Felix
On 2019-04-02 10:29 a.m., Paul E. McKenney wrote: > Having DEFINE_SRCU() or DEFINE_STATIC_SRCU() in a loadable module > requires that the size of the reserved region be increased, which is > not something we really want to be doing. This commit therefore removes > the DEFINE_STATIC_SRCU() from dri
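The change being described, sketched with the standard SRCU API:

    #include <linux/srcu.h>

    /* before: DEFINE_STATIC_SRCU(kfd_processes_srcu); in a loadable module this
     * needs space in the reserved region, which Paul does not want to grow */
    static struct srcu_struct kfd_processes_srcu;

    static int __init kfd_module_init_sketch(void)
    {
        return init_srcu_struct(&kfd_processes_srcu);   /* allocate dynamically */
    }

    static void __exit kfd_module_exit_sketch(void)
    {
        cleanup_srcu_struct(&kfd_processes_srcu);
    }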

Re: [PATCH v13 14/20] drm/amdgpu, arm64: untag user pointers in amdgpu_ttm_tt_get_user_pages

2019-03-25 Thread Kuehling, Felix
On 2019-03-20 10:51 a.m., Andrey Konovalov wrote: > This patch is a part of a series that extends arm64 kernel ABI to allow to > pass tagged user pointers (with the top byte set to something else other > than 0x00) as syscall arguments. > > amdgpu_ttm_tt_get_user_pages() uses provided user pointers

Re: [PATCH] drm/amdkfd: Fix unchecked return value

2019-03-18 Thread Kuehling, Felix
Alex already applied an equivalent patch by Colin King (attached for reference). Regards,   Felix On 3/18/2019 2:05 PM, Gustavo A. R. Silva wrote: > Assign return value of function amdgpu_bo_sync_wait() to variable ret > for its further check. > > Addresses-Coverity-ID: 1443914 ("Logically dead

Re: [PATCH 1/1] drm/ttm: Account for kernel allocations in kernel zone only

2019-02-25 Thread Kuehling, Felix
On 2/25/2019 2:58 PM, Thomas Hellstrom wrote: > On Mon, 2019-02-25 at 14:20 +, Koenig, Christian wrote: >> Am 23.02.19 um 00:19 schrieb Kuehling, Felix: >>> Don't account for them in other zones such as dma32. The kernel >>> page >>> allocator has its o

[PATCH 1/1] drm/ttm: Account for kernel allocations in kernel zone only

2019-02-22 Thread Kuehling, Felix
Don't account for them in other zones such as dma32. The kernel page allocator has its own heuristics to avoid exhausting special zones for regular kernel allocations. Signed-off-by: Felix Kuehling CC: thellst...@vmware.com CC: christian.koe...@amd.com --- drivers/gpu/drm/ttm/ttm_memory.c | 6 ++
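The shape of the change, approximated against the existing zone helpers in ttm_memory.c (the helper name and its NULL-means-all-zones behavior are assumptions):

    int ttm_mem_global_alloc(struct ttm_mem_global *glob, uint64_t memory,
                             struct ttm_operation_ctx *ctx)
    {
        /* before: ttm_mem_global_alloc_zone(glob, NULL, memory, ctx) charged
         * every zone, including dma32; now charge kernel-internal allocations
         * such as acc_size against the kernel zone only */
        return ttm_mem_global_alloc_zone(glob, glob->zone_kernel, memory, ctx);
    }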

Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-22 Thread Kuehling, Felix
On 2019-02-22 8:45 a.m., Thomas Hellstrom wrote: > On Fri, 2019-02-22 at 07:10 +, Koenig, Christian wrote: >> Am 21.02.19 um 22:02 schrieb Thomas Hellstrom: >>> Hi, >>> >>> On Thu, 2019-02-21 at 20:24 +, Kuehling, Felix wrote: >>>> On 2019-02

Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-21 Thread Kuehling, Felix
On 2019-02-21 12:34 p.m., Thomas Hellstrom wrote: > On Thu, 2019-02-21 at 16:57 +0000, Kuehling, Felix wrote: >> On 2019-02-21 2:59 a.m., Koenig, Christian wrote: >>> On x86 with HIGHMEM there is no dma32 zone. Why do we need one on >>>>> x86_64? Can we make

Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-21 Thread Kuehling, Felix
On 2019-02-21 2:59 a.m., Koenig, Christian wrote: > On x86 with HIGHMEM there is no dma32 zone. Why do we need one on >>> x86_64? Can we make x86_64 more like HIGHMEM instead? >>> >>> Regards, >>> Felix >>> >> IIRC with x86, the kernel zone is always smaller than any dma32 zone, >> so we'd al

Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-20 Thread Kuehling, Felix
On 2019-02-20 1:41 a.m., Thomas Hellstrom wrote: > On Tue, 2019-02-19 at 17:06 +0000, Kuehling, Felix wrote: >> On 2019-02-18 3:39 p.m., Thomas Hellstrom wrote: >>> On Mon, 2019-02-18 at 18:07 +0100, Christian König wrote: >>>> Am 18.02.19 um 10:47 schrieb Thomas He

Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-19 Thread Kuehling, Felix
> default, >>>>> which >>>>> means if we drop this check, other devices may stop functioning >>>>> unexpectedly? >>>>> >>>>> However, in the end I'd expect the kernel page allocation >>>>> system >>&

[PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-15 Thread Kuehling, Felix
This is an RFC. I'm not sure this is the right solution, but it highlights the problem I'm trying to solve. The dma32_zone limits the acc_size of all allocated BOs to 2GB. On a 64-bit system with hundreds of GB of system memory and GPU memory, this can become a bottleneck. We're seeing TTM memory

Re: [PATCH] drm/amdkfd: Fix if preprocessor statement above kfd_fill_iolink_info_for_cpu

2019-01-31 Thread Kuehling, Felix
Thank you, Nathan. I applied your patch to amd-staging-drm-next. Sorry for the late response. I'm catching up with my email backlog after a vacation. Regards,   Felix On 2019-01-21 6:52 p.m., Nathan Chancellor wrote: > Clang warns: > > drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:866:5: war

Re: [PATCH] drm/amdgpu_vm: fix boolean expressions

2019-01-03 Thread Kuehling, Felix
On 2019-01-03 12:34 p.m., Gustavo A. R. Silva wrote: > Fix boolean expressions by using logical AND operator '&&' > instead of bitwise operator '&'. > > This issue was detected with the help of Coccinelle. > > Fixes: c8c5e569c5b0 ("drm/amdgpu: Consolidate visible vs. real vram check > v2.") Actual
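Why the distinction matters for flag tests, as a generic illustration (not the amdgpu_vm code in question):

    #include <linux/types.h>

    #define FLAG_A 0x1
    #define FLAG_B 0x2

    static bool both_flags_wrong(unsigned int flags)
    {
        /* bitwise AND of the two masked values compares different bit
         * positions: 0x1 & 0x2 == 0, so this is false even when both are set */
        return (flags & FLAG_A) & (flags & FLAG_B);
    }

    static bool both_flags_right(unsigned int flags)
    {
        return (flags & FLAG_A) && (flags & FLAG_B);
    }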

Re: [PATCH v2 1/3] mm/mmu_notifier: use structure for invalidate_range_start/end callback

2018-12-05 Thread Kuehling, Felix
On 2018-12-05 6:04 p.m., Jerome Glisse wrote: > On Wed, Dec 05, 2018 at 09:42:45PM +0000, Kuehling, Felix wrote: >> The amdgpu part looks good to me. >> >> A minor nit-pick in mmu_notifier.c (inline). >> >> Either way, the series is Acked-by: Felix Kuehling >

Re: [PATCH v2 1/3] mm/mmu_notifier: use structure for invalidate_range_start/end callback

2018-12-05 Thread Kuehling, Felix
The amdgpu part looks good to me. A minor nit-pick in mmu_notifier.c (inline). Either way, the series is Acked-by: Felix Kuehling On 2018-12-05 12:36 a.m., jgli...@redhat.com wrote: > From: Jérôme Glisse > > To avoid having to change many callback definition everytime we want > to add a parame

Re: [Intel-gfx] [PATCH RFC 2/5] cgroup: Add mechanism to register vendor specific DRM devices

2018-12-03 Thread Kuehling, Felix
On 2018-11-28 4:14 a.m., Joonas Lahtinen wrote: > Quoting Ho, Kenny (2018-11-27 17:41:17) >> On Tue, Nov 27, 2018 at 4:46 AM Joonas Lahtinen >> wrote: >>> I think a more abstract property "% of GPU (processing power)" might >>> be a more universal approach. One can then implement that through >>

Re: [PATCH] mm: convert totalram_pages, totalhigh_pages and managed_pages to atomic.

2018-11-22 Thread Kuehling, Felix
On 2018-10-22 1:23 p.m., Arun KS wrote: > Remove managed_page_count_lock spinlock and instead use atomic > variables. > > Suggested-by: Michal Hocko > Suggested-by: Vlastimil Babka > Signed-off-by: Arun KS Acked-by: Felix Kuehling Regards,   Felix > > --- > As discussed here, > https://patch

RE: [PATCH] drm/amdgpu: Fix Kernel Oops triggered by kfdtest

2018-11-15 Thread Kuehling, Felix
Apologies. We already have a fix for this on our internal amd-kfd-staging branch, but it's missing from amd-staging-drm-next. I'll cherry-pick our fix to amd-staging-drm-next and nominate it for drm-fixes. Regards, Felix -Original Message- From: amd-gfx On Behalf Of Joerg Roedel Sent

Re: [PATCH 2/2] uapi: fix more linux/kfd_ioctl.h userspace compilation errors

2018-11-02 Thread Kuehling, Felix
On 2018-11-01 7:03 a.m., Dmitry V. Levin wrote: > Consistently use types provided by <linux/types.h> via <drm/drm.h> > to fix struct kfd_ioctl_get_queue_wave_state_args userspace compilation > errors. > > Fixes: 5df099e8bc83f ("drm/amdkfd: Add wavefront context save state retrieval > ioctl") > Signed-off-by: Dmitry V. Lev

Re: [PATCH] drm/amdgpu: fix a missing-check bug

2018-10-22 Thread Kuehling, Felix
The BIOS signature check does not guarantee integrity of the BIOS image either way. As I understand it, the signature is just a magic number. It's not a cryptographic signature. The check is just a sanity check. Therefore this change doesn't add any meaningful protection against the scenario you de