RE: [PATCH 5/7] drm/amdgpu: add mmhub ras_late_init callback function

2019-08-28 Thread Zhou1, Tao
> -----Original Message----- > From: Hawking Zhang > Sent: August 28, 2019 21:03 > To: amd-gfx@lists.freedesktop.org; Zhou1, Tao ; > Deucher, Alexander > Cc: Zhang, Hawking > Subject: [PATCH 5/7] drm/amdgpu: add mmhub ras_late_init callback > function > > The function will be called in late init p

RE: [PATCH 6/7] drm/amdgpu: add ras_late_init callback function for nbio v7_4

2019-08-28 Thread Zhou1, Tao
> -----Original Message----- > From: Hawking Zhang > Sent: August 28, 2019 21:03 > To: amd-gfx@lists.freedesktop.org; Zhou1, Tao ; > Deucher, Alexander > Cc: Zhang, Hawking > Subject: [PATCH 6/7] drm/amdgpu: add ras_late_init callback function for > nbio v7_4 > > ras_late_init callback function wi

RE: [PATCH 1/7] drm/amdgpu: add helper function to do common ras_late_init

2019-08-28 Thread Zhou1, Tao
Another way is to add check for ih_info in amdgpu_ras_interrupt_add_handler and amdgpu_ras_interrupt_remove_handler directly. > -----Original Message----- > From: amd-gfx On Behalf Of > Zhou1, Tao > Sent: August 29, 2019 10:59 > To: Zhang, Hawking ; amd- > g...@lists.freedesktop.org; Deucher, Alexand

RE: [PATCH 5/7] drm/amdgpu: add mmhub ras_late_init callback function

2019-08-28 Thread Zhou1, Tao
Can we also add a ras_late_init for umc? > -----Original Message----- > From: amd-gfx On Behalf Of > Zhou1, Tao > Sent: August 29, 2019 11:41 > To: Zhang, Hawking ; amd- > g...@lists.freedesktop.org; Deucher, Alexander > > Cc: Zhang, Hawking > Subject: RE: [PATCH 5/7] drm/amdgpu: add mmhub ras_late_

Re: [PATCH 14/17] drm/amd/display: Isolate DSC module from driver dependencies

2019-08-28 Thread Dave Airlie
On Thu, 29 Aug 2019 at 07:04, Bhawanpreet Lakha wrote: > > From: Bayan Zabihiyan > > [Why] > Edid Utility wishes to include DSC module from driver instead > of doing it's own logic which will need to be updated every time > someone modifies the driver logic. > > [How] > Modify some functions such

RE: [PATCH 1/7] drm/amdgpu: add helper function to do common ras_late_init

2019-08-28 Thread Zhang, Hawking
Good point, I think we can check ih_info.cb, instead of ras_block, as the check condition. On the other hand, I initialized the header in ih_info in case someone use it in somewhere... Regards, Hawking -----Original Message----- From: Zhou1, Tao Sent: August 29, 2019 11:52 To: Zhou1, Tao ; Zhang, H

RE: [PATCH 7/7] drm/amdgpu: switch to ras_late_init callback for nbio v7_4

2019-08-28 Thread Zhang, Hawking
Good catch. Will update it in v2. Regards, Hawking -----Original Message----- From: Chen, Guchun Sent: August 29, 2019 9:25 To: Zhang, Hawking ; amd-gfx@lists.freedesktop.org; Zhou1, Tao ; Deucher, Alexander Cc: Zhang, Hawking Subject: RE: [PATCH 7/7] drm/amdgpu: switch to ras_late_init callback

RE: [PATCH 6/7] drm/amdgpu: add ras_late_init callback function for nbio v7_4

2019-08-28 Thread Zhang, Hawking
RE - [Tao] The ras block name is AMDGPU_RAS_BLOCK_PCIE_BIF and its string name is pcie_bif in ras_block_string, QA may be confused in the future. I have no strong opinion on the naming. but it's good to align with the block string to avoid confusing. Will update in v2. Regards, Hawking -Ori

Re: [PATCH v2] drm/amdgpu: Default disable GDS for compute+gfx

2019-08-28 Thread zhoucm1
On 2019/8/29 1:08 AM, Marek Olšák wrote: It can't break an older driver, because there is no older driver that requires the static allocation. Note that closed source drivers don't count, because they don't need backward compatibility. Yes, I agree, we don't need take care of closed source s

[PATCH RFC v4 01/16] drm: Add drm_minor_for_each

2019-08-28 Thread Kenny Ho
To allow other subsystems to iterate through all stored DRM minors and act upon them. Also exposes drm_minor_acquire and drm_minor_release for other subsystem to handle drm_minor. DRM cgroup controller is the initial consumer of this new features. Change-Id: I7c4b67ce6b31f06d1037b03435386ff5b814

[PATCH RFC v4 07/16] drm, cgroup: Add total GEM buffer allocation limit

2019-08-28 Thread Kenny Ho
The drm resource being limited here is the GEM buffer objects. User applications allocate and free these buffers. In addition, a process can allocate a buffer and share it with another process. The consumer of a shared buffer can also outlive the allocator of the buffer. For the purpose of cgro

[PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem

2019-08-28 Thread Kenny Ho
This is a follow up to the RFC I made previously to introduce a cgroup controller for the GPU/DRM subsystem [v1,v2,v3]. The goal is to be able to provide resource management to GPU resources using things like container. With this RFC v4, I am hoping to have some consensus on a merge plan. I be

[PATCH RFC v4 02/16] cgroup: Introduce cgroup for drm subsystem

2019-08-28 Thread Kenny Ho
With the increased importance of machine learning, data science and other cloud-based applications, GPUs are already in production use in data centers today. Existing GPU resource management is very coarse grain, however, as sysadmins are only able to distribute workload on a per-GPU basis. An al

[PATCH RFC v4 10/16] drm, cgroup: Add TTM buffer peak usage stats

2019-08-28 Thread Kenny Ho
drm.memory.peak.stats A read-only nested-keyed file which exists on all cgroups. Each entry is keyed by the drm device's major:minor. The following nested keys are defined. == == system Pea

[PATCH RFC v4 13/16] drm, cgroup: Allow more aggressive memory reclaim

2019-08-28 Thread Kenny Ho
Allow DRM TTM memory manager to register a work_struct, such that, when a drmcgrp is under memory pressure, memory reclaiming can be triggered immediately. Change-Id: I25ac04e2db9c19ff12652b88ebff18b44b2706d8 Signed-off-by: Kenny Ho --- drivers/gpu/drm/ttm/ttm_bo.c| 49 ++

[PATCH RFC v4 03/16] drm, cgroup: Initialize drmcg properties

2019-08-28 Thread Kenny Ho
drmcg initialization involves allocating a per cgroup, per device data structure and setting the defaults. There are two entry points for drmcg init: 1) When struct drmcg is created via css_alloc, initialization is done for each device 2) When DRM devices are created after drmcgs are created a

[PATCH RFC v4 06/16] drm, cgroup: Add GEM buffer allocation count stats

2019-08-28 Thread Kenny Ho
drm.buffer.count.stats A read-only flat-keyed file which exists on all cgroups. Each entry is keyed by the drm device's major:minor. Total number of GEM buffer allocated. Change-Id: Id3e1809d5fee8562e47a7d2b961688956d844ec6 Signed-off-by: Kenny Ho --- Documentation/admi

[PATCH RFC v4 12/16] drm, cgroup: Add soft VRAM limit

2019-08-28 Thread Kenny Ho
The drm resource being limited is the TTM (Translation Table Manager) buffers. TTM manages different types of memory that a GPU might access. These memory types include dedicated Video RAM (VRAM) and host/system memory accessible through IOMMU (GART/GTT). TTM is currently used by multiple drm dri

[PATCH RFC v4 16/16] drm/amdgpu: Integrate with DRM cgroup

2019-08-28 Thread Kenny Ho
The number of logical gpu (lgpu) is defined to be the number of compute unit (CU) for a device. The lgpu allocation limit only applies to compute workload for the moment (enforced via kfd queue creation.) Any cu_mask update is validated against the availability of the compute unit as defined by t

[PATCH RFC v4 09/16] drm, cgroup: Add TTM buffer allocation stats

2019-08-28 Thread Kenny Ho
The drm resource being measured is the TTM (Translation Table Manager) buffers. TTM manages different types of memory that a GPU might access. These memory types include dedicated Video RAM (VRAM) and host/system memory accessible through IOMMU (GART/GTT). TTM is currently used by multiple drm dr

[PATCH RFC v4 05/16] drm, cgroup: Add peak GEM buffer allocation stats

2019-08-28 Thread Kenny Ho
drm.buffer.peak.stats A read-only flat-keyed file which exists on all cgroups. Each entry is keyed by the drm device's major:minor. Largest (high water mark) GEM buffer allocated in bytes. Change-Id: I79e56222151a3d33a76a61ba0097fe93ebb3449f Signed-off-by: Kenny Ho ---

[PATCH RFC v4 15/16] drm, cgroup: add update trigger after limit change

2019-08-28 Thread Kenny Ho
Before this commit, drmcg limits are updated but enforcement is delayed until the next time the driver check against the new limit. While this is sufficient for certain resources, a more proactive enforcement may be needed for other resources. Introducing an optional drmcg_limit_updated callback

[PATCH RFC v4 08/16] drm, cgroup: Add peak GEM buffer allocation limit

2019-08-28 Thread Kenny Ho
drm.buffer.peak.default A read-only flat-keyed file which exists on the root cgroup. Each entry is keyed by the drm device's major:minor. Default limits on the largest GEM buffer allocation in bytes. drm.buffer.peak.max A read-write flat-keyed file which exists on

[PATCH RFC v4 04/16] drm, cgroup: Add total GEM buffer allocation stats

2019-08-28 Thread Kenny Ho
The drm resource being measured here is the GEM buffer objects. User applications allocate and free these buffers. In addition, a process can allocate a buffer and share it with another process. The consumer of a shared buffer can also outlive the allocator of the buffer. For the purpose of cgr

[PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource

2019-08-28 Thread Kenny Ho
drm.lgpu A read-write nested-keyed file which exists on all cgroups. Each entry is keyed by the DRM device's major:minor. lgpu stands for logical GPU, it is an abstraction used to subdivide a physical DRM device for the purpose of resource management.

[PATCH RFC v4 11/16] drm, cgroup: Add per cgroup bw measure and control

2019-08-28 Thread Kenny Ho
The bandwidth is measured by keeping track of the amount of bytes moved by ttm within a time period. We defined two type of bandwidth: burst and average. Average bandwidth is calculated by dividing the total amount of bytes moved within a cgroup by the lifetime of the cgroup. Burst bandwidth is s
