On Tue, Aug 06, 2024 at 04:09:43PM +0200, Maxime Ripard wrote:
> On Tue, Aug 06, 2024 at 03:01:44PM GMT, Daniel Vetter wrote:
> > On Mon, Jul 01, 2024 at 06:01:41PM +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 01/07/2024 10:25, Maarten Lankhorst wrote:
> > > > Den 2024-06-28 kl. 16:04, skrev Maxime Ripard:
> > > > > Hi,
> > > > > 
> > > > > On Thu, Jun 27, 2024 at 09:22:56PM GMT, Maarten Lankhorst wrote:
> > > > > > Den 2024-06-27 kl. 19:16, skrev Maxime Ripard:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > Thanks for working on this!
> > > > > > > 
> > > > > > > On Thu, Jun 27, 2024 at 05:47:21PM GMT, Maarten Lankhorst wrote:
> > > > > > > > The initial version was based roughly on the rdma and misc 
> > > > > > > > cgroup
> > > > > > > > controllers, with a lot of the accounting code borrowed from 
> > > > > > > > rdma.
> > > > > > > > 
> > > > > > > > The current version is a complete rewrite with page counter; it 
> > > > > > > > uses
> > > > > > > > the same min/low/max semantics as the memory cgroup as a result.
> > > > > > > > 
> > > > > > > > There's a small mismatch as TTM uses u64, and page_counter long 
> > > > > > > > pages.
> > > > > > > > In practice it's not a problem. 32-bits systems don't really 
> > > > > > > > come with
> > > > > > > > > =4GB cards and as long as we're consistently wrong with 
> > > > > > > > > units, it's
> > > > > > > > fine. The device page size may not be in the same units as 
> > > > > > > > kernel page
> > > > > > > > size, and each region might also have a different page size 
> > > > > > > > (VRAM vs GART
> > > > > > > > for example).
> > > > > > > > 
> > > > > > > > The interface is simple:
> > > > > > > > - populate drmcgroup_device->regions[..] name and size for each 
> > > > > > > > active
> > > > > > > >     region, set num_regions accordingly.
> > > > > > > > - Call drm(m)cg_register_device()
> > > > > > > > - Use drmcg_try_charge to check if you can allocate a chunk of 
> > > > > > > > memory,
> > > > > > > >     use drmcg_uncharge when freeing it. This may return an 
> > > > > > > > error code,
> > > > > > > >     or -EAGAIN when the cgroup limit is reached. In that case a 
> > > > > > > > reference
> > > > > > > >     to the limiting pool is returned.
> > > > > > > > - The limiting cs can be used as compare function for
> > > > > > > >     drmcs_evict_valuable.
> > > > > > > > - After having evicted enough, drop reference to limiting cs 
> > > > > > > > with
> > > > > > > >     drmcs_pool_put.
> > > > > > > > 
> > > > > > > > This API allows you to limit device resources with cgroups.
> > > > > > > > You can see the supported cards in /sys/fs/cgroup/drm.capacity
> > > > > > > > You need to echo +drm to cgroup.subtree_control, and then you 
> > > > > > > > can
> > > > > > > > partition memory.
> > > > > > > > 
> > > > > > > > Signed-off-by: Maarten 
> > > > > > > > Lankhorst<maarten.lankho...@linux.intel.com>
> > > > > > > > Co-developed-by: Friedrich Vock<friedrich.v...@gmx.de>
> > > > > > > I'm sorry, I should have wrote minutes on the discussion we had 
> > > > > > > with TJ
> > > > > > > and Tvrtko the other day.
> > > > > > > 
> > > > > > > We're all very interested in making this happen, but doing a "DRM"
> > > > > > > cgroup doesn't look like the right path to us.
> > > > > > > 
> > > > > > > Indeed, we have a significant number of drivers that won't have a
> > > > > > > dedicated memory but will depend on DMA allocations one way or the
> > > > > > > other, and those pools are shared between multiple frameworks 
> > > > > > > (DRM,
> > > > > > > V4L2, DMA-Buf Heaps, at least).
> > > > > > > 
> > > > > > > This was also pointed out by Sima some time ago here:
> > > > > > > https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/
> > > > > > > 
> > > > > > > So we'll want that cgroup subsystem to be cross-framework. We 
> > > > > > > settled on
> > > > > > > a "device" cgroup during the discussion, but I'm sure we'll have 
> > > > > > > plenty
> > > > > > > of bikeshedding.
> > > > > > > 
> > > > > > > The other thing we agreed on, based on the feedback TJ got on the 
> > > > > > > last
> > > > > > > iterations of his series was to go for memcg for drivers not 
> > > > > > > using DMA
> > > > > > > allocations.
> > > > > > > 
> > > > > > > It's the part where I expect some discussion there too :)
> > > > > > > 
> > > > > > > So we went back to a previous version of TJ's work, and I've 
> > > > > > > started to
> > > > > > > work on:
> > > > > > > 
> > > > > > >     - Integration of the cgroup in the GEM DMA and GEM VRAM 
> > > > > > > helpers (this
> > > > > > >       works on tidss right now)
> > > > > > > 
> > > > > > >     - Integration of all heaps into that cgroup but the system one
> > > > > > >       (working on this at the moment)
> > > > > > 
> > > > > > Should be similar to what I have then. I think you could use my 
> > > > > > work to
> > > > > > continue it.
> > > > > > 
> > > > > > I made nothing DRM specific except the name, if you renamed it the 
> > > > > > device
> > > > > > resource management cgroup and changed the init function signature 
> > > > > > to take a
> > > > > > name instead of a drm pointer, nothing would change. This is 
> > > > > > exactly what
> > > > > > I'm hoping to accomplish, including reserving memory.
> > > > > 
> > > > > I've started to work on rebasing my current work onto your series 
> > > > > today,
> > > > > and I'm not entirely sure how what I described would best fit. Let's
> > > > > assume we have two KMS device, one using shmem, one using DMA
> > > > > allocations, two heaps, one using the page allocator, the other using
> > > > > CMA, and one v4l2 device using dma allocations.
> > > > > 
> > > > > So we would have one KMS device and one heap using the page allocator,
> > > > > and one KMS device, one heap, and one v4l2 driver using the DMA
> > > > > allocator.
> > > > > 
> > > > > Would these make different cgroup devices, or different cgroup 
> > > > > regions?
> > > > 
> > > > Each driver would register a device, whatever feels most logical for 
> > > > that device I suppose.
> > > > 
> > > > My guess is that a prefix would also be nice here, so register a device 
> > > > with name of drm/$name or v4l2/$name, heap/$name. I didn't give it much 
> > > > thought and we're still experimenting, so just try something. :)
> > > > 
> > > > There's no limit to amount of devices, I only fixed amount of pools to 
> > > > match TTM, but even that could be increased arbitrarily. I just don't 
> > > > think there is a point in doing so.
> > > 
> > > Do we need a plan for top level controls which do not include region 
> > > names?
> > > If the latter will be driver specific then I am thinking of ease of
> > > configuring it all from the outside. Especially considering that one 
> > > cgroup
> > > can have multiple devices in it.
> > > 
> > > Second question is about double accounting for shmem backed objects. I 
> > > think
> > > they will be seen, for drivers which allocate backing store at buffer
> > > objects creation time, under the cgroup of process doing the creation, in
> > > the existing memory controller. Right?
> > 
> > We currently don't set __GFP_ACCOUNT respectively use GFP_KERNEL_ACCOUNT,
> > so no. Unless someone allocates them with GFP_USER ...
> > 
> > > Is there a chance to exclude those from there and only have them in this 
> > > new
> > > controller? Or would the opposite be a better choice? That is, not see 
> > > those
> > > in the device memory controller but only in the existing one.
> > 
> > I missed this, so jumping in super late. I think guidance from Tejun was
> > to go the other way around: Exclude allocations from normal system
> > memory from device cgroups and instead make sure it's tracked in the
> > existing memcg.
> > 
> > Which might mean we need memcg shrinkers and the assorted pain ...
> > 
> > Also I don't think we ever reached some agreement on where things like cma
> > allocations should be accounted for in this case.
> 
> Yeah, but that's the thing, memcg probably won't cut it for CMA. Because
> if you pull the thread, that means that dma-heaps also have to register
> their buffers into memcg too, even if it's backed by something else than
> RAM.

For cma I'm kinda leaning towards "both". If you don't have a special cma
cgroup and just memcg, you can exhaust the cma easily. But if the cma
allocations also aren't tracked in memcg, you have a blind spot there,
which isn't great.

> This is what this cgroup controller is meant to do: memcg for memory
> (GFP'd) buffers, this cgroup for everything else.

Yeah if there's no way you can get it through alloc_pages() it definitely
shouldn't be in memcg.
-Sima
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Reply via email to