Re: [PATCH] MAINTAINERS: Add entry for DMEM cgroup controller

2025-02-20 Thread Tejun Heo
all cgroup specific patches. > > Signed-off-by: Maarten Lankhorst > Acked-by: Maxime Ripard > Acked-by: Natalie Vock Acked-by: Tejun Heo Thanks. -- tejun

Re: [PATCH v2] cgroup/dmem: Don't open-code css_for_each_descendant_pre

2025-02-18 Thread Tejun Heo
Hello, On Tue, Feb 18, 2025 at 03:55:43PM +0100, Maarten Lankhorst wrote: > Should this fix go through the cgroup tree? I haven't been routing any dmem patches. Might as well stick to drm tree? Thanks. -- tejun
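A minimal sketch of the generic iterator the subject line refers to, for context; the walk helper and visit callback below are illustrative, not the actual dmem cgroup code:

    #include <linux/cgroup.h>
    #include <linux/rcupdate.h>

    /* Sketch only: pre-order walk over a css subtree using the generic
     * iterator instead of an open-coded descent. The visit callback is a
     * placeholder, not the real dmem_cgroup logic. */
    static void walk_css_subtree(struct cgroup_subsys_state *root_css,
                                 void (*visit)(struct cgroup_subsys_state *css))
    {
            struct cgroup_subsys_state *pos;

            rcu_read_lock();
            css_for_each_descendant_pre(pos, root_css)
                    visit(pos);     /* visits root_css first, then descendants */
            rcu_read_unlock();
    }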

Re: [PATCH next] cgroup/rdma: Drop bogus PAGE_COUNTER select

2025-01-16 Thread Tejun Heo
ux...@mail.gmail.com > Signed-off-by: Geert Uytterhoeven Acked-by: Tejun Heo Thanks. -- tejun

Re: [PATCH 1/4] cgroup/dmem: Select PAGE_COUNTER

2025-01-14 Thread Tejun Heo
t, and I'm > not sure how we should merge these patches. > > Obviously, we'd need Tejun's, Johannes', or Michal's ack, but should we > backmerged drm-next into drm-misc-next-fixes and apply them there? Acked-by: Tejun Heo Please route them with the existing dmem patches. Thanks. -- tejun

Re: [PATCH -next] kernel/cgroup: Remove the unused variable climit

2025-01-14 Thread Tejun Heo
alculate_protection(limit_pool, test_pool); > > > > The dmem controller is actually pulled into the drm tree at the moment. > > > > cc relevant parties on how to handle this fix commit. > > We can either take it through drm with one of the cgroup maintainers > ack, or they can merge the PR in their tree and merge the fixes as they > wish through their tree. Acked-by: Tejun Heo Please route with the rest of dmem changes. Thanks. -- tejun

Re: [PATCH] workqueue: Do not warn when cancelling WQ_MEM_RECLAIM work from !WQ_MEM_RECLAIM worker

2024-12-19 Thread Tejun Heo
be running at all. Therefore > cancelling it is safe and we can relax the warning criteria by letting the > helper know of the calling context. > > Signed-off-by: Tvrtko Ursulin > Fixes: fca839c00a12 ("workqueue: warn if memory reclaim tries to flush > !WQ_MEM_RECLAIM wor

Re: [PATCH v2 0/7] kernel/cgroups: Add "dmem" memory accounting cgroup.

2024-12-17 Thread Tejun Heo
Hello, On Tue, Dec 17, 2024 at 06:37:22PM +0100, Maarten Lankhorst wrote: > Den 2024-12-17 kl. 18:11, skrev Tejun Heo: > > On Tue, Dec 17, 2024 at 03:28:50PM +0100, Maarten Lankhorst wrote: > > > Now that all patches look good, what is needed to merge the series? > > >

Re: [PATCH v2 0/7] kernel/cgroups: Add "dmem" memory accounting cgroup.

2024-12-17 Thread Tejun Heo
On Tue, Dec 17, 2024 at 03:28:50PM +0100, Maarten Lankhorst wrote: > Now that all patches look good, what is needed to merge the series? Without > patch 6/7 as it is a hack for testing. There were some questions raised about device naming. One thing we want to get right from the beginning is the b

Re: [PATCH 0/7] kernel/cgroups: Add "dev" memory accounting cgroup.

2024-11-13 Thread Tejun Heo
Hello, On Wed, Nov 13, 2024 at 03:58:25PM +0100, Maarten Lankhorst wrote: ... > Thanks for all feedback and discussion. I checked mostly on patchwork so I > missed the discussion here. Fortunately it's only been about naming. :) > > I'm thinking of adding a 'high' knob as well, that will work sim

Re: [PATCH 0/7] kernel/cgroups: Add "dev" memory accounting cgroup.

2024-11-06 Thread Tejun Heo
On Wed, Nov 06, 2024 at 11:31:49AM +0100, Maxime Ripard wrote: ... > > How about dmem for this one, and dpu for the other one. For device > > memory and device processing unit, respectively. > > dmem sounds great to me, does everyone agree? Sounds good to me. Thanks. -- tejun

Re: [PATCH 0/7] kernel/cgroups: Add "dev" memory accounting cgroup.

2024-10-25 Thread Tejun Heo
Hello, On Thu, Oct 24, 2024 at 09:20:43AM +0200, Maxime Ripard wrote: ... > > Yeah, let's not use "dev" name for this. As Waiman pointed out, it conflicts > > with the devices controller from cgroup1. While cgroup1 is mostly > > deprecated, the same features are provided through BPF in systemd usi

Re: [PATCH 0/7] kernel/cgroups: Add "dev" memory accounting cgroup.

2024-10-23 Thread Tejun Heo
Hello, On Wed, Oct 23, 2024 at 09:52:53AM +0200, Maarten Lankhorst wrote: > New submission! > I've added documentation for each call, and integrated the renaming from > drm cgroup to dev cgroup, based on maxime ripard's work. > > Maxime has been testing this with dma-buf heaps and v4l2 too, and i

Re: [PATCH v3 3/5] workqueue: Add interface for user-defined workqueue lockdep map

2024-08-13 Thread Tejun Heo
> v2: > - Add alloc_workqueue_lockdep_map (Tejun) > v3: > - Drop __WQ_USER_OWNED_LOCKDEP (Tejun) > - static inline alloc_ordered_workqueue_lockdep_map (Tejun) > > Cc: Tejun Heo > Cc: Lai Jiangshan > Signed-off-by: Matthew Brost Applied 1-3 to wq/for-6.12. Thanks. -- tejun

Re: [PATCH v3 3/5] workqueue: Add interface for user-defined workqueue lockdep map

2024-08-13 Thread Tejun Heo
On Tue, Aug 13, 2024 at 06:55:20PM +, Matthew Brost wrote: > On Tue, Aug 13, 2024 at 08:52:26AM -1000, Tejun Heo wrote: > > On Fri, Aug 09, 2024 at 03:28:25PM -0700, Matthew Brost wrote: > > > Add an interface for a user-defined workqueue lockdep map, which is > >

Re: [PATCH v3 3/5] workqueue: Add interface for user-defined workqueue lockdep map

2024-08-13 Thread Tejun Heo
> v2: > - Add alloc_workqueue_lockdep_map (Tejun) > v3: > - Drop __WQ_USER_OWNED_LOCKDEP (Tejun) > - static inline alloc_ordered_workqueue_lockdep_map (Tejun) > > Cc: Tejun Heo > Cc: Lai Jiangshan > Signed-off-by: Matthew Brost 1-3 look fine to me. Would applying them t

Re: [PATCH v2 3/5] workqueue: Add interface for user-defined workqueue lockdep map

2024-07-30 Thread Tejun Heo
Hello, On Tue, Jul 30, 2024 at 05:31:17PM -0700, Matthew Brost wrote: > +#define alloc_ordered_workqueue_lockdep_map(fmt, flags, lockdep_map, > args...)\ > + alloc_workqueue_lockdep_map(fmt, WQ_UNBOUND | __WQ_ORDERED | (flags), > 1, lockdep_map, ##args) > +#endif alloc_ordered_workq
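A hedged usage sketch of the macro quoted above; the workqueue name and the lockdep map/key identifiers are made up for illustration, and the interface is only available with lockdep enabled:

    #include <linux/workqueue.h>
    #include <linux/lockdep.h>

    /* Sketch, assuming CONFIG_LOCKDEP: hand the ordered workqueue a
     * driver-owned lockdep map so several queues can share one class.
     * "xyz_ordered_wq", my_wq_map and my_wq_key are illustrative names. */
    static struct lock_class_key my_wq_key;
    static struct lockdep_map my_wq_map;

    static struct workqueue_struct *create_xyz_wq(void)
    {
            lockdep_init_map(&my_wq_map, "xyz_ordered_wq", &my_wq_key, 0);
            return alloc_ordered_workqueue_lockdep_map("xyz_ordered_wq", 0,
                                                       &my_wq_map);
    }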

Re: [RFC PATCH 1/3] workqueue: Add interface for user-defined workqueue lockdep map

2024-07-30 Thread Tejun Heo
On Tue, Jul 30, 2024 at 10:53:38PM +, Matthew Brost wrote: > I didn't want to change the export alloc_workqueue() arguments so I went > with this approach. Are you suggesting export a new function > alloc_workqueue_lockdep_map() which will share an internal > implementation with the existing al

Re: [RFC PATCH 1/3] workqueue: Add interface for user-defined workqueue lockdep map

2024-07-30 Thread Tejun Heo
Hello, Matthew. On Tue, Jul 30, 2024 at 03:17:40PM -0700, Matthew Brost wrote: > +/** > + * wq_init_user_lockdep_map - init user lockdep map for workqueue > + * @wq: workqueue to init lockdep map for > + * @lockdep_map: lockdep map to use for workqueue > + * > + * Initialize workqueue with a user

Re: Performance drop due to alloc_workqueue() misuse and recent change

2023-12-19 Thread Tejun Heo
Hello, again. On Mon, Dec 04, 2023 at 04:03:47PM +, Naohiro Aota wrote: ... > In summary, we misuse max_active, considering it is a global limit. And, > the recent commit introduced a huge performance drop in some cases. We > need to review alloc_workqueue() usage to check if its max_active s

Re: Performance drop due to alloc_workqueue() misuse and recent change

2023-12-04 Thread Tejun Heo
Hello, On Mon, Dec 04, 2023 at 04:03:47PM +, Naohiro Aota wrote: > Recently, commit 636b927eba5b ("workqueue: Make unbound workqueues to use > per-cpu pool_workqueues") changed WQ_UNBOUND workqueue's behavior. It > changed the meaning of alloc_workqueue()'s max_active from an upper limit > imp
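A small illustration of the call in question (queue name and value are made up); the thread is about what the third argument means for WQ_UNBOUND queues before and after 636b927eba5b:

    #include <linux/workqueue.h>
    #include <linux/errno.h>

    /* Illustrative only: before 636b927eba5b, max_active (8 here) capped an
     * unbound workqueue system-wide; afterwards it is enforced per CPU,
     * which is the behavior change the report describes. */
    static struct workqueue_struct *example_wq;

    static int example_init(void)
    {
            example_wq = alloc_workqueue("example-unbound", WQ_UNBOUND, 8);
            return example_wq ? 0 : -ENOMEM;
    }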

Re: [RFC v6 0/8] DRM scheduling cgroup controller

2023-11-12 Thread Tejun Heo
Hello, From cgroup POV, it generally looks fine to me. As before, I'm really curious whether this is something other non-intel drivers can get behind. Just one nit. On Tue, Oct 24, 2023 at 05:07:19PM +0100, Tvrtko Ursulin wrote: > * Allowing per DRM card configuration and queries is deliberatly

Re: [PATCH 16/17] cgroup/drm: Expose memory stats

2023-07-26 Thread Tejun Heo
Hello, On Wed, Jul 26, 2023 at 05:44:28PM +0100, Tvrtko Ursulin wrote: ... > > So, yeah, if you want to add memory controls, we better think through how > > the fd ownership migration should work. > > It would be quite easy to make the implicit migration fail - just the matter > of failing the fi

Re: [PATCH 16/17] cgroup/drm: Expose memory stats

2023-07-26 Thread Tejun Heo
Hello, On Wed, Jul 26, 2023 at 12:14:24PM +0200, Maarten Lankhorst wrote: > > So, yeah, if you want to add memory controls, we better think through how > > the fd ownership migration should work. > > I've taken a look at the series, since I have been working on cgroup memory > eviction. > > The s

[PATCH wq/for-6.5-fixes] workqueue: Drop the special locking rule for worker->flags and worker_pool->flags

2023-07-25 Thread Tejun Heo
From aa6fde93f3a49e42c0fe0490d7f3711bac0d162e Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 17 Jul 2023 12:50:02 -1000 Subject: [PATCH] workqueue: Scale up wq_cpu_intensive_thresh_us if BogoMIPS is below 4000 wq_cpu_intensive_thresh_us is used to detect CPU-hogging per-cpu work it

Re: [PATCH 15/17] cgroup/drm: Expose GPU utilisation

2023-07-25 Thread Tejun Heo
Hello, On Tue, Jul 25, 2023 at 03:08:40PM +0100, Tvrtko Ursulin wrote: > > Also, shouldn't this be keyed by the drm device? > > It could have that too, or it could come later. Fun with GPUs that it not > only could be keyed by the device, but also by the type of the GPU engine. > (Which are a) ven

Re: [PATCH 16/17] cgroup/drm: Expose memory stats

2023-07-21 Thread Tejun Heo
On Wed, Jul 12, 2023 at 12:46:04PM +0100, Tvrtko Ursulin wrote: > $ cat drm.memory.stat > card0 region=system total=12898304 shared=0 active=0 resident=12111872 > purgeable=167936 > card0 region=stolen-system total=0 shared=0 active=0 resident=0 purgeable=0 > > Data is generated on demand f

Re: [PATCH 15/17] cgroup/drm: Expose GPU utilisation

2023-07-21 Thread Tejun Heo
On Fri, Jul 21, 2023 at 12:19:32PM -1000, Tejun Heo wrote: > On Wed, Jul 12, 2023 at 12:46:03PM +0100, Tvrtko Ursulin wrote: > > + drm.active_us > > + GPU time used by the group recursively including all child groups. > > Maybe instead add drm.stat and have "usage_us

Re: [PATCH 15/17] cgroup/drm: Expose GPU utilisation

2023-07-21 Thread Tejun Heo
On Wed, Jul 12, 2023 at 12:46:03PM +0100, Tvrtko Ursulin wrote: > + drm.active_us > + GPU time used by the group recursively including all child groups. Maybe instead add drm.stat and have "usage_usec" inside? That'd be more consistent with cpu side. Thanks. -- tejun
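To make the suggestion concrete, a hypothetical show callback emitting a cpu.stat-style key for such a drm.stat file; struct drm_cgroup_state and css_to_drmcs() are assumptions for the sketch, not confirmed upstream API:

    #include <linux/cgroup.h>
    #include <linux/seq_file.h>
    #include <linux/time64.h>

    /* Hypothetical: a drm.stat seq_file handler reporting "usage_usec",
     * mirroring cpu.stat. The state struct and helper are placeholders. */
    struct drm_cgroup_state {
            struct cgroup_subsys_state css;
            u64 active_ns;
    };

    static inline struct drm_cgroup_state *
    css_to_drmcs(struct cgroup_subsys_state *css)
    {
            return container_of(css, struct drm_cgroup_state, css);
    }

    static int drmcs_stat_show(struct seq_file *sf, void *v)
    {
            struct drm_cgroup_state *drmcs = css_to_drmcs(seq_css(sf));

            seq_printf(sf, "usage_usec %llu\n",
                       drmcs->active_ns / NSEC_PER_USEC);
            return 0;
    }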

Re: [PATCH 12/17] cgroup/drm: Introduce weight based drm cgroup control

2023-07-21 Thread Tejun Heo
On Wed, Jul 12, 2023 at 12:46:00PM +0100, Tvrtko Ursulin wrote: > +DRM scheduling soft limits > +~~ Please don't say soft limits for this. It means something different for memcg, so it gets really confusing. Call it "weight based CPU time control" and maybe call the trigger

Re: [PATCH 08/17] drm/cgroup: Track DRM clients per cgroup

2023-07-21 Thread Tejun Heo
Hello, On Wed, Jul 12, 2023 at 12:45:56PM +0100, Tvrtko Ursulin wrote: > +void drmcgroup_client_migrate(struct drm_file *file_priv) > +{ > + struct drm_cgroup_state *src, *dst; > + struct cgroup_subsys_state *old; > + > + mutex_lock(&drmcg_mutex); > + > + old = file_priv->__css; >

Re: Consider switching to WQ_UNBOUND messages (was: Re: [PATCH v2 6/7] workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism)

2023-07-18 Thread Tejun Heo
wer boundary to 4000 MIPS. The scaling is still capped at 1s. From 8555cbd4b22e5f85eb2bdcb84fd1d1f519a0a0d3 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 17 Jul 2023 12:50:02 -1000 Subject: [PATCH] workqueue: Scale up wq_cpu_intensive_thresh_us if BogoMIPS is below 4000 wq_cpu_intensiv

Re: Consider switching to WQ_UNBOUND messages (was: Re: [PATCH v2 6/7] workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism)

2023-07-17 Thread Tejun Heo
Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 17 Jul 2023 12:50:02 -1000 Subject: [PATCH] workqueue: Scale up wq_cpu_intensive_thresh_us if BogoMIPS is below 1000 wq_cpu_intensive_thresh_us is used to detect CPU-hogging per-cpu work items. Once detected, they're excluded from con

Re: Consider switching to WQ_UNBOUND messages (was: Re: [PATCH v2 6/7] workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism)

2023-07-13 Thread Tejun Heo
On Wed, Jul 12, 2023 at 02:27:45PM +0200, Peter Zijlstra wrote: > On Wed, Jul 12, 2023 at 11:04:16AM +0200, Geert Uytterhoeven wrote: > > Hoi Peter, > > > > On Wed, Jul 12, 2023 at 10:05 AM Peter Zijlstra > > wrote: > > > On Tue, Jul 11, 2023 at 11:39:17

Re: Consider switching to WQ_UNBOUND messages (was: Re: [PATCH v2 6/7] workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism)

2023-07-11 Thread Tejun Heo
On Tue, Jul 11, 2023 at 11:39:17AM -1000, Tejun Heo wrote: > On Tue, Jul 11, 2023 at 04:06:22PM +0200, Geert Uytterhoeven wrote: > > On Tue, Jul 11, 2023 at 3:55 PM Geert Uytterhoeven > > wrote: ... > > workqueue: neigh_managed_work hogged CPU for >1us 4 times, &

Re: Consider switching to WQ_UNBOUND messages (was: Re: [PATCH v2 6/7] workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism)

2023-07-11 Thread Tejun Heo
Hello, On Tue, Jul 11, 2023 at 04:06:22PM +0200, Geert Uytterhoeven wrote: > On Tue, Jul 11, 2023 at 3:55 PM Geert Uytterhoeven > wrote: > > > > Hi Tejun, > > > > On Fri, May 12, 2023 at 9:54 PM Tejun Heo wrote: > > > Workqueue now automatically marks

Re: [RFC PATCH 0/4] Add support for DRM cgroup memory accounting.

2023-05-10 Thread Tejun Heo
Hello, On Wed, May 10, 2023 at 04:59:01PM +0200, Maarten Lankhorst wrote: > The misc controller is not granular enough. A single computer may have any > number of > graphics cards, some of them with multiple regions of vram inside a single > card. Extending the misc controller to support dynami

Re: [RFC PATCH 0/4] Add support for DRM cgroup memory accounting.

2023-05-05 Thread Tejun Heo
Hello, On Wed, May 03, 2023 at 10:34:56AM +0200, Maarten Lankhorst wrote: > RFC as I'm looking for comments. > > For long running compute, it can be beneficial to partition the GPU memory > between cgroups, so each cgroup can use its maximum amount of memory without > interfering with other sched

Re: [RFC v4 00/10] DRM scheduling cgroup controller

2023-03-24 Thread Tejun Heo
Hello, Tvrtko. On Tue, Mar 14, 2023 at 02:18:54PM +, Tvrtko Ursulin wrote: > DRM scheduling soft limits > ~~ > > Because of the heterogenous hardware and driver DRM capabilities, soft limits > are implemented as a loose co-operative (bi-directional) interface between t

Re: [RFC 10/12] cgroup/drm: Introduce weight based drm cgroup control

2023-02-02 Thread Tejun Heo
Hello, On Thu, Feb 02, 2023 at 02:26:06PM +, Tvrtko Ursulin wrote: > When you say active/inactive - to what you are referring in the cgroup > world? Offline/online? For those my understanding was offline was a > temporary state while css is getting destroyed. Oh, it's just based on activity.

Re: [RFC 10/12] cgroup/drm: Introduce weight based drm cgroup control

2023-01-27 Thread Tejun Heo
On Thu, Jan 12, 2023 at 04:56:07PM +, Tvrtko Ursulin wrote: ... > + /* > + * 1st pass - reset working values and update hierarchical weights and > + * GPU utilisation. > + */ > + if (!__start_scanning(root, period_us)) > + goto out_retry; /* > +

Re: [RFC v3 00/12] DRM scheduling cgroup controller

2023-01-26 Thread Tejun Heo
Hello, On Thu, Jan 26, 2023 at 02:00:50PM +0100, Michal Koutný wrote: > On Wed, Jan 25, 2023 at 06:11:35PM +, Tvrtko Ursulin > wrote: > > I don't immediately see how you envisage the half-userspace implementation > > would look like in terms of what functionality/new APIs would be provided b

Re: Selecting CPUs for queuing work on

2022-08-12 Thread Tejun Heo
Hello, On Fri, Aug 12, 2022 at 04:54:04PM -0400, Felix Kuehling wrote: > In principle, I think IRQ routing to CPUs can change dynamically with > irqbalance. I wonder whether this is something which should be exposed to userland rather than trying to do dynamically in the kernel and let irqbalance

Re: Selecting CPUs for queuing work on

2022-08-12 Thread Tejun Heo
On Fri, Aug 12, 2022 at 04:26:47PM -0400, Felix Kuehling wrote: > Hi workqueue maintainers, > > In the KFD (amdgpu) driver we found a need to schedule bottom half interrupt > handlers on CPU cores different from the one where the top-half interrupt > handler runs to avoid the interrupt handler sta
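A hedged sketch of the existing primitive closest to what is being asked; the "pick any other online CPU" policy below is made up for illustration, not what amdgpu/KFD ended up doing:

    #include <linux/workqueue.h>
    #include <linux/cpumask.h>
    #include <linux/smp.h>

    /* Sketch only: queue_work_on() pins a work item to an explicit CPU,
     * so the bottom half can run away from the CPU taking the interrupt. */
    static void queue_bottom_half_elsewhere(struct workqueue_struct *wq,
                                            struct work_struct *work)
    {
            int cpu = cpumask_any_but(cpu_online_mask, raw_smp_processor_id());

            if (cpu < nr_cpu_ids)
                    queue_work_on(cpu, wq, work);
            else
                    queue_work(wq, work);   /* single-CPU fallback */
    }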

Re: [PATCH v7 0/6] Proposal for a GPU cgroup controller

2022-05-20 Thread Tejun Heo
Hello, On Tue, May 17, 2022 at 04:30:29PM -0700, T.J. Mercier wrote: > Thanks for your suggestion. This almost works. "dmabuf" as a key could > work, but I'd actually like to account for each heap. Since heaps can > be dynamically added, I can't accommodate every potential heap name by > hardcodin

Re: [PATCH v7 0/6] Proposal for a GPU cgroup controller

2022-05-13 Thread Tejun Heo
Hello, On Thu, May 12, 2022 at 08:43:52PM -0700, T.J. Mercier wrote: > > I'm actually happy I've asked this question, wasn't silly after all. I > > think the > > problem here is a naming issue. What you really are monitor is "video > > memory", > > which consist of a memory segment allocated to

Re: [REPORT] syscall reboot + umh + firmware fallback

2022-05-12 Thread Tejun Heo
Hello, On Thu, May 12, 2022 at 08:18:24PM +0900, Byungchul Park wrote: > > 1. wait_for_completion_killable_timeout() doesn't need someone to wake it up > >to make forward progress because it will unstick itself after timeout > >expires. > > I have a question about this one. Yes, it would

Re: [REPORT] syscall reboot + umh + firmware fallback

2022-05-12 Thread Tejun Heo
Hello, Just took a look out of curiosity. On Thu, May 12, 2022 at 02:25:57PM +0900, Byungchul Park wrote: > PROCESS A PROCESS B WORKER C > > __do_sys_reboot() > __do_sys_reboot() > mutex_lock(&system_transition_mutex) > ... mutex_lock(&system_transition_mutex)

Re: [RFC v5 1/6] gpu: rfc: Proposal for a GPU cgroup controller

2022-04-21 Thread Tejun Heo
Hello, On Wed, Apr 20, 2022 at 11:52:19PM +, T.J. Mercier wrote: > From: Hridya Valsaraju > > This patch adds a proposal for a new GPU cgroup controller for > accounting/limiting GPU and GPU-related memory allocations. > The proposed controller is based on the DRM cgroup controller[1] and >

Re: [RFC v4 2/8] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory

2022-04-04 Thread Tejun Heo
Hello, On Wed, Mar 30, 2022 at 01:56:09PM -0700, T.J. Mercier wrote: > The use case we have for accounting the total (separate from the > individual devices) is to include the value as part of bugreports, for > understanding the system-wide amount of dmabuf allocations. I'm not > aware of an exist

Re: [RFC v4 2/8] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory

2022-03-29 Thread Tejun Heo
Hello, On Mon, Mar 28, 2022 at 03:59:41AM +, T.J. Mercier wrote: > The API/UAPI can be extended to set per-device/total allocation limits > in the future. This total thing kinda bothers me. Can you please provide some concrete examples of how this and per-device limits would be used? Thanks.

Re: [RFC v4 4/8] dmabuf: heaps: export system_heap buffers with GPU cgroup charging

2022-03-29 Thread Tejun Heo
On Tue, Mar 29, 2022 at 10:42:20AM +0200, Daniel Vetter wrote: > Hm I just realized ... are the names in the groups abi? If yes then I > think we need to fix this before we merge anything. Yes. Thanks. -- tejun

Re: [PATCH v2] workqueue: Warn flush attempt using system-wide workqueues

2022-02-23 Thread Tejun Heo
On Wed, Feb 23, 2022 at 10:20:47PM +0100, Marek Szyprowski wrote: > Hi All, > > On 17.02.2022 12:22, Tetsuo Handa wrote: > > syzbot found a circular locking dependency which is caused by flushing > > system_long_wq WQ [1]. Tejun Heo commented that it makes no sen

Re: [RFC v2 0/6] Proposal for a GPU cgroup controller

2022-02-14 Thread Tejun Heo
Hello, On Fri, Feb 11, 2022 at 04:18:23PM +, T.J. Mercier wrote: > The GPU/DRM cgroup controller came into being when a consensus[1] > was reached that the resources it tracked were unsuitable to be integrated > into memcg. Originally, the proposed controller was specific to the DRM > subsyste

Re: [RFC] Add BPF_PROG_TYPE_CGROUP_IOCTL

2021-05-07 Thread Tejun Heo
Hello, On Fri, May 07, 2021 at 06:30:56PM -0400, Alex Deucher wrote: > Maybe we are speaking past each other. I'm not following. We got > here because a device specific cgroup didn't make sense. With my > Linux user hat on, that makes sense. I don't want to write code to a > bunch of device sp

Re: [RFC] Add BPF_PROG_TYPE_CGROUP_IOCTL

2021-05-07 Thread Tejun Heo
Hello, On Fri, May 07, 2021 at 03:55:39PM -0400, Alex Deucher wrote: > The problem is temporal partitioning on GPUs is much harder to enforce > unless you have a special case like SR-IOV. Spatial partitioning, on > AMD GPUs at least, is widely available and easily enforced. What is > the point o

Re: [RFC] Add BPF_PROG_TYPE_CGROUP_IOCTL

2021-05-07 Thread Tejun Heo
Hello, On Fri, May 07, 2021 at 06:54:13PM +0200, Daniel Vetter wrote: > All I meant is that for the container/cgroups world starting out with > time-sharing feels like the best fit, least because your SRIOV designers > also seem to think that's the best first cut for cloud-y computing. > Whether i

Re: [PATCH 0/3] drm: commit_work scheduling

2020-09-21 Thread Tejun Heo
Hello, On Mon, Sep 21, 2020 at 11:21:54AM +0200, Daniel Vetter wrote: > The part I don't like about this is that it all feels rather hacked > together, and if we add more stuff (or there's some different thing in the > system that also needs rt scheduling) then it doesn't compose. > > So question

Re: [RFC v4 02/12] kthread: Add kthread_(un)block_work_queuing() and kthread_work_queuable()

2020-05-11 Thread Tejun Heo
On Fri, May 08, 2020 at 04:46:52PM -0400, Lyude Paul wrote: > Add some simple wrappers around incrementing/decrementing > kthread_work.cancelling under lock, along with checking whether queuing > is currently allowed on a given kthread_work, which we'll use want to > implement work cancelling with

Re: [RFC v4 01/12] kthread: Add kthread_queue_flush_work()

2020-05-11 Thread Tejun Heo
Hello, On Fri, May 08, 2020 at 04:46:51PM -0400, Lyude Paul wrote: > +bool kthread_queue_flush_work(struct kthread_work *work, > + struct kthread_flush_work *fwork); > +void __kthread_flush_work_fn(struct kthread_work *work); As an exposed interface, this doesn't seem gr
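For comparison, a minimal sketch of the flush helpers that are already exported and cover the common cases without exposing the flush-work internals:

    #include <linux/kthread.h>

    /* Sketch: the existing exported helpers; the usual cases don't need
     * access to kthread_flush_work's internals. */
    static void wait_for_my_work(struct kthread_worker *worker,
                                 struct kthread_work *work)
    {
            kthread_flush_work(work);       /* wait for this one item */
            kthread_flush_worker(worker);   /* or drain everything queued */
    }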

Re: [Poke: Tejun] Re: [RFC v3 03/11] drm/vblank: Add vblank works

2020-04-22 Thread Tejun Heo
Hello, On Tue, Apr 21, 2020 at 02:34:59PM +0200, Daniel Vetter wrote: > > > Also, of course, let me know if yu're not happy with the > > > __kthread_queue_work() changes/kthread_worker usage in drm_vblank_work as > > > well > > > > Just glanced over it and I still wonder whether it needs to be t

Re: [Poke: Tejun] Re: [RFC v3 03/11] drm/vblank: Add vblank works

2020-04-17 Thread Tejun Heo
Hello, On Fri, Apr 17, 2020 at 04:16:28PM -0400, Lyude Paul wrote: > Hey Tejun! So I ended up rewriting the drm_vblank_work stuff so that it used > kthread_worker. Things seem to work alright now. But while we're doing just > fine with vblank workers on nouveau, we're still having trouble meeting

Re: [PATCH 1/9] drm/vblank: Add vblank works

2020-04-14 Thread Tejun Heo
Hello, On Tue, Apr 14, 2020 at 12:52:51PM -0400, Lyude Paul wrote: > Hi, thanks for the response! And yes-I think this would actually be perfect > for what we need, I guess one question I might as well ask since I've got you > here: would patches to expose an unlocked version of kthread_queue_work

Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-13 Thread Tejun Heo
Hello, On Mon, Apr 13, 2020 at 05:40:32PM -0400, Kenny Ho wrote: > By lack of consense, do you mean Intel's assertion that a standard is > not a standard until Intel implements it? (That was in the context of > OpenCL language standard with the concept of SubDevice.) I thought > the discussion so

Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-13 Thread Tejun Heo
Hello, On Mon, Apr 13, 2020 at 04:17:14PM -0400, Kenny Ho wrote: > Perhaps we can even narrow things down to just > gpu.weight/gpu.compute.weight as a start? In this aspect, is the key That sounds great to me. > objection to the current implementation of gpu.compute.weight the > work-conserving

Re: [PATCH 1/9] drm/vblank: Add vblank works

2020-04-13 Thread Tejun Heo
Hello, On Mon, Apr 13, 2020 at 04:18:57PM -0400, Lyude Paul wrote: > Hi Tejun! Sorry to bother you, but have you had a chance to look at any of > this yet? Would like to continue moving this forward Sorry, wasn't following this thread. Have you looked at kthread_worker? https://git.kernel.org/
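A brief sketch of the kthread_worker pattern being pointed at; the worker name and work function are illustrative:

    #include <linux/kthread.h>
    #include <linux/err.h>

    static void vblank_work_fn(struct kthread_work *work)
    {
            /* time-sensitive vblank processing would go here */
    }

    /* Sketch: a dedicated kthread_worker whose task priority the driver can
     * tune, which is the property regular workqueues don't offer. */
    static int example_setup(void)
    {
            struct kthread_worker *worker;
            struct kthread_work work;

            worker = kthread_create_worker(0, "card%d-vblank", 0);
            if (IS_ERR(worker))
                    return PTR_ERR(worker);

            kthread_init_work(&work, vblank_work_fn);
            kthread_queue_work(worker, &work);
            kthread_flush_work(&work);
            kthread_destroy_worker(worker);
            return 0;
    }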

Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-13 Thread Tejun Heo
Hello, Kenny. On Tue, Mar 24, 2020 at 02:49:27PM -0400, Kenny Ho wrote: > Can you elaborate more on what are the missing pieces? Sorry about the long delay, but I think we've been going in circles for quite a while now. Let's try to make it really simple as the first step. How about something lik

Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-03-24 Thread Tejun Heo
On Tue, Mar 17, 2020 at 12:03:20PM -0400, Kenny Ho wrote: > What's your thoughts on this latest series? My overall impression is that the feedbacks aren't being incorporated thoroughly / sufficiently. Thanks. -- tejun

Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource

2020-02-14 Thread Tejun Heo
On Fri, Feb 14, 2020 at 03:28:40PM -0500, Kenny Ho wrote: > Can you elaborate, per your understanding, how the lgpu weight > attribute differ from the io.weight you suggested? Is it merely a Oh, it's the non-weight part which is problematic. > formatting/naming issue or is it the implementation

Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource

2020-02-14 Thread Tejun Heo
Hello, Kenny, Daniel. (cc'ing Johannes) On Fri, Feb 14, 2020 at 01:51:32PM -0500, Kenny Ho wrote: > On Fri, Feb 14, 2020 at 1:34 PM Daniel Vetter wrote: > > > > I think guidance from Tejun in previos discussions was pretty clear that > > he expects cgroups to be both a) standardized and c) suffi

Re: [PATCH RFC v4 02/16] cgroup: Introduce cgroup for drm subsystem

2019-12-02 Thread Tejun Heo
On Fri, Nov 29, 2019 at 01:00:36AM -0500, Kenny Ho wrote: > On Tue, Oct 1, 2019 at 10:31 AM Michal Koutný wrote: > > On Thu, Aug 29, 2019 at 02:05:19AM -0400, Kenny Ho wrote: > > > +struct cgroup_subsys drm_cgrp_subsys = { > > > + .css_alloc = drmcg_css_alloc, > > > + .css_free

Re: [RFC PATCH] cgroup: Document interface files and rationale for DRM controller

2019-11-07 Thread Tejun Heo
Hello, On Tue, Nov 05, 2019 at 04:08:22PM -0800, Brian Welty wrote: > I was more interested in hearing your thoughts on whether you like > the approach to have a set of controls that are consistent with > some subset of the existing CPU/MEM ones. Any feedback on this? > Didn't really mean to sug

Re: [RFC PATCH] cgroup: Document interface files and rationale for DRM controller

2019-11-04 Thread Tejun Heo
On Mon, Nov 04, 2019 at 05:08:47PM -0500, Brian Welty wrote: > + gpuset.units > + gpuset.units.effective > + gpuset.units.partition > + > + gpuset.mems > + gpuset.mems.effective > + gpuset.mems.partition > + > + sched.max > + sched.stats > + sched.weight > + sched.weight.nice > + > + mem

Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem

2019-09-10 Thread Tejun Heo
Hello, Michal. On Tue, Sep 10, 2019 at 01:54:48PM +0200, Michal Hocko wrote: > > So, while it'd great to have shrinkers in the longer term, it's not a > > strict requirement to be accounted in memcg. It already accounts a > > lot of memory which isn't reclaimable (a lot of slabs and socket > > bu

Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem

2019-09-06 Thread Tejun Heo
Hello, Daniel. On Fri, Sep 06, 2019 at 05:34:16PM +0200, Daniel Vetter wrote: > > Hmm... what'd be the fundamental difference from slab or socket memory > > which are handled through memcg? Is system memory used by GPUs have > > further global restrictions in addition to the amount of physical >

Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each

2019-09-06 Thread Tejun Heo
Hello, Daniel. On Fri, Sep 06, 2019 at 05:36:02PM +0200, Daniel Vetter wrote: > Block devices are a great example I think. How do you handle the > partitions on that? For drm we also have a main minor interface, and cgroup IO controllers only distribute hardware IO capacity and are blind to parti

Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each

2019-09-06 Thread Tejun Heo
Hello, On Wed, Sep 04, 2019 at 10:54:34AM +0200, Daniel Vetter wrote: > Anyway, I don't think reusing the drm_minor registration makes sense, > since we want to be on the drm_device, not on the minor. Which is a bit > awkward for cgroups, which wants to identify devices using major.minor > pairs.

Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem

2019-09-06 Thread Tejun Heo
Hello, Daniel. On Tue, Sep 03, 2019 at 09:48:22PM +0200, Daniel Vetter wrote: > I think system memory separate from vram makes sense. For one, vram is > like 10x+ faster than system memory, so we definitely want to have > good control on that. But maybe we only want one vram bucket overall > for t

Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem

2019-09-03 Thread Tejun Heo
Hello, Daniel. On Tue, Sep 03, 2019 at 09:55:50AM +0200, Daniel Vetter wrote: > > * While breaking up and applying control to different types of > > internal objects may seem attractive to folks who work day in and > > day out with the subsystem, they aren't all that useful to users and > >

Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem

2019-08-30 Thread Tejun Heo
Hello, I just glanced through the interface and don't have enough context to give any kind of detailed review yet. I'll try to read up and understand more and would greatly appreciate if you can give me some pointers to read up on the resources being controlled and how the actual use cases would

Re: [BUG] lockdep splat with kernfs lockdep annotations and slab mutex from drm patch??

2019-06-14 Thread Tejun Heo
Hello, On Fri, Jun 14, 2019 at 04:08:33PM +0100, Chris Wilson wrote: > #ifdef CONFIG_MEMCG > if (slab_state >= FULL && err >= 0 && is_root_cache(s)) { > struct kmem_cache *c; > > mutex_lock(&slab_mutex); > > so it happens to hit the error + FULL case with

Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit

2019-05-16 Thread Tejun Heo
Hello, I haven't gone through the patchset yet but some quick comments. On Wed, May 15, 2019 at 10:29:21PM -0400, Kenny Ho wrote: > Given this controller is specific to the drm kernel subsystem which > uses minor to identify drm device, I don't see a need to complicate > the interfaces more by ha

Re: [RFC PATCH 0/5] cgroup support for GPU devices

2019-05-09 Thread Tejun Heo
Hello, On Tue, May 07, 2019 at 12:50:50PM -0700, Welty, Brian wrote: > There might still be merit in having a 'device mem' cgroup controller. > The resource model at least is then no longer mixed up with host memory. > RDMA community seemed to have some interest in a common controller at > least f

Re: [RFC PATCH 0/5] cgroup support for GPU devices

2019-05-06 Thread Tejun Heo
Hello, On Wed, May 01, 2019 at 10:04:33AM -0400, Brian Welty wrote: > The patch series enables device drivers to use cgroups to control the > following resources within a GPU (or other accelerator device): > * control allocation of device memory (reuse of memcg) > and with future work, we could e

Re: [PATCH RFC 2/5] cgroup: Add mechanism to register vendor specific DRM devices

2018-11-20 Thread Tejun Heo
Hello, On Tue, Nov 20, 2018 at 10:21:14PM +, Ho, Kenny wrote: > By this reply, are you suggesting that vendor specific resources > will never be acceptable to be managed under cgroup? Let say a user I wouldn't say never but whatever which gets included as a cgroup controller should have clea

Re: [PATCH RFC 2/5] cgroup: Add mechanism to register vendor specific DRM devices

2018-11-20 Thread Tejun Heo
Hello, On Tue, Nov 20, 2018 at 01:58:11PM -0500, Kenny Ho wrote: > Since many parts of the DRM subsystem has vendor-specific > implementations, we introduce mechanisms for vendor to register their > specific resources and control files to the DRM cgroup subsystem. A > vendor will register itself

Re: [PATCH 02/12] blk: use for_each_if

2018-07-11 Thread Tejun Heo
On Wed, Jul 11, 2018 at 01:31:51PM -0600, Jens Axboe wrote: > I don't think there's a git easy way of sending it out outside of > just ensuring that everybody is CC'ed on everything. I don't mind > that at all. I don't subscribe to lkml, and the patches weren't > sent to linux-block. Hence all I se

Re: [PATCH 03/12] cgroup: use for_each_if

2018-07-11 Thread Tejun Heo
On Mon, Jul 09, 2018 at 10:36:41AM +0200, Daniel Vetter wrote: > Avoids the need to invert the condition instead of the open-coded > version. > > Signed-off-by: Daniel Vetter > Cc: Tejun Heo > Cc: Li Zefan > Cc: Johannes Weiner > Cc: cgro...@vger.kernel.org Acked-by:

Re: [PATCH 02/12] blk: use for_each_if

2018-07-11 Thread Tejun Heo
On Wed, Jul 11, 2018 at 09:40:58AM -0700, Tejun Heo wrote: > On Mon, Jul 09, 2018 at 10:36:40AM +0200, Daniel Vetter wrote: > > Makes the macros resilient against if {} else {} blocks right > > afterwards. > > > > Signed-off-by: Daniel Vetter > > Cc: Teju

Re: [PATCH 02/12] blk: use for_each_if

2018-07-11 Thread Tejun Heo
On Mon, Jul 09, 2018 at 10:36:40AM +0200, Daniel Vetter wrote: > Makes the macros resilient against if {} else {} blocks right > afterwards. > > Signed-off-by: Daniel Vetter > Cc: Tejun Heo > Cc: Jens Axboe > Cc: Shaohua Li > Cc: Kate Stewart > Cc: Greg Kroah-Har
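For context, a sketch of the helper under discussion and the failure mode it guards against; the example iterator is made up:

    /* The drm helper: rewriting the condition this way keeps a caller's
     * trailing "else" from silently binding to the if() hidden inside an
     * iteration macro. */
    #define for_each_if(condition) if (!(condition)) {} else

    /* Illustrative iterator built on top of it: */
    #define for_each_enabled_slot(i, n, enabled)            \
            for ((i) = 0; (i) < (n); (i)++)                  \
                    for_each_if((enabled)[i])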

[PATCH] drm: fix fallouts from slow-work -> wq conversion

2018-06-19 Thread Tejun Heo
From 9a919c46dfa48a9c1f465174609b90253eb8ffc1 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 9 Aug 2010 12:01:27 +0200 Commit 991ea75c (drm: use workqueue instead of slow-work), which made drm to use wq instead of slow-work, didn't account for the return value difference

[PATCH wq#for-linus] drm: fix a fallout from slow-work -> wq conversion

2018-06-19 Thread Tejun Heo
never fails and only uses the return value to indicate whether the work was already pending or not. This misconversion triggered spurious error messages. Remove the now unnecessary return value check and error message. Signed-off-by: Tejun Heo Reported-by: Markus Trippelsdorf Cc: David Airlie Cc

Re: [PATCH v3 1/6] cgroup: Allow registration and lookup of cgroup private data

2018-03-13 Thread Tejun Heo
Hello, Matt. cc'ing Roman and Alexei. On Tue, Mar 06, 2018 at 03:46:55PM -0800, Matt Roper wrote: > There are cases where other parts of the kernel may wish to store data > associated with individual cgroups without building a full cgroup > controller. Let's add interfaces to allow them to regis

Re: [PATCH v3 3/6] cgroup: Introduce cgroup_permission()

2018-03-13 Thread Tejun Heo
On Tue, Mar 06, 2018 at 03:46:57PM -0800, Matt Roper wrote: > Non-controller kernel subsystems may base access restrictions for > cgroup-related syscalls/ioctls on a process' access to the cgroup. > Let's make it easy for other parts of the kernel to check these cgroup > permissions. I'm not sure

Re: [PATCH v3 2/6] cgroup: Introduce task_get_dfl_cgroup()

2018-03-13 Thread Tejun Heo
(cc'ing Roman) Hello, On Tue, Mar 06, 2018 at 03:46:56PM -0800, Matt Roper wrote: > +static inline struct cgroup * > +task_get_dfl_cgroup(struct task_struct *task) > +{ > + struct cgroup *cgrp; > + > + mutex_lock(&cgroup_mutex); > + cgrp = task_dfl_cgroup(task); > + cgroup_get(cgr

Re: [PATCH 1/5] workqueue: Allow retrieval of current task's work struct

2018-02-12 Thread Tejun Heo
ng in the context of the worker. > > Cc: Tejun Heo > Cc: Lai Jiangshan > Cc: Dave Airlie > Cc: Ben Skeggs > Cc: Alex Deucher > Signed-off-by: Lukas Wunner I wonder whether it's too generic a name but there are other functions named in a similar fashion and AFAICS curre

Re: [PATCH RFC v2 3/7] cgroup: Add interface to allow drivers to lookup process cgroup membership

2018-02-07 Thread Tejun Heo
Hello, On Thu, Feb 01, 2018 at 11:53:11AM -0800, Matt Roper wrote: > +/** > + * cgroup_for_driver_process - return the cgroup for a process > + * @pid: process to lookup cgroup for > + * > + * Returns the cgroup from the v2 hierarchy that a process belongs to. > + * This function is intended to be

Re: [PATCH RFC v2 1/7] cgroup: Allow drivers to store data associated with a cgroup

2018-02-07 Thread Tejun Heo
Hello, On Thu, Feb 01, 2018 at 11:53:09AM -0800, Matt Roper wrote: > * Drivers may be built as modules (and unloaded/reloaded) which is not >something cgroup controllers support today. As discussed in the other subthread, this shouldn't be a concern. > * Drivers may wish to provide their o

Re: [Intel-gfx] [IGT PATCH RFC] tools: Introduce intel_cgroup tool

2018-02-07 Thread Tejun Heo
Hello, Forgot to respond to one point. On Thu, Feb 01, 2018 at 03:14:38PM -0800, Matt Roper wrote: > * The drivers that want to make use of this functionality may be built >as modules rather than compiled directly into the kernel. This is >important because the cgroups subsystem removed

Re: [Intel-gfx] [IGT PATCH RFC] tools: Introduce intel_cgroup tool

2018-02-07 Thread Tejun Heo
Hello, Matt, Chris. On Thu, Feb 01, 2018 at 03:14:38PM -0800, Matt Roper wrote: > > Hmm. Could we not avoid drm_ioctl + well known param names and use a > > more generic tool to set cgroup attributes? Just feels wrong that a > > such a generic interface boils down to a driver specific ioctl. So,

Re: [PATCH RFC 6/9] drm: Add cgroup helper library

2018-01-22 Thread Tejun Heo
Hello, Matt. On Fri, Jan 19, 2018 at 05:51:38PM -0800, Matt Roper wrote: > Most DRM drivers will want to handle the CGROUP_SETPARAM ioctl by looking up a > driver-specific per-cgroup data structure (or allocating a new one) and > storing > the supplied parameter value into the data structure (pos
