On Mon, 2020-11-09 at 09:00 +0100, Peter Zijlstra wrote:
> On Mon, Nov 09, 2020 at 01:54:59PM +1100, NeilBrown wrote:
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 4418f5cb8324..728870965df1 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1784,7 +1784,12 @@ static inline int test_tsk_need_resched(struct task_struct *tsk)
> >  #ifndef CONFIG_PREEMPTION
> >  extern int _cond_resched(void);
> >  #else
> > -static inline int _cond_resched(void) { return 0; }
> > +static inline int _cond_resched(void)
> > +{
> > +	if (current->flags & PF_WQ_WORKER)
> > +		workqueue_cond_resched();
> > +	return 0;
> > +}
> >  #endif
> > 
> >  #define cond_resched() ({			\
> 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 9a2fbf98fd6f..5b2e38567a0c 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -5620,6 +5620,8 @@ SYSCALL_DEFINE0(sched_yield)
> >  #ifndef CONFIG_PREEMPTION
> >  int __sched _cond_resched(void)
> >  {
> > +	if (current->flags & PF_WQ_WORKER)
> > +		workqueue_cond_resched();
> >  	if (should_resched(0)) {
> >  		preempt_schedule_common();
> >  		return 1;
> 
> Much hate for this.. :/ cond_resched() should be a NOP on PREEMPT and
> you wreck that. Also, you call into that workqueue_cond_resched()
> unconditionally, even when it wouldn't have rescheduled, which seems
> very wrong too.
> 
> On top of all that, you're adding an extra load to the function :/
> 
> At some point Paul tried to frob cond_resched() for RCU and ran into all
> sorts of performance issues, I'm thinking this will too.
> 
> Going by your justification for all this:
> 
> > I think that once a worker calls cond_resched(), it should be treated as
> > though it was run from a WQ_CPU_INTENSIVE queue, because only cpu-intensive
> > tasks need to call cond_resched(). This would allow other workers to be
> > scheduled.
> 
> I'm thinking the real problem is that you're abusing workqueues. Just
> don't stuff so much work into it that this becomes a problem. Or rather,
> if you do, don't lie to it about it.
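The WQ_CPU_INTENSIVE behaviour mentioned in the quoted justification already
exists as a workqueue flag: work items on such a queue are not counted against
the pool's concurrency management, so other per-CPU work can keep running
beside them. A rough sketch of declaring the expensive work that way up front,
with hypothetical names (heavy_wq, heavy_fn) used purely for illustration:

	#include <linux/workqueue.h>

	/*
	 * Hypothetical example: queue known-expensive work on a workqueue
	 * whose items are marked CPU-intensive up front, instead of trying
	 * to retrofit that behaviour through cond_resched().
	 */
	static struct workqueue_struct *heavy_wq;
	static struct work_struct heavy_work;

	static void heavy_fn(struct work_struct *work)
	{
		/* long-running, CPU-bound processing goes here */
	}

	static int __init heavy_init(void)
	{
		heavy_wq = alloc_workqueue("heavy", WQ_CPU_INTENSIVE, 0);
		if (!heavy_wq)
			return -ENOMEM;
		INIT_WORK(&heavy_work, heavy_fn);
		queue_work(heavy_wq, &heavy_work);
		return 0;
	}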
If we can't use workqueues to call iput_final() on an inode, then what
is the point of having them at all?

Neil's use case is simply a file that has managed to accumulate a
seriously large page cache, and is therefore taking a long time to
complete the call to truncate_inode_pages_final(). Are you saying we
have to allocate a dedicated thread for every case where this happens?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.mykleb...@hammerspace.com
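For concreteness, the pattern being defended here -- handing the final
iput() off to a workqueue so that a potentially long
truncate_inode_pages_final() runs in worker context -- looks roughly like
the sketch below. The deferred_iput() helper and its work item are
hypothetical names for illustration, not actual NFS client code:

	#include <linux/fs.h>
	#include <linux/slab.h>
	#include <linux/workqueue.h>

	struct deferred_iput {
		struct work_struct work;
		struct inode *inode;
	};

	static void deferred_iput_fn(struct work_struct *work)
	{
		struct deferred_iput *di =
			container_of(work, struct deferred_iput, work);

		/*
		 * If this drops the last reference, iput_final() and
		 * truncate_inode_pages_final() run here in workqueue
		 * context, which can take a long time for a huge page cache.
		 */
		iput(di->inode);
		kfree(di);
	}

	static void deferred_iput(struct inode *inode)
	{
		struct deferred_iput *di = kmalloc(sizeof(*di), GFP_KERNEL);

		if (!di) {
			iput(inode);	/* fall back to a synchronous drop */
			return;
		}
		di->inode = inode;
		INIT_WORK(&di->work, deferred_iput_fn);
		schedule_work(&di->work);
	}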