On 3/25/21 8:02 AM, Jens Axboe wrote:
> On 3/25/21 7:56 AM, Stefan Metzmacher wrote:
>> Am 25.03.21 um 14:38 schrieb Jens Axboe:
>>> On 3/25/21 6:11 AM, Stefan Metzmacher wrote:
>>>>
>>>> Am 25.03.21 um 13:04 schrieb Eric W. Biederman:
>>>>> Stefan Metzmacher <me...@samba.org> writes:
>>>>>
>>>>>> Am 25.03.21 um 12:24 schrieb Sasha Levin:
>>>>>>> From: "Eric W. Biederman" <ebied...@xmission.com>
>>>>>>>
>>>>>>> [ Upstream commit 4db4b1a0d1779dc159f7b87feb97030ec0b12597 ]
>>>>>>>
>>>>>>> Just like we don't allow normal signals to IO threads, don't deliver a
>>>>>>> STOP to a task that has PF_IO_WORKER set. The IO threads don't take
>>>>>>> signals in general, and have no means of flushing out a stop either.
>>>>>>>
>>>>>>> Longer term, we may want to look into allowing stop of these threads,
>>>>>>> as it relates to eg process freezing. For now, this prevents a spin
>>>>>>> issue if a SIGSTOP is delivered to the parent task.
>>>>>>>
>>>>>>> Reported-by: Stefan Metzmacher <me...@samba.org>
>>>>>>> Signed-off-by: Jens Axboe <ax...@kernel.dk>
>>>>>>> Signed-off-by: "Eric W. Biederman" <ebied...@xmission.com>
>>>>>>> Signed-off-by: Sasha Levin <sas...@kernel.org>
>>>>>>> ---
>>>>>>>  kernel/signal.c | 3 ++-
>>>>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/kernel/signal.c b/kernel/signal.c
>>>>>>> index 55526b941011..00a3840f6037 100644
>>>>>>> --- a/kernel/signal.c
>>>>>>> +++ b/kernel/signal.c
>>>>>>> @@ -288,7 +288,8 @@ bool task_set_jobctl_pending(struct task_struct 
>>>>>>> *task, unsigned long mask)
>>>>>>>                         JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
>>>>>>>         BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & 
>>>>>>> JOBCTL_PENDING_MASK));
>>>>>>>  
>>>>>>> -       if (unlikely(fatal_signal_pending(task) || (task->flags & 
>>>>>>> PF_EXITING)))
>>>>>>> +       if (unlikely(fatal_signal_pending(task) ||
>>>>>>> +                    (task->flags & (PF_EXITING | PF_IO_WORKER))))
>>>>>>>                 return false;
>>>>>>>  
>>>>>>>         if (mask & JOBCTL_STOP_SIGMASK)
>>>>>>>
>>>>>>
>>>>>> Again, why is this proposed for 5.11 and 5.10 already?
>>>>>
>>>>> Has the bit about the io worker kthreads been backported?
>>>>> If so this isn't horrible.  If not this is nonsense.
>>>
>>> No not yet - my plan is to do that, but not until we're 100% satisfied
>>> with it.
>>
>> Do you understand why the patches where autoselected for 5.11 and 5.10?
> 
> As far as I know, selections like these (AUTOSEL) are done by some bot
> that uses whatever criteria to see if they should be applied for earlier
> revisions. I'm sure Sasha can expand on that :-)
> 
> Hence it's reasonable to expect that sometimes it'll pick patches that
> should not go into stable, at least not just yet. It's important to
> understand that this message is just a notice that it's queued up for
> stable -rc, not that it's _in_ stable just yet. There's time to object.
> 
>>>> I don't know, I hope not...
>>>>
>>>> But I just tested v5.12-rc4 and attaching to
>>>> an application with iothreads with gdb is still not possible,
>>>> it still loops forever trying to attach to the iothreads.
>>>
>>> I do see the looping, gdb apparently doesn't give up when it gets
>>> -EPERM trying to attach to the threads. Which isn't really a kernel
>>> thing, but:
>>
>> Maybe we need to remove the iothreads from /proc/pid/tasks/
> 
> Is that how it finds them? It's arguably a bug in gdb that it just
> keeps retrying, but it would be nice if we can ensure that it just
> ignores them. Because if gdb triggers something like that, probably
> others too...
> 
>>>> And I tested 'kill -9 $pidofiothread', and it feezed the whole
>>>> machine...
>>>
>>> that sounds very strange, I haven't seen anything like that running
>>> the exact same scenario.
>>>
>>>> So there's still work to do in order to get 5.12 stable.
>>>>
>>>> I'm short on time currently, but I hope to send more details soon.
>>>
>>> Thanks! I'll play with it this morning and see if I can provoke
>>> something odd related to STOP/attach.
>>
>> Thanks!
>>
>> Somehow I have the impression that your same_thread_group_account patch
>> may fix a lot of things...
> 
> Maybe? I'll look closer.

It needs a bit more love than that. If you have threads already in your
app, then we just want to skip over the PF_IO_WORKER threads. We can't
just terminate the loop.

Something like the below works for me.


diff --git a/fs/proc/base.c b/fs/proc/base.c
index 3851bfcdba56..abff2fe10bfa 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3723,7 +3723,7 @@ static struct task_struct *first_tid(struct pid *pid, int 
tid, loff_t f_pos,
         */
        pos = task = task->group_leader;
        do {
-               if (!nr--)
+               if (same_thread_group(task, pos) && !nr--)
                        goto found;
        } while_each_thread(task, pos);
 fail:
@@ -3744,16 +3744,22 @@ static struct task_struct *first_tid(struct pid *pid, 
int tid, loff_t f_pos,
  */
 static struct task_struct *next_tid(struct task_struct *start)
 {
-       struct task_struct *pos = NULL;
+       struct task_struct *tmp, *pos = NULL;
+
        rcu_read_lock();
-       if (pid_alive(start)) {
-               pos = next_thread(start);
-               if (thread_group_leader(pos))
-                       pos = NULL;
-               else
-                       get_task_struct(pos);
+       if (!pid_alive(start))
+               goto no_thread;
+       list_for_each_entry_rcu(tmp, &start->thread_group, thread_group) {
+               if (!thread_group_leader(tmp) && same_thread_group(start, tmp)) 
{
+                       get_task_struct(tmp);
+                       pos = tmp;
+                       break;
+               }
        }
+no_thread:
        rcu_read_unlock();
+       if (!pos)
+               return NULL;
        put_task_struct(start);
        return pos;
 }
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3f6a0fcaa10c..4f621e386abf 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -668,11 +668,18 @@ static inline bool thread_group_leader(struct task_struct 
*p)
 }
 
 static inline
-bool same_thread_group(struct task_struct *p1, struct task_struct *p2)
+bool same_thread_group_account(struct task_struct *p1, struct task_struct *p2)
 {
        return p1->signal == p2->signal;
 }
 
+static inline
+bool same_thread_group(struct task_struct *p1, struct task_struct *p2)
+{
+       return same_thread_group_account(p1, p2) &&
+                       !((p1->flags | p2->flags) & PF_IO_WORKER);
+}
+
 static inline struct task_struct *next_thread(const struct task_struct *p)
 {
        return list_entry_rcu(p->thread_group.next,
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5f611658eeab..625110cacc2a 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -307,7 +307,7 @@ void thread_group_cputime(struct task_struct *tsk, struct 
task_cputime *times)
         * those pending times and rely only on values updated on tick or
         * other scheduler action.
         */
-       if (same_thread_group(current, tsk))
+       if (same_thread_group_account(current, tsk))
                (void) task_sched_runtime(current);
 
        rcu_read_lock();

-- 
Jens Axboe

Reply via email to