Paul E. McKenney wrote: > > > The other thing that jumped out at me is that signals are very different > > animals from a locking viewpoint depending on whether they are: > > > > 1. ignored, > > > > 2. caught by a single thread, > > > > 3. fatal to multiple threads/processes (though I don't know > > of anything that shares sighand_struct between separate > > processes), or > > > > 4. otherwise global to multiple threads/processes (such as > > SIGSTOP and SIGCONT). > > > > And there are probably other distinctions that I have not yet caught > > on to. > > > > One way to approach this would be to make your suggested lock_task_sighand() > > look at the signal and acquire the appropriate locks. If, having acquired > > a given set of locks, it found that the needed set had changed (e.g., due > > to racing exec() or sigaction()), then it drops the locks and retries. > > OK, for this sort of approach to work, lock_task_sighand() must take and > return some sort of mask indicating what locks are held. The mask returned > by lock_task_sighand() would then be passed to an unlock_task_sighand().
Sorry, I don't understand you. CLONE_THREAD implies CLONE_SIGHAND, so we always need to lock one ->sighand. Could you please clarify? > > > + if (!ret && sig && (sp = p->sighand)) { > > > if (!get_task_struct_rcu(p)) { > > > return -ESRCH; > > > } > > > - spin_lock_irqsave(&p->sighand->siglock, flags); > > > + spin_lock_irqsave(&sp->siglock, flags); > > > + if (p->sighand != sp) { > > > + spin_unlock_irqrestore(&sp->siglock, flags); > > > + put_task_struct(p); > > > + goto retry; > > > + } > > > ret = __group_send_sig_info(sig, info, p); > > > - spin_unlock_irqrestore(&p->sighand->siglock, flags); > > > + spin_unlock_irqrestore(&sp->siglock, flags); > > > put_task_struct(p); > > > > Do we really need get_task_struct_rcu/put_task_struct here? > > > > The task_struct can't go away under us, it is rcu protected. > > When ->sighand is locked, and it is still the same after > > the re-check, it means that 'p' has not done __exit_signal() > > yet, so it is safe to send the signal. > > > > And if the task has ->usage == 0, it means that it also has > > ->sighand == NULL, and your code will notice that. > > > > No? > > Seems plausible. I got paranoid after seeing the lock dropped in > handle_stop_signal(), though. Yes, this is bad and should be fixed, I agree. But why do you think we need to bump task->usage? It can't make any difference, afaics. The task_struct can't dissapear, the caller was converted to use rcu_read_lock() or it holds tasklist_lock. Nonzero task_struct->usage can't stop do_exit or sys_wait4, it will only postpone call_rcu(__put_task_struct_cb). And after we locked ->sighand we have sufficient memory barrier, so if we read the stale value into 'sp' we will notice that (if you were worried about this issue). Am I missed something? > void exit_sighand(struct task_struct *tsk) > { > write_lock_irq(&tasklist_lock); > - __exit_sighand(tsk); > + spin_lock(&tsk->sighand->siglock); > + if (tsk->sighand != NULL) { > + __exit_sighand(tsk); > + } > + spin_unlock(&tsk->sighand->siglock); > write_unlock_irq(&tasklist_lock); > } Very strange code. Why this check? And what happens with spin_unlock(&tsk->sighand->siglock); when tsk->sighand == NULL ? Oleg. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/