Re: init's children list is long and slows reaping children.

Eric W. Biederman Sat, 07 Apr 2007 17:41:16 -0700

Oleg Nesterov <[EMAIL PROTECTED]> writes:

> On 04/06, Oleg Nesterov wrote:
>> 
>> Perhaps,
>> 
>> --- t/kernel/exit.c~ 2007-04-06 23:31:31.000000000 +0400
>> +++ t/kernel/exit.c  2007-04-06 23:31:57.000000000 +0400
>> @@ -275,10 +275,7 @@ static void reparent_to_init(void)
>>      remove_parent(current);
>>      current->parent = child_reaper(current);
>>      current->real_parent = child_reaper(current);
>> -    add_parent(current);
>> -
>> -    /* Set the exit signal to SIGCHLD so we signal init on exit */
>> -    current->exit_signal = SIGCHLD;
>> +    current->exit_signal = -1;
>>  
>>      if (!has_rt_policy(current) && (task_nice(current) < 0))
>>              set_user_nice(current, 0);
>> 
>> is enough. init is still our parent (make ps happy), but it can't see us,
>> we are not on ->children list.
>
> OK, this doesn't work. A multi-threaded init may do execve().


Good catch.  daemonize must die! 

> So, we can re-parent a kernel thread to swapper. In that case it doesn't 
> matter
> if we put task on ->children list or not.

Yes.  We can.

> User-visible change. Acceptable?

We would have user visible changes when we changed to ktrhead_create anyway,
and it's none of user space's business so certainly.

>> Off course, we also need to add preparent_to_init() to kthread() and
>> (say) stopmachine(). Or we can create kernel_thread_detached() and
>> modify callers to use it.
>
> It would be very nice to introduce CLONE_KERNEL_THREAD instead, then

If we are going to do something to copy_process and the like let's take
this one step farther.  Let's pass in the value of the task to copy.

Then we can add a wrapper around copy_process to build kernel_thread
something like:

struct task_struct *__kernel_thread(int (*fn)(void *), void * arg,
                                    unsigned long flags)
{
        struct task_struct *task;
        struct pt_regs regs, *reg;

        reg = kernel_thread_regs(&regs, fn, arg);
        task = copy_process(&init_task, flags, 0, reg, 0, NULL, NULL, NULL, 0);
        if (!IS_ERR(task))
                wake_up_new_task(task, flags);

        return task;
}

long kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
{
        struct task_struct *task;
        task = __kernel_thread(fn, arg, flags);
        if (IS_ERR(task))
                return PTR_ERR(task);
        return task->pid;               
}

After that daemonize just becomes:

void daemonize(const char *name, ...)
{
        va_list args;

        va_start(args, name);
        vsnprintf(current->comm, sizeof(current->comm), name, args);
        va_end(args);
}

And kthread_create becomes:

struct task_struct *kthread_create(int (*threadfn)(void *data),
                                   void *data,
                                   const char namefmt[],
                                   ...)
{
        struct kthread_create_info create;
        struct task_struct *task;

        create.threadfn = threadfn;
        create.data = data;

        /* We want our own signal handler (we take no signals by default). */
        task = __kernel_thread(kthread, create, CLONE_FS | CLONE_FILES | 
SIGCHLD);
        if (!IS_ERR(task)) {
                va_list args;
                va_start(args, namefmt);
                vsnprintf(task->comm, sizeof(task->comm),  namefmt, args);
                va_end(args);
        }
        return task;
}

If we are willing to go that far.  I think it is worth it to touch the
architecture specific code.  As that removes an unnecessary wait
queue, and ensures that kernel threads always start with a consistent
state.  So it is a performance, scalability and simplicity boost.

Otherwise we should just put the code in the callers of kernel_thread
like we do today.

I don't have the energy to rework all of th architecture or even a
noticeable fraction of them right now.  I have to much on my plate.
A non-architecture specific solution looks like a fairly simple
patch and I might get around to it one of these times..

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: init's children list is long and slows reaping children.

Reply via email to