Re: [uml-user] Re: UML immediate segfault on RHEL4 regardless CONFIG_HOST_2G_2G

Blaisorblade Wed, 14 Dec 2005 04:18:03 -0800

On Tuesday 13 December 2005 19:14, Blaisorblade wrote:
> On Tuesday 13 December 2005 08:18, Karamazov Brothers wrote:
> > Hi all,


> > A bit more debugging information.  Gdb shows that ./linux segfaults
> > at the same location in os_early_checks () at
> > arch/um/os-Linux/start_up.c:264
> > whether or not CONFIG_HOST_2G_2G is defined.  Any clue how to fix?

> I know this hang well, but until now I saw it only with either SMP or
> spinlock debugging enabled.

More precisely, this particular hang seems specific to spinlock debugging.

> FOR Jeff Dike:

> It seems due (for what I saw) to a process faulting in a page from the
> stack area (it's still the initial stack, so mapped VM_GROWSDOWN) with an
> address < %esp.

> And as you know, the kernel says that you are buggy, and it's indeed right
> since a concurrent signal handler would kill you.

Indeed, as I just discovered, we are calling printk() on the normal process
stack. That is wrong because current_thread_info() is invalid there, quite
apart from the signal handler problem.

> This started showing up with Ingo Molnar's locking restructure and
> abstraction, for me. He reimplemented spinlock debugging, and then spinlock
> debugging started triggering this. I'm not sure this is the correct
> diagnosis, but when I debugged it (or similar crashes), this sounded
> reasonable.

> However, in this case, the hang happens on logbuf_lock, so it puzzles
> me a lot.

I confirm everything I said. I hit exactly the same problem here and
diagnosed it.

As a fix, I suggest switching away from printk() for such early uses. Do you
agree? I am not sure that is enough, but the rule is "no spinlock without a
kernel stack", so avoiding printk() seems correct.
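
To make the suggestion concrete, here is a minimal sketch of what a
lock-free early print helper could look like. The names early_format() and
early_print() are hypothetical (they are not existing UML functions), and
the sketch assumes the host libc is usable at this point of startup. It
formats into a buffer on the current stack and hands it to the host with
write(2), so it takes no spinlock and never evaluates current.

```c
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

#define EARLY_BUF_SIZE 256

/* Hypothetical helper: format into a caller-supplied buffer and return
 * the number of bytes to emit, clamped to what fits in the buffer. */
static int early_format(char *buf, size_t size, const char *fmt, va_list ap)
{
        int len;

        if (size == 0)
                return 0;
        len = vsnprintf(buf, size, fmt, ap);
        if (len < 0)
                return 0;
        if ((size_t)len >= size)
                len = (int)size - 1;    /* output was truncated */
        return len;
}

/* Hypothetical stand-in for early printk() calls: writes straight to the
 * host's stderr, without logbuf_lock and without touching "current".
 * Returns the number of bytes written, or -1 on error. */
static int early_print(const char *fmt, ...)
{
        char buf[EARLY_BUF_SIZE];
        va_list ap;
        int len;
        ssize_t ret;

        va_start(ap, fmt);
        len = early_format(buf, sizeof(buf), fmt, ap);
        va_end(ap);

        ret = write(STDERR_FILENO, buf, (size_t)len);
        return ret < 0 ? -1 : (int)ret;
}
```

The early checks would then call early_print() wherever they call printk()
today; whether that alone is enough is still the open question above.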

I also considered a "decrease $esp, fault in the low page with a read,
restore $esp" sequence, but I now think it is pointless: kernel stacks are
not VM_GROWSDOWN, so the problem does not exist there.

> And since it's the initial stack, the host, not the UML kernel (which isn't
> even running yet) is managing its growth, correct?

Exactly - i386 do_page_fault():

        [at this point the VMA found starts above the faulting address]
        if (!(vma->vm_flags & VM_GROWSDOWN))
                goto bad_area;
        if (error_code & 4) {
                /*
                 * accessing the stack below %esp is always a bug.
                 * The "+ 32" is there due to some instructions (like
                 * pusha) doing post-decrement on the stack and that
                 * doesn't show up until later..
                 */
                if (address + 32 < regs->esp)
                        goto bad_area;
        }
        if (expand_stack(vma, address))
                goto bad_area;

However, I have not tried commenting out that check to verify that the bug
no longer reproduces.
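
As an illustration only, the decision quoted above can be modelled in user
space. The function name stack_fault_is_bad_area() and its parameters are
made up for this sketch; only the three conditions come from the kernel
excerpt, and the $esp value used in the example afterwards is an assumption
(I only know it is "much higher" than the faulting address).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 2 of the i386 page-fault error code: the fault came from user mode. */
#define PF_ERROR_USER 4u

/* Hypothetical model of the quoted do_page_fault() excerpt, for the case
 * where the VMA found starts above the faulting address. Returns true if
 * the kernel would jump to bad_area (SIGSEGV), false if it would go on to
 * try expand_stack(). */
static bool stack_fault_is_bad_area(uint32_t address, uint32_t esp,
                                    uint32_t error_code, bool vma_grows_down)
{
        if (!vma_grows_down)
                return true;
        /* User-mode access more than 32 bytes below %esp is rejected. */
        if ((error_code & PF_ERROR_USER) && address + 32 < esp)
                return true;
        return false;
}
```

With a fault at 0xffff8000 and, say, $esp at 0xffffb000, the user-mode case
returns true, which matches the segfault; the same fault without the
user-mode bit set would be allowed to grow the stack.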

The failing line is this one:

debug_spin_lock_before():

        SPIN_BUG_ON(lock->owner == current, lock, "recursion");

As you know, evaluating current means dereferencing current_thread_info():
it is equivalent to current_thread_info()->task. In /proc/$pid/maps I can
see the stack extending down to 0xffffa000, and GDB (6.4, which works
nicely, by the way) shows the thread trying to access 0xffff8000, while
$esp is much higher.
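
To show why the access lands so far below $esp, here is a small sketch of
the address arithmetic, assuming current_thread_info() is obtained by
rounding a stack address down to the kernel stack size, as on i386. The
function name thread_info_addr(), the stack order and the sample stack
pointer are all assumptions made for illustration.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical model: round a stack address down to the start of a kernel
 * stack of (SKETCH_PAGE_SIZE << stack_order) bytes. On a real kernel stack
 * this is where struct thread_info lives; on the initial process stack it
 * is just some address below the stack pointer. */
static uint32_t thread_info_addr(uint32_t sp, unsigned int stack_order)
{
        uint32_t mask = SKETCH_PAGE_SIZE * (1u << stack_order) - 1;

        return sp & ~mask;
}
```

For example, a stack pointer of 0xffffb123 with a 16 KB stack (order 2)
gives 0xffff8000, and with an 8 KB stack (order 1) gives 0xffffa000. The
first case would be consistent with the addresses I see, but I have not
checked which stack order this build uses.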

In fact, under the debugger the segfault shows up only at random, which may
be due to slightly different semantics when the process is being debugged.

For instance, I quickly saw that get_user_pages(), which the host uses to
implement PTRACE_PEEK/POKE TEXT/DATA, handles this differently: it calls
find_extend_vma(), which has no %esp test.

So examining the contents of current_thread_info() from the debugger would
be a sure way to make the process survive.
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

                