Re: next snapshot in preparation for m4 1.4.12

Eric Blake Mon, 01 Sep 2008 07:00:02 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Tom G. Christensen on 8/30/2008 7:23 AM:


>> At any rate, I'm also interested in seeing how c-stack behaves if you
>> don't link with libsigsegv (in the earlier snapshot, your build showed
>> that c-stack still attempted stack overflow detection and crashed, causing
>> ./stackovf.test to Fail; my hope is that I've fixed things so that in this
>> snapshot c-stack returns ENOTSUPP, letting ./stackovf.test Skip).
>>
> Yes, it seems to do the right thing now.
>
> c_stack_action: Invalid argument
> SKIP: test-c-stack.sh
> c_stack_action: Invalid argument
> SKIP: test-c-stack2.sh

Good.  We've now proven that stack overflow detection is skipped on
platforms where we don't have it ported, rather than causing failures.
It's now only a problem of whether we can easily port things to Irix; and
it's less work if we can just focus on the libsigsegv side of things.

>> Can you include the config.log snippet for that portion of the configure
>> run, so we can see if it was a compile or run failure? 
>>
> configure:6499: checking for working C stack overflow detection
> configure:6591: cc -o conftest -g -I/usr/tgcware/include -L/usr/tgcware/lib -Wl,-rpath,/usr/tgcware/lib conftest.c  >&5
> cfe: Warning 728: conftest.c, line 62: Long double not supported; double assumed.
>            long double ld;
>         ---^

How annoying.  The compiler supports 'long double', per C89 (since long
double and double are allowed to be the same type), but warns you any time
you use it.  At any rate, that warning is ignorable (it appears in lots of
places in your logs).

> configure:6595: $? = 0
> configure:6601: ./conftest
> ./configure[6603]: 7578 Memory fault(coredump)
> configure:6605: $? = 139
> configure: program exited with status 139

Runtime failure.  Oh well - I'm not sure what libsigsegv is doing to avoid
this, but I'm glad that it's not a compile-time failure, and that without
libsigsegv, the stack overflow tests are skipped.

> I've been using dbx but I have a copy of gdb 4.17 as well.
> Not sure how that would compare to gdb 6.x though.
> Running dbx with the corefile gives this backtrace:

I'm not familiar with dbx, but hopefully can offer enough advice.  And I'm
not sure how much gdb has improved since 4.17.

> [EMAIL PROTECTED] src]$ dbx conftest
> dbx version 3.19 Nov  3 1994 19:59:46
> Core from signal SIGSEGV: Segmentation violation

No mention of the handler, unlike in your other backtrace.  It looks like
the non-libsigsegv approach dumps core even before a handler gets a chance
to run.

> (dbx) t
>>  0 recurse(p = (nil)) ["/usr/people/tgc/buildpkg/m4/src/conftest.c":96, 0x400bec]
>    1 recurse(p = 0x7ff0038c = "\001") ["/usr/people/tgc/buildpkg/m4/src/conftest.c":99, 0x400c10]
>    2 recurse(p = 0x7ff005a4 = "\001") ["/usr/people/tgc/buildpkg/m4/src/conftest.c":99, 0x400c10]

Hmm, based on the pattern (each frame occupies 536 bytes), I would have
expected the debugger to report that p = 0x7ff00174 rather than (nil) in
the final frame before the stack overflow; but that may be a debugger
anomaly.  I just noticed that the c-stack.m4 file doesn't check the return
status from sigaction.  I suspect it worked for you, but just to be sure,
could you retry this with this patch to the program in config.log:

@@ -85,8 +85,7 @@ AC_DEFUN([AC_SYS_XSI_STACK_OVERFLOW_HEURISTIC],
           setrlimit (RLIMIT_STACK, &rl);
           #endif

-      c_stack_action ();
-      return recurse ("\1");
+    return c_stack_action () && recurse ("\1");
         }

> I cut out 8->1015, they're exactly the same, just the value for p changes.

Expected - we are inducing stack overflow by rapidly stepping through the
stack; the only interesting things are the newest one or two frames at the
point where stack overflow occurred.

>> Regardless of those test results, you
>> should now be able to run the just-built debugging version of
>> 'tests/test-c-stack'.  Run without arguments to trigger stack overflow,
>> and with any arguments (contents don't matter, just that argc>1) to
>> trigger an unrelated segv; it would be nice to step through both of those
>> cases in the debugger and see why they are dying abruptly rather than
>> detecting the problem and printing a nice status.
>>
> I tried to do this but after poking at it for a while I realised I
> have no idea what I should be looking for, what I should be stepping
> through or even how to do it right :(
> 
> What I found if anything is that running test-c-stack with no arguments
> results in the same backtrace as above.
> Running it with arguments results in:
> Executable /usr/people/tgc/buildpkg/m4/src/m4-1.4.11.42-864d/tests/test-c-stack
> (dbx) run blah
> Process 18820 (test-c-stack) started
> Process 18820 (test-c-stack) stopped on signal SIGSEGV: Segmentation violation (handler sigsegv_handler) [main:73 +0x18,0x400e78]

Notice how this instance referenced a handler.  I imagine that if you
continued single-stepping, you would then step through the statements in
sigsegv_handler.  At any rate, gdb does the same thing - it reports where
the SIGSEGV occurs, and single-stepping from there goes through the
handler.  Here's my sample session, on cygwin; I found that adding a
breakpoint on overflow_handler (the callback that got registered with
stackoverflow_install_handler) was important to see what I wanted:

$ ./test-c-stack
overflow_handler emergency=0 segv_handler_missing=0
./test-c-stack: stack overflow
$ ./test-c-stack 1
segv_handler serious=1
./test-c-stack: program error
Segmentation fault
$ gdb ./test-c-stack
GNU gdb 6.8.0.20080328-cvs (cygwin-special)
...
(gdb) b overflow_handler
Breakpoint 1 at 0x40128a: file c-stack.c, line 184.
(gdb) r
Starting program: /home/eblake/m4-branch/tests/test-c-stack.exe
[New thread 4368.0x17a4]
[New thread 4368.0x1520]

Program received signal SIGSEGV, Segmentation fault.
recurse (p=0x4303c "\001") at test-c-stack.c:48
48        array[0] = 1;
(gdb) c
Continuing.

Breakpoint 1, overflow_handler (emergency=0, context=0x407d74) at c-stack.c:184
184         sprintf (buf, "overflow_handler emergency=%d segv_handler_missing=%d\n",
(gdb) n
186         write (STDERR_FILENO, buf, strlen (buf));
(gdb) n
overflow_handler emergency=0 segv_handler_missing=0
190       die ((!emergency || segv_handler_missing) ? 0 : SIGSEGV);
(gdb) s
die (signo=0) at c-stack.c:108
108       segv_action (signo);
(gdb) n
109       message = signo ? program_error_message : stack_overflow_message;
(gdb) n
110       write (STDERR_FILENO, program_name, strlen (program_name));
(gdb) n
/home/eblake/m4-branch/tests/test-c-stack111      write (STDERR_FILENO, ": ", 2);
(gdb) n
: 112     write (STDERR_FILENO, message, strlen (message));
(gdb) n
stack overflow113         write (STDERR_FILENO, "\n", 1);
(gdb) n

114       if (! signo)
(gdb) n
115         _exit (exit_failure);
(gdb) n

Program exited with code 01.


What I suspect is that, since m4's use of libsigsegv passed but c-stack's
did not, something in c-stack's overflow_handler is triggering a secondary
segv.  Installing a breakpoint on overflow_handler will show whether we
actually get there on stack overflow (I hope so, since that part is all
libsigsegv, which is working on your platform), and if so, where it is
dying from the secondary segv.

- --
Don't work too hard, make some time for fun as well!

Eric Blake             [EMAIL PROTECTED]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAki79VYACgkQ84KuGfSFAYBOdACgu4/F+ge7SHn1SL0WMEFKJx2d
gdwAoKN2lnOVpMyhSJ7iFXtLHndEZFxo
=Fmdn
-----END PGP SIGNATURE-----
