Hi, Samuel -
On Sun, Jan 06, 2008 at 03:13:46PM +0000, Samuel Thibault wrote:
> Hello,
>
> I've dug a bit, since I've got an administration website which allows me
> to reproduce the bug quite reliably.
>
> Benjamin A. Okopnik, on Mon 03 Jul 2006 11:26:34 -0400, wrote:
> > ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig -icanon -echo
> > ...}) = 0
> > rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
> > pipe([5, 7]) = 0
> > pipe([9, 10]) = 0
> > clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> > child_tidptr=0xb7bd0928) = 4518
> > --- SIGCHLD (Child exited) @ 0 (0) ---
> > waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 127}], WNOHANG) = 4518
> > waitpid(-1, 0xbfa3e120, WNOHANG) = -1 ECHILD (No child processes)
> > rt_sigaction(SIGCHLD, {0x804ba60, [], SA_RESTART}, {0x804ba60, [],
> > SA_RESTART}, 8) = 0
> > sigreturn() = ? (mask now [])
> > close(7) = 0
> > fcntl64(5, F_GETFL) = 0 (flags O_RDONLY)
> > fstat64(5, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
> > mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
> > 0xb7f26000
> > _llseek(5, 0, 0xbfa3e33c, SEEK_CUR) = -1 ESPIPE (Illegal seek)
> > close(9) = 0
> > fcntl64(10, F_GETFL) = 0x1 (flags O_WRONLY)
> > fstat64(10, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
> > mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
> > 0xb7f25000
> > _llseek(10, 0, 0xbfa3e33c, SEEK_CUR) = -1 ESPIPE (Illegal seek)
> > write(10, "foo\n", 4) = -1 EPIPE (Broken pipe)
> > --- SIGPIPE (Broken pipe) @ 0 (0) ---
> > close(5) = 0
> > munmap(0xb7f26000, 4096) = 0
> > write(10, "foo\n", 4) = -1 EPIPE (Broken pipe)
> > close(10) = 0
> > munmap(0xb7f25000, 4096) = 0
> > kill(4518, SIGKILL) = -1 ESRCH (No such process)
> > rt_sigaction(SIGPIPE, {0x804c3c0, [], SA_RESTART}, {0x804c3c0, [],
> > SA_RESTART}, 8) = 0
> > sigreturn() = ? (mask now [])
> > --- SIGPIPE (Broken pipe) @ 0 (0) ---
> > rt_sigaction(SIGPIPE, {0x804c3c0, [], SA_RESTART}, {0x804c3c0, [],
> > SA_RESTART}, 8) = 0
> > sigreturn() = ? (mask now [])
> > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> > +++ killed by SIGSEGV +++
>
> Note that at this point the segfault happens in malloc called by putenv
> (which itself is called by the / command).
>
> I've run this through gdb with "handle SIGPIPE nopass", and then I
> don't get the segfault. Digging a bit in the SIGPIPE handler showed
> me that it calls init_migemo(), which itself calls fclose(). That
> is not safe, since fclose() is not in the list of async-signal-safe
> functions. I commented out these fclose() calls, and now I can't
> reproduce the bug any more. I'll keep that "fixed" version of w3m for
> some more long-term testing, but I really think the problem is here: I
> guess that fclose() frees something, so calling it from the handler may
> corrupt the heap, hence the segfault on the next malloc (which happens
> to be triggered by searching the page). So the solution is probably to
> have the signal handler just set a variable and move the call to
> init_migemo() into the main flow of the program.
I wonder what's going to be left open as a result of those two
fclose() calls not happening. Is there a signal-safe way of releasing
those handles? I'd hate to see you create more problems by fixing this
one. :)
Regards,
--
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *