On Fri, May 15, 2009 at 10:06:13AM +0200, Peter Holm wrote:
> On Fri, May 15, 2009 at 09:02:39AM +0200, Ed Schouten wrote:
> > Hi Kostik,
> >
> > * Konstantin Belousov <k...@freebsd.org> wrote:
> > > Log:
> > >   Do not advance req->oldidx when sysctl_old_user returning an
> > >   error due to copyout failure or short buffer.
> > >
> > >   The later breaks the usermode iterators of the sysctl results that pack
> > >   arbitrary number of variable-sized structures. Iterator expects that
> > >   kernel filled exactly oldlen bytes, and tries to interpret half-filled
> > >   or garbage structure at the end of the buffer. In particular,
> > >   kinfo_getfile(3) segfaulted.
> > >
> > >   Reported and tested by: pho
> > >   MFC after: 3 weeks
> >
> > Is it possible that this change introduces a regression? Right now
> > `pstat -t' gets stuck in an infinite loop. I've added the following
> > printf:
> >
> > | Index: pstat.c
> > | ===================================================================
> > | --- pstat.c (revision 192128)
> > | +++ pstat.c (working copy)
> > | @@ -263,6 +263,7 @@
> > |              if (errno != ENOMEM)
> > |                      err(1, "sysctlbyname()");
> > |              len *= 2;
> > | +            printf("Going to %zu\n", len);
> > |              if ((xttys = realloc(xttys, len)) == NULL)
> > |                      err(1, "realloc()");
> > |      }
> >
> > pstat on -CURRENT prints:
> >
> > | LINE INQ CAN LIN LOW OUTQ USE LOW COL SESS PGID STATE
> > | Going to 0
> > | Going to 0
> > | Going to 0
> > | ...
> >
> > If I use the same patch on RELENG_6, I get the expected result:
> >
> > | LINE RAW CAN OUT IHIWT ILOWT OHWT LWT COL STATE SESS PGID DISC
> > | Going to 272
> > | Going to 544
> > | Going to 1088
> > | Going to 2176
> > | Going to 4352
> > | Going to 8704
> > | sysmouse 0 0 0 0 0 0 0 0 - 0 0 term
> > | ...
> >
> > So the problem is that sysctl overwrites the len argument with 0, even
> > if it returns back to userspace with ENOMEM.
> >
> > I see we have two changes in sysctl. In theory it could also be related
> > to jhb@'s changes to sysctl locking, but I suspect it's less likely.
> >
>
> I can confirm that it is r192094 that triggers the loop.

Yes, this is what I meant when I talked about breakage. Below is the reversal of r192094, plus a change that keeps the old, ugly behaviour of sysctl kern.proc.filedesc returning 0 on ENOMEM, but with oldlen chopped at the end of the last completely written struct kern_info instead of in the middle of a partially written one.

Peter, could you please retest?