This is a cool bug; thanks for the report.
I love a good concurrency bug.

>  Wed Mar 12 13:20:01: hdr 0a110c09 00000020 f018c7ce cafebabe 20737023 
> 7064755b
>  Wed Mar 12 13:20:01: tail 0a110c09 00000020 f018c7ce cafebabe 20737023 
> 7064755b | 72657320 20726576

This says "#ps [udp server]" (I'm guessing about the last few bytes),
which is the name of the dns proc as it shows up in ps.

The stack trace showed this buffer being corrupt in devattach,
which executes:

        buf = smalloc(4+strlen(spec)+1);
        sprint(buf, "#%C%s", tc, spec);
        c->path = newpath(buf);
        free(buf);

There's already something wrong here -- if spec contains bad
UTF then buf will overflow all the time.  Really it should be
snprint or sprint(buf, "#%C") followed by strcat.  Let's ignore that,
since there's no bad UTF in your string.

If you poke around at how this gets called on a file like #p/1/blah,
you'll find that at this point in the code, spec == up->genbuf+2.
So if something else was overwriting up->genbuf at just the
wrong time (between the strlen and the execution of %s)
with the process name as it shows up in ps, then you'd see
the corruption in the stack trace.

Since up->genbuf is private to up (i.e., the current process)
I wondered at first if somehow the scheduler was broken and
was trying to run the same Proc on multiple CPUs, but
Erik assured me the scheduler had not been touched.

But aha!  If you poke around in the source code you will find
that there is one place where one proc uses another's genbuf:

% g genbuf | grep -v 'up->genbuf'
chan.c:1630:    /* place final element in genbuf for e.g. exec */
devproc.c:717:          j = procargs(p, p->genbuf, sizeof p->genbuf);
devproc.c:723:          memmove(a, &p->genbuf[offset], n);
portdat.h:701:  char    genbuf[128];    /* buffer used e.g. for last name
element from namec */
%

And look, it's the code for reading #p/1/args, which is
how ps figures out what to print:

/sys/src/9/port/devproc.c:/^procread
        case Qargs:
                qlock(&p->debug);
                j = procargs(p, p->genbuf, sizeof p->genbuf);
                qunlock(&p->debug);
                if(offset >= j)
                        return 0;
                if(offset+n > j)
                        n = j-offset;
                memmove(a, &p->genbuf[offset], n);
                return n;

So if you ran ps at exactly the right time (while one of the
procs on the machine was opening a #-ed file name and
was between strlen and sprint), you'd win the race.
You might improve your chances if the machine were
close to out of memory and the smalloc had to wait for
more memory.

Oops.  Changing those lines to use up->genbuf instead
of p->genbuf should eliminate the race.

Russ

Reply via email to