[9fans] Re: Plan9 [v]seprint(2) API design bug

Alejandro Colomar via 9fans Sun, 09 Feb 2025 15:31:14 -0800

[CC += 9fans@]

Hi Jacob,


On Sun, Feb 09, 2025 at 03:51:13PM -0600, Jacob Moody wrote:
> On 2/9/25 14:59, Alejandro Colomar wrote:
> > Hi all,
> > 
> > I found a few addresses inspecting both the 9front and plan9port git
> > repositories.  If you think anyone else --or any mailing list-- should
> > be CCed, feel free to add them.  I also added Rob and Ken, which I
> > believe to be the authors of this API.
> > 
> > Some time ago, I discussed a design bug in Plan9's [v]seprint(2) with
> > Doug.  ISTR that we agreed on the bug, although I couldn't find where to
> > report it.  I hope this time I found the right people.
> 
> In general 9front specific comments should go to the 9front list 
> 9fr...@9front.org
> more general questions regarding any sort of plan 9 may be best sent to 
> 9fans@9fans.net
> There is enough overlap that it should be seen by the right people.

Thanks!  Are those lists public?  That is, can I post to those lists
without being subscribed?

> > The issue is both with the API, and its documentation.
> > 
> > So, I was writing a function for myself, for being able to chain
> > snprintf(3) calls, and ended up re-inventing Plan9's seprint(2) --I
> > didn't know it existed back then--.  Having invented it from scratch, I
> > was able to compare my design with seprint(2), and see some obvious
> > flaws in it.
> > 
> > The documentation for [v]seprint(2) isn't clear about what happens on
> > truncation.  Does the function return NULL?  Reading the implementation
> > seems to say no.  Does the function return a special value that allows
> > detecting truncation?  Reading the implementation seems to say no.
> > Hummm.  But then, it returns as it the input string had been short
> > enough to not truncate, thus silently truncating?  The documentation
> > seems to say so, and the implementation seems to confirm it.  This is
> > unfortunate.
> > 
> > There are two ways you could detect truncation:
> > 
> > -  Return the 'e' value on truncation.
> > -  Return a null pointer on truncation.
> 
> You can check, it's just a bit fuzzy. You can check if the return value
> points to the last valid byte of the input. Now this will not let you
> differentiate between the buffer being filled exactly up to the brim
> and if the input was truncated. However you could assume the worst case
> and be safe all the time in those scenarios. It's not great but it gets
> the job done.

Hmmm, yeah, so you can workaround it by wasting 1 byte.  Not a great
thing.  Not a great thing.  :|

Iff we can't fix seprint(2) for historic reasons, I think it would be
better to invent a new seprint2(2) and consider slowly migrating all
code to that API.  Existing code using seprint(2) might have many bugs,
and needs to be audited anyway.

> > The latter seems to be more appropriate.  My implementation has tried
> > both, and I'm finally settling on the null pointer.  That's the most
> > easy-to-use API, and is also safer (if one accidentally uses that null
> > pointer, it will easily abort the program, instead of overrunning
> > something).
> > 
> > I've called my function seprintf().  Here's how it's used:
> > 
> >     #define endof(a)  (a + countof(a))
> > 
> >     p = buf;
> >     p = seprintf(p, endof(buf), "...");
> >     p = seprintf(p, endof(buf), "...");
> >     p = seprintf(p, endof(buf), "...");
> >     p = seprintf(p, endof(buf), "...");
> >     p = seprintf(p, endof(buf), "...");
> >     p = seprintf(p, endof(buf), "...");
> >     if (p == NULL) {
> >             if (errno == E2BIG)
> >                     goto truncated;
> >             goto failed;
> >     }
> > 
> > If you search through 9front's source code, you'll find some weird calls
> > to seprint(2):
> > 
> > sys/src/libc/9sys/idn.c:214:                if((dp = seprint(dp, de, 
> > "%.*S", nr, rb)) == nil)
> > sys/src/libc/9sys/idn.c-215-                        return -1;
> > sys/src/libc/9sys/idn.c-216-                if(dp >= de)
> > sys/src/libc/9sys/idn.c-217-                        return -1;
> > 
> > It looks like it expects dp to possibly be a null pointer, which
> > shouldn't be possible except for programmer bugs.  Let's check the
> > implementation:
> > 
> >     $ grepc -tfd seprint .
> >     ./src/lib9/fmt/seprint.c:char*
> >     seprint(char *buf, char *e, char *fmt, ...)
> >     {
> >             char *p;
> >             va_list args;
> > 
> >             va_start(args, fmt);
> >             p = vseprint(buf, e, fmt, args);
> >             va_end(args);
> >             return p;
> >     }
> >     $ grepc -tfd vseprint .
> >     ./src/lib9/fmt/vseprint.c:char*
> >     vseprint(char *buf, char *e, char *fmt, va_list args)
> >     {
> >             Fmt f;
> > 
> >             if(e <= buf)
> >                     return nil;
> >             f.runes = 0;
> >             f.start = buf;
> >             f.to = buf;
> >             f.stop = e - 1;
> >             f.flush = 0;
> >             f.farg = nil;
> >             f.nfmt = 0;
> >             VA_COPY(f.args,args);
> >             fmtlocaleinit(&f, nil, nil, nil);
> >             dofmt(&f, fmt);
> >             VA_END(f.args);
> >             *(char*)f.to = '\0';
> >             return (char*)f.to;
> >     }
> > 
> > The only way to return a null pointer is if buf>=e.  But how could that
> > happen?  Also, the test in <sys/src/libc/9sys/idn.c:216> seems really
> > impossible.  seprint(2) silently truncates, and returns a pointer to the
> > terminating NUL byte.  That can never be >=e, since 'e' is supposed to
> > be a pointer to one-after-the-last element, that is, one after the
> > terminating NUL byte.
> 
> We manually increment the returned value from seprint later in that function:
> /sys/src/libc/9sys/idn.c:220
> So buf can be >= e in this specific case, further investigation with other
> 9front folks reviewed the code and determined it is working as intended.
> Now it is not exactly clear,

But the test is performed before the increment.  How can it be possible?
The loop is:

200-    for(;;){
201-            nc = nr = 0;
202-            while(cp[nc] != 0){
203-                    n = chartorune(&r, cp+nc);
204-                    if(r == '.')
205-                            break;
206-                    if(nr >= nelem(rb))
207-                            return -1;
208-                    rb[nr++] = r;
209-                    nc += n;
210-            }
211-            if(cistrncmp(cp, "xn--", 4) == 0)
212-                    if((nr = punydecode(nc-4, cp+4, nelem(rb), rb)) < 0)
213-                            return -1;
214:            if((dp = seprint(dp, de, "%.*S", nr, rb)) == nil)
215-                    return -1;
216-            if(dp >= de)
217-                    return -1;
218-            if(cp[nc] == 0)
219-                    break;
220-            *dp++ = '.';
221-            cp += nc+1;
222-    }

In line 214, seprint(2) cannot possibly return anything >=de.  And
there's no increment between line 214 and line 216, so line 216 must be
dead code, isn't it?  The value of dp might be incremented later in line
220, but that's already after the test has happened, and thus cannot
make it non-dead code posthumously, can it?

> and relying on this behavior is not great
> so we are exploring perhaps changing this to make things more clear.
> http://okturing.com/src/23249/body is the current draft.

This looks better, but I'm not happy doing a transformation like that
claiming that nothing changes, where the current code clearly reads as
dead code.  I'd first make sure that the commit message really tells the
full story.

> > I suspect that we could fix seprint(2) by making it return a null
> > pointer on truncation, and it would most likely fix existing code, which
> > does expect a way to detect truncation.
> > 
> > Here's how I implemented my seprintf() for Linux.  It may need some
> > adaptation to Plan9, so I welcome your feedback about it.  But I think
> > the main ideas could be exported to Plan9.
> > 
> >     $ grepc -htfd seprintf .
> >     inline char *
> >     seprintf(char *dst, char *end, const char *restrict fmt, ...)
> >     {
> >             char     *p;
> >             va_list  ap;
> > 
> >             va_start(ap, fmt);
> >             p = vseprintf(dst, end, fmt, ap);
> >             va_end(ap);
> > 
> >             return p;
> >     }
> >     $ grepc -htfd vseprintf .
> >     inline char *
> >     vseprintf(char *dst, char *end, const char *restrict fmt, va_list ap)
> >     {
> >             int        len;
> >             ptrdiff_t  size;
> > 
> >             if (dst == NULL)
> >                     return NULL;
> > 
> >             size = end - dst;
> >             len = vstprintf(dst, size, fmt, ap);
> >             if (len == -1)
> >                     return NULL;
> > 
> >             return dst + len;
> >     }
> >     $ grepc -htfd vstprintf .
> >     inline int
> >     vstprintf(char *restrict s, int size, const char *restrict fmt, va_list 
> > ap)
> >     {
> >             int  len;
> > 
> >             len = vsnprintf(s, size, fmt, ap);
> >             if (len == -1)
> >                     return -1;
> >             if (len >= size && size != 0) {
> >                     errno = E2BIG;
> >                     return -1;
> >             }
> > 
> >             return len;
> >     }
> > 
> > What do you think?
> 
> In my personal opinion the issue is not so great that it would be worth 
> changing
> the existing code (and all the callers). More "core" functions like this is 
> used not
> only by internal code but by plenty of external code.

I suspect most of that external code is bogus, and probably depends on
some behavior that differs from the actual behavior of seprint(2), just
like the code we discussed above has dead code attempting to check for
truncation with invalid tests.

Here's one that's more obviously wrong, I think:

        sys/src/cmd/nusb/lib/dump.c:75:         s = seprint(s, e, " %.2ux", 
b[i]);
        sys/src/cmd/nusb/lib/dump.c-76- if(s == e)
        sys/src/cmd/nusb/lib/dump.c-77-         fprint(2, "%s: usb/lib: 
hexdump: bug: small buffer\n", argv0);
        sys/src/cmd/nusb/lib/dump.c-78- return dbuff;
        sys/src/cmd/nusb/lib/dump.c-79-}

s==e cannot be possibly true in line 76 right after the call to
seprint(2) in line 75.  That's necessarily dead code, or I'm getting
lost in the implementation of vseprint(2).

After checking many call sites, I think I agree with you that the API
cannot be fixed without breaking some code (code that is already broken
in other ways, but).  Maybe I'd add a new API and slowly move to it,
deprecating seprint(2).  Maybe call the new one seprint2(2)?

> These days I think we like to
> have a fairly good reason to grow the list of 9front idiosyncrasies compared 
> to the
> original plan 9 for core interfaces like this.

I'd say having this bad API being code is in itself a problem, and a
better API would probably fix some bugs, and promote better code.


Have a lovely night!
Alex

-- 
<https://www.alejandro-colomar.es/>

signature.asc
Description: PGP signature

------------------------------------------
9fans: 9fans
Permalink: 
https://9fans.topicbox.com/groups/9fans/T90be9e76a4dae77d-M014d4797afb3ec6fcdb450ff
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

[9fans] Re: Plan9 [v]seprint(2) API design bug

Reply via email to