[9fans] Re: Plan9 [v]seprint(2) API design bug

Jacob Moody Sun, 09 Feb 2025 16:36:25 -0800

On 2/9/25 17:05, Alejandro Colomar wrote:
> [CC += 9fans@]
> 
> Hi Jacob,
> 
> On Sun, Feb 09, 2025 at 03:51:13PM -0600, Jacob Moody wrote:
>> On 2/9/25 14:59, Alejandro Colomar wrote:
>>> Hi all,
>>>
>>> I found a few addresses inspecting both the 9front and plan9port git
>>> repositories.  If you think anyone else --or any mailing list-- should
>>> be CCed, feel free to add them.  I also added Rob and Ken, which I
>>> believe to be the authors of this API.
>>>
>>> Some time ago, I discussed a design bug in Plan9's [v]seprint(2) with
>>> Doug.  ISTR that we agreed on the bug, although I couldn't find where to
>>> report it.  I hope this time I found the right people.
>>
>> In general 9front specific comments should go to the 9front list 
>> 9fr...@9front.org
>> more general questions regarding any sort of plan 9 may be best sent to 
>> 9fans@9fans.net
>> There is enough overlap that it should be seen by the right people.
> 
> Thanks!  Are those lists public?  That is, can I post to those lists
> without being subscribed?


9fans is, 9front is not.

>>> The issue is both with the API, and its documentation.
>>>
>>> So, I was writing a function for myself, for being able to chain
>>> snprintf(3) calls, and ended up re-inventing Plan9's seprint(2) --I
>>> didn't know it existed back then--.  Having invented it from scratch, I
>>> was able to compare my design with seprint(2), and see some obvious
>>> flaws in it.
>>>
>>> The documentation for [v]seprint(2) isn't clear about what happens on
>>> truncation.  Does the function return NULL?  Reading the implementation
>>> seems to say no.  Does the function return a special value that allows
>>> detecting truncation?  Reading the implementation seems to say no.
>>> Hummm.  But then, it returns as it the input string had been short
>>> enough to not truncate, thus silently truncating?  The documentation
>>> seems to say so, and the implementation seems to confirm it.  This is
>>> unfortunate.
>>>
>>> There are two ways you could detect truncation:
>>>
>>> -  Return the 'e' value on truncation.
>>> -  Return a null pointer on truncation.
>>
>> You can check, it's just a bit fuzzy. You can check if the return value
>> points to the last valid byte of the input. Now this will not let you
>> differentiate between the buffer being filled exactly up to the brim
>> and if the input was truncated. However you could assume the worst case
>> and be safe all the time in those scenarios. It's not great but it gets
>> the job done.
> 
> Hmmm, yeah, so you can workaround it by wasting 1 byte.  Not a great
> thing.  Not a great thing.  :|
> 
> Iff we can't fix seprint(2) for historic reasons, I think it would be
> better to invent a new seprint2(2) and consider slowly migrating all
> code to that API.  Existing code using seprint(2) might have many bugs,
> and needs to be audited anyway.

Sure let's audit existing code and fix things, I am very much interested
in making things better, but I don't think we should point at a couple of bugs
and decide we need to rip up the floor boards. More of the reasoning for why
I feel this way is later on in this mail.

>>> The latter seems to be more appropriate.  My implementation has tried
>>> both, and I'm finally settling on the null pointer.  That's the most
>>> easy-to-use API, and is also safer (if one accidentally uses that null
>>> pointer, it will easily abort the program, instead of overrunning
>>> something).
>>>
>>> I've called my function seprintf().  Here's how it's used:
>>>
>>>     #define endof(a)  (a + countof(a))
>>>
>>>     p = buf;
>>>     p = seprintf(p, endof(buf), "...");
>>>     p = seprintf(p, endof(buf), "...");
>>>     p = seprintf(p, endof(buf), "...");
>>>     p = seprintf(p, endof(buf), "...");
>>>     p = seprintf(p, endof(buf), "...");
>>>     p = seprintf(p, endof(buf), "...");
>>>     if (p == NULL) {
>>>             if (errno == E2BIG)
>>>                     goto truncated;
>>>             goto failed;
>>>     }
>>>
>>> If you search through 9front's source code, you'll find some weird calls
>>> to seprint(2):
>>>
>>> sys/src/libc/9sys/idn.c:214:                if((dp = seprint(dp, de, 
>>> "%.*S", nr, rb)) == nil)
>>> sys/src/libc/9sys/idn.c-215-                        return -1;
>>> sys/src/libc/9sys/idn.c-216-                if(dp >= de)
>>> sys/src/libc/9sys/idn.c-217-                        return -1;
>>>
>>> It looks like it expects dp to possibly be a null pointer, which
>>> shouldn't be possible except for programmer bugs.  Let's check the
>>> implementation:
>>>
>>>     $ grepc -tfd seprint .
>>>     ./src/lib9/fmt/seprint.c:char*
>>>     seprint(char *buf, char *e, char *fmt, ...)
>>>     {
>>>             char *p;
>>>             va_list args;
>>>
>>>             va_start(args, fmt);
>>>             p = vseprint(buf, e, fmt, args);
>>>             va_end(args);
>>>             return p;
>>>     }
>>>     $ grepc -tfd vseprint .
>>>     ./src/lib9/fmt/vseprint.c:char*
>>>     vseprint(char *buf, char *e, char *fmt, va_list args)
>>>     {
>>>             Fmt f;
>>>
>>>             if(e <= buf)
>>>                     return nil;
>>>             f.runes = 0;
>>>             f.start = buf;
>>>             f.to = buf;
>>>             f.stop = e - 1;
>>>             f.flush = 0;
>>>             f.farg = nil;
>>>             f.nfmt = 0;
>>>             VA_COPY(f.args,args);
>>>             fmtlocaleinit(&f, nil, nil, nil);
>>>             dofmt(&f, fmt);
>>>             VA_END(f.args);
>>>             *(char*)f.to = '\0';
>>>             return (char*)f.to;
>>>     }
>>>
>>> The only way to return a null pointer is if buf>=e.  But how could that
>>> happen?  Also, the test in <sys/src/libc/9sys/idn.c:216> seems really
>>> impossible.  seprint(2) silently truncates, and returns a pointer to the
>>> terminating NUL byte.  That can never be >=e, since 'e' is supposed to
>>> be a pointer to one-after-the-last element, that is, one after the
>>> terminating NUL byte.
>>
>> We manually increment the returned value from seprint later in that function:
>> /sys/src/libc/9sys/idn.c:220
>> So buf can be >= e in this specific case, further investigation with other
>> 9front folks reviewed the code and determined it is working as intended.
>> Now it is not exactly clear,
> 
> But the test is performed before the increment.  How can it be possible?
> The loop is:
> 
> 200-  for(;;){
> 201-          nc = nr = 0;
> 202-          while(cp[nc] != 0){
> 203-                  n = chartorune(&r, cp+nc);
> 204-                  if(r == '.')
> 205-                          break;
> 206-                  if(nr >= nelem(rb))
> 207-                          return -1;
> 208-                  rb[nr++] = r;
> 209-                  nc += n;
> 210-          }
> 211-          if(cistrncmp(cp, "xn--", 4) == 0)
> 212-                  if((nr = punydecode(nc-4, cp+4, nelem(rb), rb)) < 0)
> 213-                          return -1;
> 214:          if((dp = seprint(dp, de, "%.*S", nr, rb)) == nil)
> 215-                  return -1;
> 216-          if(dp >= de)
> 217-                  return -1;
> 218-          if(cp[nc] == 0)
> 219-                  break;
> 220-          *dp++ = '.';
> 221-          cp += nc+1;
> 222-  }
> 
> In line 214, seprint(2) cannot possibly return anything >=de.  And
> there's no increment between line 214 and line 216, so line 216 must be
> dead code, isn't it?  The value of dp might be incremented later in line
> 220, but that's already after the test has happened, and thus cannot
> make it non-dead code posthumously, can it?

Yes line 216 is dead code, which is why it was removed in the new code.
Your original comment was that the check on line 214 was impossible,
I was commenting that no that check on line 214 is intentional.
Sorry, I should have been more specific.

>> and relying on this behavior is not great
>> so we are exploring perhaps changing this to make things more clear.
>> http://okturing.com/src/23249/body is the current draft.
> 
> This looks better, but I'm not happy doing a transformation like that
> claiming that nothing changes, where the current code clearly reads as
> dead code.  I'd first make sure that the commit message really tells the
> full story.

We just pushed a change that we're happy with.
Thanks for the bug report.

>>> I suspect that we could fix seprint(2) by making it return a null
>>> pointer on truncation, and it would most likely fix existing code, which
>>> does expect a way to detect truncation.
>>>
>>> Here's how I implemented my seprintf() for Linux.  It may need some
>>> adaptation to Plan9, so I welcome your feedback about it.  But I think
>>> the main ideas could be exported to Plan9.
>>>
>>>     $ grepc -htfd seprintf .
>>>     inline char *
>>>     seprintf(char *dst, char *end, const char *restrict fmt, ...)
>>>     {
>>>             char     *p;
>>>             va_list  ap;
>>>
>>>             va_start(ap, fmt);
>>>             p = vseprintf(dst, end, fmt, ap);
>>>             va_end(ap);
>>>
>>>             return p;
>>>     }
>>>     $ grepc -htfd vseprintf .
>>>     inline char *
>>>     vseprintf(char *dst, char *end, const char *restrict fmt, va_list ap)
>>>     {
>>>             int        len;
>>>             ptrdiff_t  size;
>>>
>>>             if (dst == NULL)
>>>                     return NULL;
>>>
>>>             size = end - dst;
>>>             len = vstprintf(dst, size, fmt, ap);
>>>             if (len == -1)
>>>                     return NULL;
>>>
>>>             return dst + len;
>>>     }
>>>     $ grepc -htfd vstprintf .
>>>     inline int
>>>     vstprintf(char *restrict s, int size, const char *restrict fmt, va_list 
>>> ap)
>>>     {
>>>             int  len;
>>>
>>>             len = vsnprintf(s, size, fmt, ap);
>>>             if (len == -1)
>>>                     return -1;
>>>             if (len >= size && size != 0) {
>>>                     errno = E2BIG;
>>>                     return -1;
>>>             }
>>>
>>>             return len;
>>>     }
>>>
>>> What do you think?
>>
>> In my personal opinion the issue is not so great that it would be worth 
>> changing
>> the existing code (and all the callers). More "core" functions like this is 
>> used not
>> only by internal code but by plenty of external code.
> 
> I suspect most of that external code is bogus, and probably depends on
> some behavior that differs from the actual behavior of seprint(2), just
> like the code we discussed above has dead code attempting to check for
> truncation with invalid tests.
> 
> Here's one that's more obviously wrong, I think:
> 
>       sys/src/cmd/nusb/lib/dump.c:75:         s = seprint(s, e, " %.2ux", 
> b[i]);
>       sys/src/cmd/nusb/lib/dump.c-76- if(s == e)
>       sys/src/cmd/nusb/lib/dump.c-77-         fprint(2, "%s: usb/lib: 
> hexdump: bug: small buffer\n", argv0);
>       sys/src/cmd/nusb/lib/dump.c-78- return dbuff;
>       sys/src/cmd/nusb/lib/dump.c-79-}
> 
> s==e cannot be possibly true in line 76 right after the call to
> seprint(2) in line 75.  That's necessarily dead code, or I'm getting
> lost in the implementation of vseprint(2).

You're correct in your interpretation, this bug is also fixed now.
Thanks for the reports, if you find any more I'd be very happy to
have them sent our way.

> 
> After checking many call sites, I think I agree with you that the API
> cannot be fixed without breaking some code (code that is already broken
> in other ways, but).  Maybe I'd add a new API and slowly move to it,
> deprecating seprint(2).  Maybe call the new one seprint2(2)?

Maybe, we had talked about doing something like this after your first email.
There was a lack of consensus on if we'd like to do this at the moment.

>> These days I think we like to
>> have a fairly good reason to grow the list of 9front idiosyncrasies compared 
>> to the
>> original plan 9 for core interfaces like this.
> 
> I'd say having this bad API being code is in itself a problem, and a
> better API would probably fix some bugs, and promote better code.
> 

It's a little funky and perhaps a little sharp, but that is not something 
uncommon in C.
I can agree with you that simpler semantics would have been nice, but I think 
you're missing
some context of how the plan 9 community works at this point. We're really 
quite small and the
churn rate has historically been really low (9front has picked up some steam in 
comparison but
is still not anywhere near most other mainstream operating systems). We have 
people that use
quite old code all the time, or have their own existing code that might be 
going on a couple
decades old at this point. This is in part because "core" APIs don't tend to 
change or churn that
much. We have community members who are active that still do a lot of work on 
"original"
plan 9 with minimal patches. Things have drifted throughout the years, some 
drift more warranted
then others. We still like to share code around and I am bit hesitant to make 
things harder
to move back and forth.

Anyway, all this is to just give some context about some of our hesitation to 
make some changes
for an issue on paper that is annoying but is still possible to avoid.

I appreciate your review on the 9front code, always happy to have discussions 
like this :).


Thanks,
Jacob Moody

------------------------------------------
9fans: 9fans
Permalink: 
https://9fans.topicbox.com/groups/9fans/T90be9e76a4dae77d-M051da777c7078e3f496c1aeb
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

[9fans] Re: Plan9 [v]seprint(2) API design bug

Reply via email to