Radovan =?utf-8?q?Božić?= <[email protected]>,
Radovan =?utf-8?q?Božić?= <[email protected]>,
Radovan =?utf-8?q?Božić?= <[email protected]>
Message-ID:
In-Reply-To: <llvm.org/llvm/llvm-project/pull/[email protected]>


AaronBallman wrote:

> > > > 2. improving optimization behavior. So it's not really the 
> > > > implementation undefining the behavior, it's the implementation 
> > > > admitting the behavior was already undefined (maybe too find of a 
> > > > distinction?).
> > > 
> > > 
> > > What I mean is that some libc implementations seem define their behaviour 
> > > when a nullptr is passed.
> > 
> > 
> > [citation needed] -- do you have evidence that they intentionally define 
> > the behavior?
> 
> Is this intentional enough: 
> https://github.com/bminor/glibc/blob/26e48102108284d2474f83f5afee56b994c86d54/stdio-common/vfprintf-internal.c#L1533
>  ?
> 
> The macro expands to
> 
> ```c
>   do
>     {
>       CHECK_FILE (s, -1);
>       if (s->_flags & _IO_NO_WRITES)
>       {
>         s->_flags |= 0x0020;
>         __set_errno (9);
>         return -1;
>       }
>       if (format == ((void *) 0))
>       {
>         __set_errno (22);
>         return -1;
>       }
>     }
>   while (0);
> ```

I believe it is intentionally not defined. It's relying on `CHECK_FILE` which 
is defined as:
```
#ifdef IO_DEBUG
# define CHECK_FILE(FILE, RET) do {                             \
    if ((FILE) == NULL                                          \
        || ((FILE)->_flags & _IO_MAGIC_MASK) != _IO_MAGIC)      \
      {                                                         \
        __set_errno (EINVAL);                                   \
        return RET;                                             \
      }                                                         \
  } while (0)
#else
# define CHECK_FILE(FILE, RET) do { } while (0)
#endif
```
so it depends on whether `IO_DEBUG` is defined as to whether you get any check 
for null: 
https://github.com/bminor/glibc/blob/26e48102108284d2474f83f5afee56b994c86d54/libio/libioP.h#L974

> > Again, this is matching GCC's behavior and I think that is reasonable given 
> > how bad our codegen is otherwise: https://godbolt.org/z/9zh7KTGvG
> 
> Does this really matter though?

To me? Not really. To someone who is wondering why our codegen is worse than 
GCC's? Probably. 

> Sure, the CodeGen doesn't look great, but I'd expect by far most format 
> strings are literals.

I'd like to agree, but there are common counterexamples too. e.g., localization 
often requires non-literal format strings. And this isn't just about the format 
strings, it's also about the `FILE *` being passed (neither can be null), and 
those are never literals.

> If they're generated by some function that could be annotated with 
> `[[gnu::returns_nonnull]]` as well to get the same optimization behaviour, 
> but is much safer, since a function can usually actually guarantee that it 
> returns a non-null pointer. I also want to note that this is a single null 
> check after a relatively heavy function, so I doubt this will have much of an 
> impact. libcs can add the attributes themselves if it's deemed important 
> enough. If you think this isn't a problem I don't want to block this, 
> especially since I don't have any stake in this. I mostly want to make sure 
> that this is thought through and we don't set this kind of precedent for C++ 
> functions.

I think this is already the precedent? If the standard defines something as 
being UB, we've always treated as something to diagnose but is valid to 
optimize on unless it's going to break a lot of existing real world code. 
That's why I think it's reasonable to add `__attribute__((nonnull))` to it for 
diagnostic *and* optimization purposes when inferring the library function 
semantics. That's just following the standard, after all. I think there are 
cases where it's reasonable to argue for `_Nonnull` semantics instead, but I 
think the default is to optimize on UB rather than to leave the optimizations 
on the table on the chance there's a hardened library somewhere which wants to 
define the behavior.

That said, it's certainly more conservative to use `_Nonnull` instead of 
`__attribute__((nonnull))`. But I'm struggling to imagine what valid code would 
break from being aggressive. It would delete null pointer checks that happen 
*after* the pointer has already been passed to the library function, but that 
code wasn't correct to begin with and I don't imagine it's common that people 
do:
```
fprintf(SomeFile, "something");
if (SomeFile) {
  // Good stuff
}
```
Do you have some usage pattern you're worried about? If we have evidence this 
will be disruptive, that would be compelling to know about.

https://github.com/llvm/llvm-project/pull/160988
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to