Re: Missed warning (-Wuse-after-free)

2023-02-24 Thread Martin Uecker via Gcc
On Thursday, 23.02.2023 at 19:21 -0600, Serge E. Hallyn wrote:
> On Fri, Feb 24, 2023 at 01:02:54AM +0100, Alex Colomar wrote:
> > Hi Martin,
> > 
> > On 2/23/23 20:57, Martin Uecker wrote:
> > > On Thursday, 23.02.2023 at 20:23 +0100, Alex Colomar wrote:
> > > > Hi Martin,
> > > > 
> > > > On 2/17/23 14:48, Martin Uecker wrote:
> > > > > > This new wording doesn't even allow one to use memcmp(3);
> > > > > > just reading the pointer value, however you do it, is UB.
> > > > > 
> > > > > memcmp would not use the pointer value but work
> > > > > on the representation bytes and is still allowed.
> > > > 
> > > > Hmm, interesting.  It's rather unspecified behavior. Still
> > > > unpredictable: (memcmp(&p, &p, sizeof(p)) == 0) might evaluate to true or
> > > > false randomly; the compiler may compile out the call to memcmp(3),
> > > > since it knows it won't produce any observable behavior.
> > > > 
> > > > 
> > > 
> > > No, I think several things get mixed up here.
> > > 
> > > The representation of a pointer that becomes invalid
> > > does not change.
> > > 
> > > So (0 == memcmp(&p, &p, sizeof(p))) always
> > > evaluates to true.
> > > 
> > > Also in general, an unspecified value is simply unspecified
> > > but does not change anymore.
> 
> Right.  p is its own thing - n bytes on the stack containing some value.
> Once it comes into scope, it doesn't change on its own.  And if I do
> free(p) or o = realloc(p), then the value of p itself - the n bytes on
> the stack - does not change.

Yes, but one comment about terminology: the C standard
differentiates between the representation, i.e. the bytes on
the stack, and the value.  The representation is converted to
a value during lvalue conversion.  For an invalid pointer
the representation is indeterminate because it now does not
point to a valid object anymore.  So it is not possible to
convert the representation to a value during lvalue conversion.
In other words, it does not make sense to speak of the value
of the pointer anymore.
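
A minimal sketch of this distinction (an illustration, not code from the thread; the helper name representation_unchanged_by_free is hypothetical). It snapshots the bytes of p before free() and compares them afterwards with memcmp(3). Only the representation is touched, through &p; p itself never undergoes lvalue conversion after the free. On the reading given above this should report that the bytes are unchanged, though that is an assumption about the implementation rather than something verified on every platform:

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the bytes of p are identical before and after free(p),
 * 0 if they differ, and -1 if the allocation failed.
 * Hypothetical helper, for illustration only. */
int representation_unchanged_by_free(void)
{
    unsigned char before[sizeof(char *)];
    char *p = malloc(42);

    if (p == NULL)
        return -1;

    memcpy(before, &p, sizeof p);   /* snapshot the representation */
    free(p);                        /* the value of p is now indeterminate */

    /* Only the bytes of p are inspected here; p itself is never read. */
    return memcmp(before, &p, sizeof p) == 0;
}
```

Writing `if (p == saved)` instead of the memcmp() call would read the value of p and fall back into the problematic case.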

> I realize C11 appears to have changed that.  I fear that in doing so it
> actually risks increasing the confusion about pointers.  IMO it's much
> easier to reason about
> 
>   o = realloc(p, X);
> 
> (and more baroque constructions) when keeping in mind that o, p, and the
> object pointed to by either one are all different things.
> 

What did change in C11? As far as I know, the pointer model
did not change in C11.

> > > Reading an uninitialized value of automatic storage whose
> > > address was not taken is undefined behavior, so everything
> > > is possible afterwards.
> > > 
> > > An uninitialized variable whose address was taken has a
> > > representation which can represent an unspecified value
> > > or a no-value (trap) representation. Reading the
> > > representation itself is always ok and gives consistent
> > > results. Reading the variable can be undefined behavior
> > > iff it is a trap representation, otherwise you get
> > > the unspecified value which is stored there.
> > > 
> > > At least this is my reading of the C standard. Compilers
> > > are not fully conformant.
> > 
> > Does all this imply that the following is well defined behavior (and shall
> > print what one would expect)?
> > 
> >   free(p);
> > 
> >   (void) &p;  // take the address
> >   // or maybe we should (void) memcmp(&p, &p, sizeof(p)); ?
> > 
> >   printf("%p\n", p);  // we took previously its address,
> >   // so now it has to hold consistently
> >   // the previous value
> > 
> > 

No, the printf is not well defined, because the lvalue conversion
of the pointer with indeterminate representation may lead to
undefined behavior.
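
If the goal is only to look at what is stored in p after the free, one option is to print its bytes rather than pass p to printf. The sketch below is an illustration and not code from the thread; object_bytes_to_hex is a hypothetical helper. It formats the object representation byte by byte through an unsigned char pointer, so it never reads the pointer value:

```c
#include <stddef.h>
#include <stdio.h>

/* Writes the object representation of the size bytes at obj into out as
 * lowercase hex, in memory order. out must have room for 2 * size + 1
 * chars. Hypothetical helper, for illustration only. */
void object_bytes_to_hex(const void *obj, size_t size, char *out)
{
    const unsigned char *bytes = obj;

    out[0] = '\0';                  /* covers the size == 0 case */
    for (size_t i = 0; i < size; i++)
        sprintf(out + 2 * i, "%02x", (unsigned)bytes[i]);
}
```

After free(p), calling object_bytes_to_hex(&p, sizeof p, buf) and printing buf with %s inspects only the representation. The digits appear in memory order, so on a little-endian machine they will look reversed compared with typical %p output.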


Martin


> > This feels weird.  And a bit of a Schroedinger's pointer.  I'm not entirely
> > convinced, but might be.
> 
> Again, p is just an n byte variable which happens to have (one hopes)
> pointed at a previously malloc'd address.
> 
> And I'd argue that pre-C11, this was not confusing, and would not have
> felt weird to you.
> 
> But I am most grateful to you for having brought this to my attention.
> I may not agree with it and not like it, but it's right there in the
> spec, so time for me to adjust :)
> 







Re: Missed warning (-Wuse-after-free)

2023-02-24 Thread Martin Uecker via Gcc
On Friday, 24.02.2023 at 02:42 +0100, Alex Colomar wrote:
> Hi Serge, Martin,
> 
> On 2/24/23 02:21, Serge E. Hallyn wrote:
> > > Does all this imply that the following is well defined behavior (and shall
> > > print what one would expect)?
> > > 
> > >    free(p);
> > > 
> > >    (void) &p;  // take the address
> > >    // or maybe we should (void) memcmp(&p, &p, sizeof(p)); ?
> > > 
> > >    printf("%p\n", p);  // we took previously its address,
> > >    // so now it has to hold consistently
> > >    // the previous value
> > > 
> > > 
> > > > This feels weird.  And a bit of a Schroedinger's pointer.  I'm not entirely
> > > > convinced, but might be.
> > 
> > Again, p is just an n byte variable which happens to have (one hopes)
> > pointed at a previously malloc'd address.
> > 
> > And I'd argue that pre-C11, this was not confusing, and would not have
> > felt weird to you.
> > 
> > But I am most grateful to you for having brought this to my attention.
> > I may not agree with it and not like it, but it's right there in the
> > spec, so time for me to adjust :)
> 
> I'll try to show why this feels weird to me (even in C89):
> 
> 
> alx@dell7760:~/tmp$ cat pointers.c
> #include <stdio.h>
> #include <stdlib.h>
> 
> 
> int
> main(void)
> {
>   char  *p, *q;
> 
>   p = malloc(42);
>   if (p == NULL)
>       exit(1);
> 
>   q = realloc(p, 42);
>   if (q == NULL)
>       exit(1);
> 
>   (void) &p;  // If we remove this, we get -Wuse-after-free
> 
>   printf("(%p == %p) = %i\n", p, q, (p == q));
> }
> alx@dell7760:~/tmp$ cc -Wall -Wextra pointers.c  -Wuse-after-free=3
> alx@dell7760:~/tmp$ ./a.out
> (0x5642cd9022a0 == 0x5642cd9022a0) = 1
> 

No, you can't do the comparison or use the value of 'p'
because 'p' is not a valid pointer. (The address taken
makes no difference here, but it may confuse the
compiler so that it does not warn.)
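
A common way to ask "did realloc move the block?" without reading the old pointer afterwards is to convert it to an integer while it is still valid. The sketch below is an illustration, not code from the thread; resize_and_report_move is a hypothetical helper, and it assumes the implementation provides uintptr_t (an optional type) and that equal addresses convert to equal integers, which holds on common flat-address-space platforms but is implementation-defined:

```c
#include <stdint.h>
#include <stdlib.h>

/* Resizes *pp to new_size (assumed nonzero). Returns 1 if the block moved,
 * 0 if it stayed in place, and -1 if realloc failed, in which case *pp is
 * left untouched and still valid. Hypothetical helper, for illustration. */
int resize_and_report_move(char **pp, size_t new_size)
{
    uintptr_t old_addr = (uintptr_t)*pp;  /* converted while *pp is valid */
    char *q = realloc(*pp, new_size);

    if (q == NULL)
        return -1;

    *pp = q;                   /* the old pointer value is never read again */
    return (uintptr_t)q != old_addr;
}
```

The comparison is between two integers, so no invalid pointer takes part in it.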

> 
> These pointers point to different objects (actually, one of them doesn't 
> even point to an object anymore), so they can't compare equal, according 
> to both:
> 
> 
> 
> 
> 
> (I believe C89 already had the concept of lifetime well defined as it is 
> now, so the object had finished its lifetime after realloc(3)).
> 
> How can we justify that true, if the pointers don't point to the same 
> object?  And how can we justify a hypothetical false (which compilers 
> don't implement), if compilers will really just read the value?  To 
> implement this as well defined behavior, it could result in no other 
> than false, and it would require heavy overhead for the compilers to 
> detect that the seemingly-equal values are indeed different, don't you 
> think?  The easiest solution is for the standard to just declare this 
> outlaw, IMO.

This is undefined behavior, so the comparison can return false
or true or crash or whatever.  

Martin

> 
> Maybe it could do an exception for printing, that is, reading a pointer 
> is not a problem in itself, as long as you don't compare it, but I'm not 
> such an expert about this.
> 
> Cheers,
> 
> Alex
> 
> > 
> > -serge
> 
> -- 
> 
> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
> 




Re: Missed warning (-Wuse-after-free)

2023-02-24 Thread Martin Uecker via Gcc
On Friday, 24.02.2023 at 03:01 +, Peter Lafreniere wrote:

...
> 
> > Maybe it could do an exception for printing, that is, reading a pointer
> > is not a problem in itself, as long as you don't compare it, but I'm not
> > such an expert about this.
> 
> One last thought: with the above strict interpretation of the C standard,
> it would become nigh on impossible to implement the malloc(3) family of
> functions in C itself. I vote for the "shared storage" interpretation
> of the C11 standard that is actually implemented rather than this abstract
> liveness oriented interpretation.

This is a bit of a misunderstanding about what "undefined behavior" means
in ISO C. It simply means that ISO C does not specify the behavior.  This
does not mean it is illegal to do something which has undefined behavior.

Instead, it means you can not rely on the ISO C standard for portable
behavior.  So if you implement "malloc" in C itself you will probably
rely on "undefined behavior", but this is perfectly fine.  The C standard
specifies behavior of "malloc", but does not care how it is implemented.


Martin



[GSoC][C++: Compiler Built-in Traits]: Example Impls & Small Patches

2023-02-24 Thread Ken Matsui via Gcc
Hi,

My name is Ken Matsui. I am highly interested in contributing to the
project idea, "C++: Implement compiler built-in traits for the
standard library traits." To understand how to implement those traits,
could you please give me some example implementations of the compiler
built-in traits, as well as some recommended traits to get started
with making small patches?

Also, I would appreciate receiving the contact information for the
project mentor, Patrick Palka.

Sincerely,
Ken Matsui


Re: Missed warning (-Wuse-after-free)

2023-02-24 Thread Serge E. Hallyn
On Fri, Feb 24, 2023 at 09:36:45AM +0100, Martin Uecker wrote:
> On Thursday, 23.02.2023 at 19:21 -0600, Serge E. Hallyn wrote:
> > On Fri, Feb 24, 2023 at 01:02:54AM +0100, Alex Colomar wrote:
> > > Hi Martin,
> > > 
> > > On 2/23/23 20:57, Martin Uecker wrote:
> > > > On Thursday, 23.02.2023 at 20:23 +0100, Alex Colomar wrote:
> > > > > Hi Martin,
> > > > > 
> > > > > On 2/17/23 14:48, Martin Uecker wrote:
> > > > > > > This new wording doesn't even allow one to use memcmp(3);
> > > > > > > just reading the pointer value, however you do it, is UB.
> > > > > > 
> > > > > > memcmp would not use the pointer value but work
> > > > > > on the representation bytes and is still allowed.
> > > > > 
> > > > > Hmm, interesting.  It's rather unspecified behavior. Still
> > > > > > unpredictable: (memcmp(&p, &p, sizeof(p)) == 0) might evaluate to true
> > > > > > or false randomly; the compiler may compile out the call to memcmp(3),
> > > > > since it knows it won't produce any observable behavior.
> > > > > 
> > > > > 
> > > > 
> > > > No, I think several things get mixed up here.
> > > > 
> > > > The representation of a pointer that becomes invalid
> > > > does not change.
> > > > 
> > > > So (0 == memcmp(&p, &p, sizeof(p))) always
> > > > evaluates to true.
> > > > 
> > > > Also in general, an unspecified value is simply unspecified
> > > > but does not change anymore.
> > 
> > Right.  p is its own thing - n bytes on the stack containing some value.
> > Once it comes into scope, it doesn't change on its own.  And if I do
> > free(p) or o = realloc(p), then the value of p itself - the n bytes on
> > the stack - does not change.
> 
> Yes, but one comment about terminology: the C standard
> differentiates between the representation, i.e. the bytes on
> the stack, and the value.  The representation is converted to
> a value during lvalue conversion.  For an invalid pointer
> the representation is indeterminate because it now does not
> point to a valid object anymore.  So it is not possible to
> convert the representation to a value during lvalue conversion.
> In other words, it does not make sense to speak of the value
> of the pointer anymore.

I'm sure there are, especially from an implementer's point of view,
great reasons for this.

However, as just a user, the "value" of 'void *p' should absolutely
not be tied to whatever is at that address.  I'm given a simple
linear memory space, under which sits an entirely different view
obfuscated by page tables, but that doesn't concern me.  If I say
void *p = -1, then if I print p, then I expect to see that value.

Since I'm complaining about standards I'm picking and choosing here,
but I'll still point at the printf(3) manpage :)  :

   p  The void * pointer argument is printed in hexadecimal (as if by %#x
      or %#lx).

> > I realize C11 appears to have changed that.  I fear that in doing so it
> > actually risks increasing the confusion about pointers.  IMO it's much
> > easier to reason about
> > 
> > o = realloc(p, X);
> > 
> > (and more baroque constructions) when keeping in mind that o, p, and the
> > object pointed to by either one are all different things.
> > 
> 
> What did change in C11? As far as I know, the pointer model
> did not change in C11.

I haven't looked in more detail, and don't really plan to, but my
understanding is that the text of:

  The lifetime of an object is the portion of program execution during which
  storage is guaranteed to be reserved for it. An object exists, has a
  constant address, and retains its last-stored value throughout its
  lifetime. If an object is referred to outside of its lifetime, the
  behavior is undefined. The value of a pointer becomes indeterminate when
  the object it points to (or just past) reaches the end of its lifetime.

(especially the last sentence) was new.

Maybe the words "value of a pointer" don't mean what I think they
mean.  But that's the phrase to which I object.  The n bytes on
the stack, p, are not changed just because something happened with
the accounting for the memory at the address represented by that
value.  If they do, then that's not 'C' any more.

> > > > Reading an uninitialized value of automatic storage whose
> > > > address was not taken is undefined behavior, so everything
> > > > is possible afterwards.
> > > > 
> > > > An uninitialized variable whose address was taken has a
> > > > representation which can represent an unspecified value
> > > > or a no-value (trap) representation. Reading the
> > > > representation itself is always ok and gives consistent
> > > > results. Reading the variable can be undefined behavior
> > > > iff it is a trap representation, otherwise you get
> > > > the unspecified value which is stored there.
> > > > 
> > > > At least this is my reading of the C standard. Compilers
> > > > are not fully conformant.
> > > 
> > > Does all this imply that the following is w

[GSoC] Introduction and query on LTO object emmission project

2023-02-24 Thread Peter Lafreniere via Gcc
Hi! I've been interested in compiler development for a while, and would love to
work with any of you as part of GSoC, or even just as a side-project on my own.

I'm an 18 year-old student going into university next year with a passion for
all things open source and low level. I consider myself fluent in C, and
proficient with C++, Rust, and x86 assembly, but unfamiliar with practical
compiler design. I have done some reading on the theoretical aspects of
compilers, however.

While I haven't worked with the GCC community before, I have worked with the
Linux community and have made several small patches there, so I am familiar
with both email-based workflows and the principles of open-source development.

This summer, I'm looking for more experience working on larger projects, as well
as getting into real compilers.

Of particular interest to me is the project idea labelled "Bypass assembler when
generating LTO object files." I see that the project was taken last year, but
I can find no sign of any changes committed to trunk
(`git shortlog --after=2022-01-01 | grep -i -E "lto|assembl(er|y)"` shows
nothing related to this project) and no sign of any needed change made in the
code. Is this project still available?

I'm also willing to work on other projects, ideally in the middle/backend, but
currently I have only been experimenting with the gcc/[lto,data]-streamer*
files. If anyone has a small or medium sized project idea, please feel free to
let me know.


I look forward to working with all of you in the future,

Peter Lafreniere




Re: Missed warning (-Wuse-after-free)

2023-02-24 Thread Serge E. Hallyn
On Fri, Feb 24, 2023 at 02:42:32AM +0100, Alex Colomar wrote:
> Hi Serge, Martin,
> 
> On 2/24/23 02:21, Serge E. Hallyn wrote:
> > > Does all this imply that the following is well defined behavior (and shall
> > > print what one would expect)?
> > > 
> > > >    free(p);
> > > > 
> > > >    (void) &p;  // take the address
> > > >    // or maybe we should (void) memcmp(&p, &p, sizeof(p)); ?
> > > > 
> > > >    printf("%p\n", p);  // we took previously its address,
> > > >    // so now it has to hold consistently
> > > >    // the previous value
> > > 
> > > 
> > > > This feels weird.  And a bit of a Schroedinger's pointer.  I'm not entirely
> > > > convinced, but might be.
> > 
> > Again, p is just an n byte variable which happens to have (one hopes)
> > pointed at a previously malloc'd address.
> > 
> > And I'd argue that pre-C11, this was not confusing, and would not have
> > felt weird to you.
> > 
> > But I am most grateful to you for having brought this to my attention.
> > I may not agree with it and not like it, but it's right there in the
> > spec, so time for me to adjust :)
> 
> I'll try to show why this feels weird to me (even in C89):
> 
> 
> alx@dell7760:~/tmp$ cat pointers.c
> #include <stdio.h>
> #include <stdlib.h>
> 
> 
> int
> main(void)
> {
>   char  *p, *q;
> 
>   p = malloc(42);
>   if (p == NULL)
>       exit(1);
> 
>   q = realloc(p, 42);
>   if (q == NULL)
>       exit(1);
> 
>   (void) &p;  // If we remove this, we get -Wuse-after-free

(which I would argue is a bug in the compiler)

>   printf("(%p == %p) = %i\n", p, q, (p == q));
> }
> alx@dell7760:~/tmp$ cc -Wall -Wextra pointers.c  -Wuse-after-free=3
> alx@dell7760:~/tmp$ ./a.out
> (0x5642cd9022a0 == 0x5642cd9022a0) = 1
> 
> 
> These pointers point to different objects (actually, one of them doesn't even
> point to an object anymore), so they can't compare equal, according to both:
> 
> 
> 
> 
> 
> (I believe C89 already had the concept of lifetime well defined as it is
> now, so the object had finished its lifetime after realloc(3)).
> 
> How can we justify that true, if the pointers don't point to the same object?

Because what's pointed to does not matter.

You are comparing the memory address p, not the contents of the memory address.

By way of analogy, if I do

   mkdir -p /tmp/1/a
   ln -s /tmp/1 /tmp/2
   rm -rf /tmp/1

then /tmp/2 is still a symlink.  'stat /tmp/2' still works and is well
defined.  And if I create a new /tmp/1, then /tmp/2 starts pointing to
that.  Yes, reusing p like that is a very bad idea, in many cases :)

> And how can we justify a hypothetical false (which compilers don't
> implement), if compilers will really just read the value?  To implement this
> as well defined behavior, it could result in no other than false, and it
> would require heavy overhead for the compilers to detect that the
> seemingly-equal values are indeed different, don't you think?  The easiest
> solution is for the standard to just declare this outlaw, IMO.
> 
> Maybe it could do an exception for printing, that is, reading a pointer is
> not a problem in itself, as long as you don't compare it, but I'm not such an
> expert about this.
> 
> Cheers,
> 
> Alex
> 
> > 
> > -serge
> 
> -- 
> 
> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
> 





Re: Missed warning (-Wuse-after-free)

2023-02-24 Thread Martin Uecker via Gcc
On Friday, 24.02.2023 at 10:01 -0600, Serge E. Hallyn wrote:
> On Fri, Feb 24, 2023 at 09:36:45AM +0100, Martin Uecker wrote:
> > On Thursday, 23.02.2023 at 19:21 -0600, Serge E. Hallyn wrote:

...
> > 
> > Yes, but one comment about terminology: the C standard
> > differentiates between the representation, i.e. the bytes on
> > the stack, and the value.  The representation is converted to
> > a value during lvalue conversion.  For an invalid pointer
> > the representation is indeterminate because it now does not
> > point to a valid object anymore.  So it is not possible to
> > convert the representation to a value during lvalue conversion.
> > In other words, it does not make sense to speak of the value
> > of the pointer anymore.
> 
> I'm sure there are, especially from an implementer's point of view,
> great reasons for this.
> 
> However, as just a user, the "value" of 'void *p' should absolutely
> not be tied to whatever is at that address.

Think about it in this way: The set of possible values for a pointer
is the set of objects that exist at a point in time. If one object
disappears, a pointer can not point to it anymore. So it is not that
the pointer changes, but the set of valid values.

>   I'm given a simple
> linear memory space, under which sits an entirely different view
> obfuscated by page tables, but that doesn't concern me.  If I say
> void *p = -1, then if I print p, then I expect to see that value.

If you store an integer into a pointer (you need a cast), then
this is implementation-defined and may also produce an invalid
pointer.

> 
> Since I'm complaining about standards I'm picking and choosing here,
> but I'll still point at the printf(3) manpage :)  :
> 
>    p  The void * pointer argument is printed in hexadecimal (as if by %#x
>       or %#lx).

This is valid if the pointer is valid, but if the pointer
is invalid, this is undefined behavior.

In C one should not think of pointers as addresses. They
are abstract handles that point to objects, and compilers
do exploit this for optimization.

If you need an address, you can cast it to uintptr_t
(but see below).
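
As a minimal sketch of that suggestion (an illustration, not code from the thread; free_and_format_address is a hypothetical helper), the address can be converted to uintptr_t while the pointer is still valid and the integer used afterwards. This assumes the implementation provides uintptr_t and the PRIxPTR macro from <inttypes.h>; the mapping from pointer to integer is implementation-defined:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Frees p, then formats the address it used to hold into buf.
 * Returns the snprintf() result. Hypothetical helper, illustration only. */
int free_and_format_address(void *p, char *buf, size_t buflen)
{
    uintptr_t addr = (uintptr_t)p;   /* converted before the free */

    free(p);                         /* p is not read after this point */
    return snprintf(buf, buflen, "0x%" PRIxPTR, addr);
}
```

The integer is an ordinary value that stays usable after the free. Converting it back to a pointer and dereferencing it would of course still be invalid.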

> 
> > > I realize C11 appears to have changed that.  I fear that in doing so it
> > > actually risks increasing the confusion about pointers.  IMO it's much
> > > easier to reason about
> > > 
> > >   o = realloc(p, X);
> > > 
> > > (and more baroque constructions) when keeping in mind that o, p, and the
> > > object pointed to by either one are all different things.
> > > 
> > 
> > What did change in C11? As far as I know, the pointer model
> > did not change in C11.
> 
> I haven't looked in more detail, and don't really plan to, but my
> understanding is that the text of:
> 
>   The lifetime of an object is the portion of program execution during which
>   storage is guaranteed to be reserved for it. An object exists, has a
>   constant address, and retains its last-stored value throughout its
>   lifetime. If an object is referred to outside of its lifetime, the
>   behavior is undefined. The value of a pointer becomes indeterminate when
>   the object it points to (or just past) reaches the end of its lifetime.
> 
> (especially the last sentence) was new.

This is not new.

C99 "The value of a pointer becomes indeterminate when
the object it points to reaches the end of its lifetime."

C90: "The value of a pointer that referred to an object
with automatic storage duration that is no longer
guaranteed to be reserved is indeterminate."

and

"The value of a pointer that refers to freed space is
indeterminate."

> Maybe the words "value of a pointer" don't mean what I think they
> mean.  But that's the phrase to which I object.  The n bytes on
> the stack, p, are not changed just because something happened with
> the accounting for the memory at the address represented by that
> value.  If they do, then that's not 'C' any more.

It is not about the bytes of the pointer changing. But if
the object is freed they do not represent a valid pointer
anymore.  There were CPUs that trapped when an invalid
address was loaded, e.g. because the data segment for the
object was removed from the segment tables. So this has been a
rule in portable C for more than 30 years.

Nowadays compilers exploit the knowledge that the
object is freed. So you can not reliably use such
a pointer. If you do this, your code will be broken on
most modern compilers.


> 
> > > > > Reading an uninitialized value of automatic storage whose
> > > > > address was not taken is undefined behavior, so everything
> > > > > is possible afterwards.
> > > > > 
> > > > > An uninitialized variable whose address was taken has a
> > > > > representation which can represent an unspecified value
> > > > > or a no-value (trap) representation. Reading the
> > > > > representation itself is always ok and gives consistent
> > > > > results. Reading the variable can be undefined behavior
> > > > > iff it is a trap representation, otherwise

gcc-11-20230224 is now available

2023-02-24 Thread GCC Administrator via Gcc
Snapshot gcc-11-20230224 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/11-20230224/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 11 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-11 revision d832f1977440565f9cb8e154c8ff92c36714d2e8

You'll find:

 gcc-11-20230224.tar.xz   Complete GCC

  SHA256=73ac9c6d8dedf9f160e3a58815485282646dd802b1f561b56f274fc786867917
  SHA1=5a84a87983bd6ae8b1e557cc9d8b05a86ae45e96

Diffs from 11-20230217 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-11
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.