Re: struct sockaddr_storage

2023-01-23 Thread Alejandro Colomar via Gcc

Hi Stefan,

On 1/23/23 08:40, Stefan Puiu wrote:

According to strict aliasing rules, if you declare a variable of type 'struct
sockaddr_storage', that's what you get, and trying to access it later as some
other sockaddr_8 is simply not legal.  The compiler may assume those accesses
can't happen, and optimize as it pleases.


Can you detail the "is not legal" part?


I mean that it's Undefined Behavior contraband.


OK, next question. Is this theoretical or practical UB?



Since the functions using this type are not functions that should be inlined, 
since the code is rather large, they are not visible to the compiler, so many of 
the optimizations that this UB enables are not likely to happen.  Translation 
Unit (TU) boundaries are what keeps most UB invokations not be really dangerous.


Also, glibc seems to be using a GCC attribute (transparent_union) to make the 
code avoid UB even if it were inlined, so if you use glibc, you're fine.  If 
you're using some smaller libc with a less capable compiler, or maybe C++, you 
are less lucky, but TU boundaries will probably still save your day.



People check
documentation about how to write code today, I think.


I'm fine documenting how to do it today.  But before changing the documentation, 
I'd like to take some time to reflect on what can we do to fix the standard so 
that we don't have this semi-broken state forever.  When we have a clear idea of 
what we can do to fix the implementation and hopefully the standard long-term, 
possibly keeping source code the same, we can do a better recommendation for 
programmers.


Today, you can do 2 things:

-  You don't care about UB, and would like that C had always been K&R C, and GCC 
just makes it work.  Then use `sockaddr_storage`.  It will just work.  When it 
stops working, you can blame the compiler and libc for optimizing way too much.


-  You care a lot about UB.  Then write your own union, as all the `sockaddr` 
interface should have been designed from the ground up.  That's what unions are for.


Which should we recommend?  That's my problem.

I don't want to be documenting the latter, because it's non-standard, and it's 
still likely to do it invoking UB in a different way, because it's a difficult 
part of the language, and when you roll your own, you're likely to make accidents.


So, ideally, I'd like to document the former, but for that, I'd like to make 
sure that it will work forever, since otherwise we'd be blamed when somebody's 
code is compiled in a platform with some combination of libc, compiler, and 
phase of the moon, that makes the UB become non-theoretical.


I think we can fix the definition of `sockaddr_storage` to have defined 
behavior, with the changes I'm discussing with Bastien, so I guess we'll 
document the former.



Will code break in practice?


Well, it depends on how much compilers advance.  Here's some interesting 
experiment:




That code plays with 2 pointers to the same area, one to double and
one to int, so I don't think it's that similar to the sockaddr
situation. At least for struct sockaddr, the sa_family field is the
same for all struct sockaddr_* variants. Also, in practical terms, I
don't think any compiler optimization that breaks socket APIs (and, if
I recall correctly, there are instances of this pattern in the kernel
as well) is going to be an easy sell. It's possible, but realistically
speaking, I don't think it's going to happen.


The common initial sequence of structures is only allowed if the structures form 
part of a union (which is why to avoid UB you need a union; and still, you need 
to make sure you don't invoke UB in a different way).







I wouldn't rely on Undefined Behavior not causing nasal demons.  When you get
them, you can only kill them with garlic.


OK, but not all theoretical issues have practical implications. Is
there code that can show UB in practical terms with struct
sockaddr_storage today? Like Eric mentioned in another thread, does
UBSan complain about code using struct sockaddr_storage?


It's unlikely.  But I can't promise it will be safe under some random 
combination of compiler and library, and depends also on what you do in your 
code, which will affect compiler optimizations.


Cheers,

Alex

--



OpenPGP_signature
Description: OpenPGP digital signature


Re: struct sockaddr_storage

2023-01-23 Thread Richard Biener via Gcc
On Mon, Jan 23, 2023 at 5:04 PM Alejandro Colomar via Gcc
 wrote:
>
> Hi Stefan,
>
> On 1/23/23 08:40, Stefan Puiu wrote:
>  According to strict aliasing rules, if you declare a variable of type 
>  'struct
>  sockaddr_storage', that's what you get, and trying to access it later as 
>  some
>  other sockaddr_8 is simply not legal.  The compiler may assume those 
>  accesses
>  can't happen, and optimize as it pleases.
> >>>
> >>> Can you detail the "is not legal" part?
> >>
> >> I mean that it's Undefined Behavior contraband.
> >
> > OK, next question. Is this theoretical or practical UB?
>
>
> Since the functions using this type are not functions that should be inlined,
> since the code is rather large, they are not visible to the compiler, so many 
> of
> the optimizations that this UB enables are not likely to happen.  Translation
> Unit (TU) boundaries are what keeps most UB invokations not be really 
> dangerous.
>
> Also, glibc seems to be using a GCC attribute (transparent_union) to make the
> code avoid UB even if it were inlined, so if you use glibc, you're fine.  If
> you're using some smaller libc with a less capable compiler, or maybe C++, you
> are less lucky, but TU boundaries will probably still save your day.
>
> > People check
> > documentation about how to write code today, I think.
>
> I'm fine documenting how to do it today.  But before changing the 
> documentation,
> I'd like to take some time to reflect on what can we do to fix the standard so
> that we don't have this semi-broken state forever.  When we have a clear idea 
> of
> what we can do to fix the implementation and hopefully the standard long-term,
> possibly keeping source code the same, we can do a better recommendation for
> programmers.
>
> Today, you can do 2 things:
>
> -  You don't care about UB, and would like that C had always been K&R C, and 
> GCC
> just makes it work.  Then use `sockaddr_storage`.  It will just work.  When it
> stops working, you can blame the compiler and libc for optimizing way too 
> much.
>
> -  You care a lot about UB.  Then write your own union, as all the `sockaddr`
> interface should have been designed from the ground up.  That's what unions 
> are for.
>
> Which should we recommend?  That's my problem.
>
> I don't want to be documenting the latter, because it's non-standard, and it's
> still likely to do it invoking UB in a different way, because it's a difficult
> part of the language, and when you roll your own, you're likely to make 
> accidents.
>
> So, ideally, I'd like to document the former, but for that, I'd like to make
> sure that it will work forever, since otherwise we'd be blamed when somebody's
> code is compiled in a platform with some combination of libc, compiler, and
> phase of the moon, that makes the UB become non-theoretical.
>
> I think we can fix the definition of `sockaddr_storage` to have defined
> behavior, with the changes I'm discussing with Bastien, so I guess we'll
> document the former.
>
> >>> Will code break in practice?
> >>
> >> Well, it depends on how much compilers advance.  Here's some interesting 
> >> experiment:
> >>
> >> 
> >
> > That code plays with 2 pointers to the same area, one to double and
> > one to int, so I don't think it's that similar to the sockaddr
> > situation. At least for struct sockaddr, the sa_family field is the
> > same for all struct sockaddr_* variants. Also, in practical terms, I
> > don't think any compiler optimization that breaks socket APIs (and, if
> > I recall correctly, there are instances of this pattern in the kernel
> > as well) is going to be an easy sell. It's possible, but realistically
> > speaking, I don't think it's going to happen.
>
> The common initial sequence of structures is only allowed if the structures 
> form
> part of a union (which is why to avoid UB you need a union; and still, you 
> need
> to make sure you don't invoke UB in a different way).
> 

GCC only allows it if the union is visible as part of the access, that
is, it allows it
under its rule of allowing punning for union accesses and not specially because
of the initial sequence rule.  So

 u.a.x = 1;
 ... = u.b.x;

is allowed but

  struct A *p = &u.a;
  p->x = 1;
  struct B *q = &u.b;
  ... = q->x;

is UB with GCC if struct A and B are the union members with a common
initial sequence.

Richard.

> >
> >>
> >> I wouldn't rely on Undefined Behavior not causing nasal demons.  When you 
> >> get
> >> them, you can only kill them with garlic.
> >
> > OK, but not all theoretical issues have practical implications. Is
> > there code that can show UB in practical terms with struct
> > sockaddr_storage today? Like Eric mentioned in another thread, does
> > UBSan complain about code using struct sockaddr_storage?
>
> It's unlikely.  But I can't promise it will be safe under some random
> combination

Re: struct sockaddr_storage

2023-01-23 Thread Jakub Jelinek via Gcc
On Mon, Jan 23, 2023 at 05:03:00PM +0100, Alejandro Colomar via Gcc wrote:
> Hi Stefan,
> 
> On 1/23/23 08:40, Stefan Puiu wrote:
> > > > > According to strict aliasing rules, if you declare a variable of type 
> > > > > 'struct
> > > > > sockaddr_storage', that's what you get, and trying to access it later 
> > > > > as some
> > > > > other sockaddr_8 is simply not legal.  The compiler may assume those 
> > > > > accesses
> > > > > can't happen, and optimize as it pleases.
> > > > 
> > > > Can you detail the "is not legal" part?
> > > 
> > > I mean that it's Undefined Behavior contraband.
> > 
> > OK, next question. Is this theoretical or practical UB?
> 
> 
> Since the functions using this type are not functions that should be
> inlined, since the code is rather large, they are not visible to the
> compiler, so many of the optimizations that this UB enables are not likely
> to happen.  Translation Unit (TU) boundaries are what keeps most UB
> invokations not be really dangerous.
> 
> Also, glibc seems to be using a GCC attribute (transparent_union) to make
> the code avoid UB even if it were inlined, so if you use glibc, you're fine.
> If you're using some smaller libc with a less capable compiler, or maybe
> C++, you are less lucky, but TU boundaries will probably still save your
> day.

Please see transparent_union documentation in GCC documentation.
E.g. 
https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Common-Type-Attributes.html#index-transparent_005funion-type-attribute
transparent_union doesn't change anything regarding type punning, it is
solely about function arguments, how arguments of that type are passed
(as first union member) and that no casts to the union are needed from
the member types.
And, with LTO TU boundaries are lost, inlining or other IPA optimizations
(including say modref) work in between TUs.

Jakub



Re: [PATCH 1/4] libbacktrace: change all pc related variables to uintptr_t

2023-01-23 Thread Björn Schäpers

Am 20.01.2023 um 23:25 schrieb Ian Lance Taylor:

On Fri, Jan 20, 2023 at 2:54 AM Björn Schäpers  wrote:


From: Björn Schäpers 

It's the right thing to do, since the PC shouldn't go out of the
uintptr_t domain, and in backtrace_pcinfo the pc is uintptr_t.
This is a preparation for a following patch.

Tested on x86_64-linux and i686-w64-mingw32.


Thanks.  Committed like so, with some additional tweaks.

For future reference, when pinging a patch, please reply to the
original patch to maintain the thread.  Or at least mention the
original patch.  It was still on my list, I just hadn't gotten to it.
Thanks.

Ian



Thanks for the commit, and sorry for the repost. Because of a fault of my own I 
had no copy in my mailbox and thus couldn't reply to the first batch.


Regards,
Björn.



Re: [PATCH 2/4] libbacktrace: detect executable path on windows

2023-01-23 Thread Ian Lance Taylor via Gcc
On Fri, Jan 20, 2023 at 2:56 AM Björn Schäpers  wrote:
>
> From: Björn Schäpers 
>
> This is actually needed so that libstdc++'s  implementation
> to be able to work on windows.
>
> Tested on x86_64-linux and i686-w64-mingw32.
>
> -- >8 --
>
> * configure.ac: Add a check for windows.h.
> * configure, config.h.in: Regenerate.
> * fileline.c: Add windows_get_executable_path.
> * fileline.c (fileline_initialiez): Add a pass using
> windows_get_executable_path.
>
> +#ifdef HAVE_WINDOWS_H
> +
> +static char *
> +windows_get_executable_path (char *buf, backtrace_error_callback 
> error_callback,
> +void *data)
> +{
> +  if (GetModuleFileNameA (NULL, buf, MAX_PATH - 1) == 0)
> +{
> +  error_callback (data,
> + "could not get the filename of the current executable",
> + (int) GetLastError ());
> +  return NULL;
> +}
> +  return buf;
> +}

Thanks, but this seems incomplete.  The docs for GetModuleFileNameA
say that if the pathname is too long to fit into the buffer it returns
the size of the buffer and sets the error to
ERROR_INSUFFICIENT_BUFFER.  It seems to me that in that case we should
allocate a larger buffer and try again.  And, in general, it will be
simpler if we always allocate the buffer, as macho_get_executable_path
does.  Unfortunately it appears that Windows does not provide a way to
ask for the required length.  On Windows it seems that MAX_PATH is not
a true limit, as an extended length path may be up to 32767 bytes.

So probably something like (untested)

static char *
windows_get_executable_path (struct backtrace_state *state,
 backtrace_error_callback error_callback,
 void *data)
{
  uint32_t len;
  char *buf;

  len = MAX_PATH;
  while (1)
{
  uint32_t got;

  name = (char *) backtrace_alloc (state, len, error_callback, data);
  if (name == NULL)
return NULL;
  got = GetModuleFileNameA (NULL, name, len);
  if (got < len - 1) /* -1 because NULB is not counted */
return name;
  backtrace_free (state, name, len, error_callback, data);
  if (GetLastError () != ERROR_INSUFFICIENT_BUFFER)
return NULL;
  len *= 2;
}
}

Ian