aeubanks added inline comments.
================
Comment at: clang/include/clang/Basic/AttrDocs.td:1416-1417
+not significant. This allows global constants with the same contents to be
+merged. This can break global pointer identity, i.e. two different globals have
+the same address.
+
----------------
rnk wrote:
> erichkeane wrote:
> > aeubanks wrote:
> > > erichkeane wrote:
> > > > aeubanks wrote:
> > > > > aaron.ballman wrote:
> > > > > > aeubanks wrote:
> > > > > > > aaron.ballman wrote:
> > > > > > > > aeubanks wrote:
> > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > What happens for tentative definitions where the value
> > > > > > > > > > isn't known? e.g.,
> > > > > > > > > > ```
> > > > > > > > > > [[clang::unnamed_addr]] int i1, i2;
> > > > > > > > > > ```
> > > > > > > > > >
> > > > > > > > > > What happens if the types are similar but not the same?
> > > > > > > > > > ```
> > > > > > > > > > [[clang::unnamed_addr]] signed int i1 = 32;
> > > > > > > > > > [[clang::unnamed_addr]] unsigned int i2 = 32;
> > > > > > > > > > ```
> > > > > > > > > >
> > > > > > > > > > Should we diagnose taking the address of such an attributed
> > > > > > > > > > variable so users have some hope of spotting the
> > > > > > > > > > non-conforming situations?
> > > > > > > > > >
> > > > > > > > > > Does this attribute have impacts across translation unit
> > > > > > > > > > boundaries (perhaps only when doing LTO) or only within a
> > > > > > > > > > single TU?
> > > > > > > > > >
> > > > > > > > > > What does this attribute do in C++ in the presence of
> > > > > > > > > > constructors and destructors? e.g.,
> > > > > > > > > > ```
> > > > > > > > > > struct S {
> > > > > > > > > > S();
> > > > > > > > > > ~S();
> > > > > > > > > > };
> > > > > > > > > >
> > > > > > > > > > [[clang::unnamed_addr]] S s1, s2; // Are these merged and
> > > > > > > > > > there's only one ctor/dtor call?
> > > > > > > > > > ```
> > > > > > > > > globals are only mergeable if they're known to be constant
> > > > > > > > > and have the same value/size. this can be done at compile
> > > > > > > > > time only if the optimizer can see the constant values, or at
> > > > > > > > > link time
> > > > > > > > >
> > > > > > > > > so nothing would happen in any of the cases you've given.
> > > > > > > > >
> > > > > > > > > but yeah that does imply that we should warn when the
> > > > > > > > > attribute is used on non const, non-POD globals. I'll update
> > > > > > > > > this patch to do that
> > > > > > > > >
> > > > > > > > > as mentioned in the description, we actually do want to take
> > > > > > > > > the address of these globals for table-driven parsing, but we
> > > > > > > > > don't care about identity equality
> > > > > > > > > globals are only mergeable if they're known to be constant
> > > > > > > > > and have the same value/size. this can be done at compile
> > > > > > > > > time only if the optimizer can see the constant values, or at
> > > > > > > > > link time
> > > > > > > > >
> > > > > > > > > so nothing would happen in any of the cases you've given.
> > > > > > > >
> > > > > > > > Ahhhh that's good to know. So I assume we *will* merge these?
> > > > > > > >
> > > > > > > > ```
> > > > > > > > struct S {
> > > > > > > > int i, j;
> > > > > > > > float f;
> > > > > > > > };
> > > > > > > >
> > > > > > > > [[clang::unnamed_addr]] const S s1 = { 1, 2, 3.0f };
> > > > > > > > [[clang::unnamed_addr]] const S s2 = { 1, 2, 3.0f };
> > > > > > > > [[clang::unnamed_addr]] const S s3 = s2;
> > > > > > > > ```
> > > > > > > >
> > > > > > > > > but yeah that does imply that we should warn when the
> > > > > > > > > attribute is used on non const, non-POD globals. I'll update
> > > > > > > > > this patch to do that
> > > > > > > >
> > > > > > > > Thank you, I think that will be more user-friendly
> > > > > > > >
> > > > > > > > > as mentioned in the description, we actually do want to take
> > > > > > > > > the address of these globals for table-driven parsing, but we
> > > > > > > > > don't care about identity equality
> > > > > > > >
> > > > > > > > Yeah, I still wonder if we want to diagnose just the same -- if
> > > > > > > > the address is never taken, there's not really a way to notice
> > > > > > > > the optimization, but if the address is taken, you basically
> > > > > > > > get UB (and I think we should explicitly document it as such).
> > > > > > > > Given how easy it is to accidentally take the address of
> > > > > > > > something (like via a reference in C++), I think we should warn
> > > > > > > > by default, but still have a warning group for folks who want
> > > > > > > > to live life dangerously.
> > > > > > > > > globals are only mergeable if they're known to be constant
> > > > > > > > > and have the same value/size. this can be done at compile
> > > > > > > > > time only if the optimizer can see the constant values, or at
> > > > > > > > > link time
> > > > > > > > >
> > > > > > > > > so nothing would happen in any of the cases you've given.
> > > > > > > >
> > > > > > > > Ahhhh that's good to know. So I assume we *will* merge these?
> > > > > > > >
> > > > > > > > ```
> > > > > > > > struct S {
> > > > > > > > int i, j;
> > > > > > > > float f;
> > > > > > > > };
> > > > > > > >
> > > > > > > > [[clang::unnamed_addr]] const S s1 = { 1, 2, 3.0f };
> > > > > > > > [[clang::unnamed_addr]] const S s2 = { 1, 2, 3.0f };
> > > > > > > > [[clang::unnamed_addr]] const S s3 = s2;
> > > > > > > > ```
> > > > > > > yeah those are merged even just by clang
> > > > > > >
> > > > > > > ```
> > > > > > > struct S {
> > > > > > > int i, j;
> > > > > > > float f;
> > > > > > > };
> > > > > > >
> > > > > > > [[clang::unnamed_addr]] const S s1 = { 1, 2, 3.0f };
> > > > > > > [[clang::unnamed_addr]] const S s2 = { 1, 2, 3.0f };
> > > > > > > [[clang::unnamed_addr]] const S s3 = s2;
> > > > > > >
> > > > > > > const void * f1() {
> > > > > > > return &s1;
> > > > > > > }
> > > > > > >
> > > > > > > const void * f2() {
> > > > > > > return &s2;
> > > > > > > }
> > > > > > >
> > > > > > > const void * f3() {
> > > > > > > return &s3;
> > > > > > > }
> > > > > > >
> > > > > > > $ ./build/rel/bin/clang++ -S -emit-llvm -o - -O2 /tmp/a.cc
> > > > > > > ```
> > > > > > > >
> > > > > > > > > but yeah that does imply that we should warn when the
> > > > > > > > > attribute is used on non const, non-POD globals. I'll update
> > > > > > > > > this patch to do that
> > > > > > > >
> > > > > > > > Thank you, I think that will be more user-friendly
> > > > > > > >
> > > > > > > > > as mentioned in the description, we actually do want to take
> > > > > > > > > the address of these globals for table-driven parsing, but we
> > > > > > > > > don't care about identity equality
> > > > > > > >
> > > > > > > > Yeah, I still wonder if we want to diagnose just the same -- if
> > > > > > > > the address is never taken, there's not really a way to notice
> > > > > > > > the optimization, but if the address is taken, you basically
> > > > > > > > get UB (and I think we should explicitly document it as such).
> > > > > > > > Given how easy it is to accidentally take the address of
> > > > > > > > something (like via a reference in C++), I think we should warn
> > > > > > > > by default, but still have a warning group for folks who want
> > > > > > > > to live life dangerously.
> > > > > > >
> > > > > > > I don't understand how there would be any UB. Especially if the
> > > > > > > user only dereferences them, and isn't comparing pointers, which
> > > > > > > is the requested use case.
> > > > > > > yeah those are merged even just by clang
> > > > > >
> > > > > > Nice, thank you for the confirmation!
> > > > > >
> > > > > > > I don't understand how there would be any UB. Especially if the
> > > > > > > user only dereferences them, and isn't comparing pointers, which
> > > > > > > is the requested use case.
> > > > > >
> > > > > > That's just it -- nothing prevents the user from taking the address
> > > > > > and comparing the pointers, which is no longer defined behavior
> > > > > > with this attribute. It would require a static analysis check to
> > > > > > catch this problem unless the compiler statically warns on taking
> > > > > > the address in the first place (IMO, we shouldn't assume users will
> > > > > > use the attribute properly and thus need no help to do so). I was
> > > > > > also thinking about things like accidental sharing across thread
> > > > > > boundaries -- but perhaps that's fine because the data is constant.
> > > > > > I was also thinking that this potentially breaks `restrict`
> > > > > > semantics but on reflection... that seems almost intentional given
> > > > > > the goal of the attribute. But things along these lines are what
> > > > > > have me worried -- the language assumes unique locations for
> > > > > > objects, so I expect there's going to be fallout when object
> > > > > > locations are no longer unique. If we can remove sharp edges for
> > > > > > users without compromising the utility of the attribute, I think
> > > > > > that's beneficial. Or are you saying that warning like this would
> > > > > > basically compromise the utility?
> > > > > > > I don't understand how there would be any UB. Especially if the
> > > > > > > user only dereferences them, and isn't comparing pointers, which
> > > > > > > is the requested use case.
> > > > > >
> > > > > > That's just it -- nothing prevents the user from taking the address
> > > > > > and comparing the pointers, which is no longer defined behavior
> > > > > > with this attribute. It would require a static analysis check to
> > > > > > catch this problem unless the compiler statically warns on taking
> > > > > > the address in the first place (IMO, we shouldn't assume users will
> > > > > > use the attribute properly and thus need no help to do so). I was
> > > > > > also thinking about things like accidental sharing across thread
> > > > > > boundaries -- but perhaps that's fine because the data is constant.
> > > > > > I was also thinking that this potentially breaks `restrict`
> > > > > > semantics but on reflection... that seems almost intentional given
> > > > > > the goal of the attribute. But things along these lines are what
> > > > > > have me worried -- the language assumes unique locations for
> > > > > > objects, so I expect there's going to be fallout when object
> > > > > > locations are no longer unique. If we can remove sharp edges for
> > > > > > users without compromising the utility of the attribute, I think
> > > > > > that's beneficial. Or are you saying that warning like this would
> > > > > > basically compromise the utility?
> > > > >
> > > > > when you say "undefined behavior" do you mean "it's unspecified what
> > > > > happens" or literally the C/C++ "undefined behavior" where the
> > > > > compiler can assume it doesn't happen?
> > > > >
> > > > > I don't think there's any UB in the C/C++ "undefined behavior" sense,
> > > > > we're just dropping a C/C++ guarantee of unique pointer identity for
> > > > > certain globals.
> > > > >
> > > > > Yes I believe the warning would compromise the utility since the
> > > > > underlying request behind this is a case where the user explicitly
> > > > > wants to take the address of these globals for table driven parsing
> > > > > but does not care about unique global identity. i.e. it's fine if we
> > > > > have duplicate addresses in the table as long as each entry points to
> > > > > the proper data.
> > > > I think this IS causing undefined behavior, any program that assumes
> > > > the addresses aren't the same (such as inserting addresses into a map,
> > > > explicitly destructing/initializing/etc), or are comparing addresses
> > > > are now exhibiting UB (in the purest of C++ senses). It isn't the
> > > > 'taking the address' that is UB, it is comparing them, but
> > > > unfortunately we don't have a good way to diagnose THAT. I believe
> > > > what Aaron is getting at is that the taking of the addresses should be
> > > > diagnosed, particularly if we end up taking said address less-obviously.
> > > >
> > > > It DOES have to be a Static Analysis type diagnostic however, since I
> > > > don't think it is accurate enough.
> > > >
> > > perhaps I'm arguing too literally (or am just wrong), but even if the
> > > program behaves incorrectly due to comparisons of addresses now giving
> > > different results, that's not necessarily UB. even if a program were
> > > using pointers as a key into a map and two globals that previously were
> > > separate entries are now one, that's not necessarily UB, that's just a
> > > program behaving incorrectly.
> > >
> > > unless there's a specific C/C++ rule that you have in mind that I'm
> > > missing?
> > >
> > > I'm happy to add a warning that we can turn off internally if people
> > > agree that taking the address of an `unnamed_addr` global is dangerous in
> > > general, not sure if that should be default on or off
> > The simplest one I can think of is something like a :
> >
> > ```swap_containers(global1, global2); //swap that takes by ref, but doesn't
> > check addresses```
> >
> > Warnings are obviously trivially disable-able, and I'd be fine with making
> > it a really fine-grained warning group, but I think the dangers of this
> > attribute are significant. MANY operations can have a precondition of "my
> > parameters are different objects" that this can now cause you to trivially
> > violate in a way that was never a problem before (since we give them
> > different names!).
> I think, to Aaron's earlier question, warning on taking the address of such a
> global kind of defeats the point of the attribute. If you *don't* take the
> address of the constant global, LLVM will already mark it with `unnamed_addr`
> or `local_unnamed_addr`, and the attribute is usually unnecessary. You would
> only reach for this attribute if you are already taking the address of the
> global.
>
> And, to try to elaborate more on the UB situation, I believe LLVM does
> perform optimizations like `&gv1 == &gv2` -> `false`:
> https://gcc.godbolt.org/z/3f5qd8vKr
> This does present a soundness issue, but it's already a bit of an existing
> issue, since there are other ways to make globals alias, such as aliases.
ah I misunderstood how ICF worked in regards to `local_unnamed_addr`. right now
we will ICF globals that don't have their address taken across any translation
unit.
```
$ cat /tmp/a.c
extern const int h;
extern const int g;
const int h = 42;
const int g = 42;
int f() {
return h;
}
$ cat /tmp/b.c
extern const int g;
int f();
int main() {
return g + f();
}
$ clang -o /tmp/a /tmp/a.c /tmp/b.c -O2 -fdata-sections -ffunction-sections
-fuse-ld=lld -Wl,--icf=safe -Wl,--print-icf-sections -Wl,--verbose
...
ld.lld: ICF needed 2 iterations
selected section /usr/local/google/home/aeubanks/tmp/a-da430f.o:(.rodata.h)
removing identical section
/usr/local/google/home/aeubanks/tmp/a-da430f.o:(.rodata.g)
...
```
so yeah +1 to @rnk's comment. this change would only benefit globals that have
their address taken.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D158223/new/
https://reviews.llvm.org/D158223
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits