aeubanks added inline comments.
================ Comment at: clang/include/clang/Basic/AttrDocs.td:1416-1417 +not significant. This allows global constants with the same contents to be +merged. This can break global pointer identity, i.e. two different globals have +the same address. + ---------------- aaron.ballman wrote: > aeubanks wrote: > > rnk wrote: > > > aaron.ballman wrote: > > > > aeubanks wrote: > > > > > rnk wrote: > > > > > > erichkeane wrote: > > > > > > > aeubanks wrote: > > > > > > > > erichkeane wrote: > > > > > > > > > aeubanks wrote: > > > > > > > > > > aaron.ballman wrote: > > > > > > > > > > > aeubanks wrote: > > > > > > > > > > > > aaron.ballman wrote: > > > > > > > > > > > > > aeubanks wrote: > > > > > > > > > > > > > > aaron.ballman wrote: > > > > > > > > > > > > > > > What happens for tentative definitions where the > > > > > > > > > > > > > > > value isn't known? e.g., > > > > > > > > > > > > > > > ``` > > > > > > > > > > > > > > > [[clang::unnamed_addr]] int i1, i2; > > > > > > > > > > > > > > > ``` > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > What happens if the types are similar but not the > > > > > > > > > > > > > > > same? > > > > > > > > > > > > > > > ``` > > > > > > > > > > > > > > > [[clang::unnamed_addr]] signed int i1 = 32; > > > > > > > > > > > > > > > [[clang::unnamed_addr]] unsigned int i2 = 32; > > > > > > > > > > > > > > > ``` > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Should we diagnose taking the address of such an > > > > > > > > > > > > > > > attributed variable so users have some hope of > > > > > > > > > > > > > > > spotting the non-conforming situations? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Does this attribute have impacts across > > > > > > > > > > > > > > > translation unit boundaries (perhaps only when > > > > > > > > > > > > > > > doing LTO) or only within a single TU? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > What does this attribute do in C++ in the > > > > > > > > > > > > > > > presence of constructors and destructors? e.g., > > > > > > > > > > > > > > > ``` > > > > > > > > > > > > > > > struct S { > > > > > > > > > > > > > > > S(); > > > > > > > > > > > > > > > ~S(); > > > > > > > > > > > > > > > }; > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > [[clang::unnamed_addr]] S s1, s2; // Are these > > > > > > > > > > > > > > > merged and there's only one ctor/dtor call? > > > > > > > > > > > > > > > ``` > > > > > > > > > > > > > > globals are only mergeable if they're known to be > > > > > > > > > > > > > > constant and have the same value/size. this can be > > > > > > > > > > > > > > done at compile time only if the optimizer can see > > > > > > > > > > > > > > the constant values, or at link time > > > > > > > > > > > > > > > > > > > > > > > > > > > > so nothing would happen in any of the cases you've > > > > > > > > > > > > > > given. > > > > > > > > > > > > > > > > > > > > > > > > > > > > but yeah that does imply that we should warn when > > > > > > > > > > > > > > the attribute is used on non const, non-POD > > > > > > > > > > > > > > globals. I'll update this patch to do that > > > > > > > > > > > > > > > > > > > > > > > > > > > > as mentioned in the description, we actually do > > > > > > > > > > > > > > want to take the address of these globals for > > > > > > > > > > > > > > table-driven parsing, but we don't care about > > > > > > > > > > > > > > identity equality > > > > > > > > > > > > > > globals are only mergeable if they're known to be > > > > > > > > > > > > > > constant and have the same value/size. this can be > > > > > > > > > > > > > > done at compile time only if the optimizer can see > > > > > > > > > > > > > > the constant values, or at link time > > > > > > > > > > > > > > > > > > > > > > > > > > > > so nothing would happen in any of the cases you've > > > > > > > > > > > > > > given. > > > > > > > > > > > > > > > > > > > > > > > > > > Ahhhh that's good to know. So I assume we *will* > > > > > > > > > > > > > merge these? > > > > > > > > > > > > > > > > > > > > > > > > > > ``` > > > > > > > > > > > > > struct S { > > > > > > > > > > > > > int i, j; > > > > > > > > > > > > > float f; > > > > > > > > > > > > > }; > > > > > > > > > > > > > > > > > > > > > > > > > > [[clang::unnamed_addr]] const S s1 = { 1, 2, 3.0f }; > > > > > > > > > > > > > [[clang::unnamed_addr]] const S s2 = { 1, 2, 3.0f }; > > > > > > > > > > > > > [[clang::unnamed_addr]] const S s3 = s2; > > > > > > > > > > > > > ``` > > > > > > > > > > > > > > > > > > > > > > > > > > > but yeah that does imply that we should warn when > > > > > > > > > > > > > > the attribute is used on non const, non-POD > > > > > > > > > > > > > > globals. I'll update this patch to do that > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you, I think that will be more user-friendly > > > > > > > > > > > > > > > > > > > > > > > > > > > as mentioned in the description, we actually do > > > > > > > > > > > > > > want to take the address of these globals for > > > > > > > > > > > > > > table-driven parsing, but we don't care about > > > > > > > > > > > > > > identity equality > > > > > > > > > > > > > > > > > > > > > > > > > > Yeah, I still wonder if we want to diagnose just the > > > > > > > > > > > > > same -- if the address is never taken, there's not > > > > > > > > > > > > > really a way to notice the optimization, but if the > > > > > > > > > > > > > address is taken, you basically get UB (and I think > > > > > > > > > > > > > we should explicitly document it as such). Given how > > > > > > > > > > > > > easy it is to accidentally take the address of > > > > > > > > > > > > > something (like via a reference in C++), I think we > > > > > > > > > > > > > should warn by default, but still have a warning > > > > > > > > > > > > > group for folks who want to live life dangerously. > > > > > > > > > > > > > > globals are only mergeable if they're known to be > > > > > > > > > > > > > > constant and have the same value/size. this can be > > > > > > > > > > > > > > done at compile time only if the optimizer can see > > > > > > > > > > > > > > the constant values, or at link time > > > > > > > > > > > > > > > > > > > > > > > > > > > > so nothing would happen in any of the cases you've > > > > > > > > > > > > > > given. > > > > > > > > > > > > > > > > > > > > > > > > > > Ahhhh that's good to know. So I assume we *will* > > > > > > > > > > > > > merge these? > > > > > > > > > > > > > > > > > > > > > > > > > > ``` > > > > > > > > > > > > > struct S { > > > > > > > > > > > > > int i, j; > > > > > > > > > > > > > float f; > > > > > > > > > > > > > }; > > > > > > > > > > > > > > > > > > > > > > > > > > [[clang::unnamed_addr]] const S s1 = { 1, 2, 3.0f }; > > > > > > > > > > > > > [[clang::unnamed_addr]] const S s2 = { 1, 2, 3.0f }; > > > > > > > > > > > > > [[clang::unnamed_addr]] const S s3 = s2; > > > > > > > > > > > > > ``` > > > > > > > > > > > > yeah those are merged even just by clang > > > > > > > > > > > > > > > > > > > > > > > > ``` > > > > > > > > > > > > struct S { > > > > > > > > > > > > int i, j; > > > > > > > > > > > > float f; > > > > > > > > > > > > }; > > > > > > > > > > > > > > > > > > > > > > > > [[clang::unnamed_addr]] const S s1 = { 1, 2, 3.0f }; > > > > > > > > > > > > [[clang::unnamed_addr]] const S s2 = { 1, 2, 3.0f }; > > > > > > > > > > > > [[clang::unnamed_addr]] const S s3 = s2; > > > > > > > > > > > > > > > > > > > > > > > > const void * f1() { > > > > > > > > > > > > return &s1; > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > const void * f2() { > > > > > > > > > > > > return &s2; > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > const void * f3() { > > > > > > > > > > > > return &s3; > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > $ ./build/rel/bin/clang++ -S -emit-llvm -o - -O2 > > > > > > > > > > > > /tmp/a.cc > > > > > > > > > > > > ``` > > > > > > > > > > > > > > > > > > > > > > > > > > > but yeah that does imply that we should warn when > > > > > > > > > > > > > > the attribute is used on non const, non-POD > > > > > > > > > > > > > > globals. I'll update this patch to do that > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you, I think that will be more user-friendly > > > > > > > > > > > > > > > > > > > > > > > > > > > as mentioned in the description, we actually do > > > > > > > > > > > > > > want to take the address of these globals for > > > > > > > > > > > > > > table-driven parsing, but we don't care about > > > > > > > > > > > > > > identity equality > > > > > > > > > > > > > > > > > > > > > > > > > > Yeah, I still wonder if we want to diagnose just the > > > > > > > > > > > > > same -- if the address is never taken, there's not > > > > > > > > > > > > > really a way to notice the optimization, but if the > > > > > > > > > > > > > address is taken, you basically get UB (and I think > > > > > > > > > > > > > we should explicitly document it as such). Given how > > > > > > > > > > > > > easy it is to accidentally take the address of > > > > > > > > > > > > > something (like via a reference in C++), I think we > > > > > > > > > > > > > should warn by default, but still have a warning > > > > > > > > > > > > > group for folks who want to live life dangerously. > > > > > > > > > > > > > > > > > > > > > > > > I don't understand how there would be any UB. > > > > > > > > > > > > Especially if the user only dereferences them, and > > > > > > > > > > > > isn't comparing pointers, which is the requested use > > > > > > > > > > > > case. > > > > > > > > > > > > yeah those are merged even just by clang > > > > > > > > > > > > > > > > > > > > > > Nice, thank you for the confirmation! > > > > > > > > > > > > > > > > > > > > > > > I don't understand how there would be any UB. > > > > > > > > > > > > Especially if the user only dereferences them, and > > > > > > > > > > > > isn't comparing pointers, which is the requested use > > > > > > > > > > > > case. > > > > > > > > > > > > > > > > > > > > > > That's just it -- nothing prevents the user from taking > > > > > > > > > > > the address and comparing the pointers, which is no > > > > > > > > > > > longer defined behavior with this attribute. It would > > > > > > > > > > > require a static analysis check to catch this problem > > > > > > > > > > > unless the compiler statically warns on taking the > > > > > > > > > > > address in the first place (IMO, we shouldn't assume > > > > > > > > > > > users will use the attribute properly and thus need no > > > > > > > > > > > help to do so). I was also thinking about things like > > > > > > > > > > > accidental sharing across thread boundaries -- but > > > > > > > > > > > perhaps that's fine because the data is constant. I was > > > > > > > > > > > also thinking that this potentially breaks `restrict` > > > > > > > > > > > semantics but on reflection... that seems almost > > > > > > > > > > > intentional given the goal of the attribute. But things > > > > > > > > > > > along these lines are what have me worried -- the > > > > > > > > > > > language assumes unique locations for objects, so I > > > > > > > > > > > expect there's going to be fallout when object locations > > > > > > > > > > > are no longer unique. If we can remove sharp edges for > > > > > > > > > > > users without compromising the utility of the attribute, > > > > > > > > > > > I think that's beneficial. Or are you saying that warning > > > > > > > > > > > like this would basically compromise the utility? > > > > > > > > > > > > I don't understand how there would be any UB. > > > > > > > > > > > > Especially if the user only dereferences them, and > > > > > > > > > > > > isn't comparing pointers, which is the requested use > > > > > > > > > > > > case. > > > > > > > > > > > > > > > > > > > > > > That's just it -- nothing prevents the user from taking > > > > > > > > > > > the address and comparing the pointers, which is no > > > > > > > > > > > longer defined behavior with this attribute. It would > > > > > > > > > > > require a static analysis check to catch this problem > > > > > > > > > > > unless the compiler statically warns on taking the > > > > > > > > > > > address in the first place (IMO, we shouldn't assume > > > > > > > > > > > users will use the attribute properly and thus need no > > > > > > > > > > > help to do so). I was also thinking about things like > > > > > > > > > > > accidental sharing across thread boundaries -- but > > > > > > > > > > > perhaps that's fine because the data is constant. I was > > > > > > > > > > > also thinking that this potentially breaks `restrict` > > > > > > > > > > > semantics but on reflection... that seems almost > > > > > > > > > > > intentional given the goal of the attribute. But things > > > > > > > > > > > along these lines are what have me worried -- the > > > > > > > > > > > language assumes unique locations for objects, so I > > > > > > > > > > > expect there's going to be fallout when object locations > > > > > > > > > > > are no longer unique. If we can remove sharp edges for > > > > > > > > > > > users without compromising the utility of the attribute, > > > > > > > > > > > I think that's beneficial. Or are you saying that warning > > > > > > > > > > > like this would basically compromise the utility? > > > > > > > > > > > > > > > > > > > > when you say "undefined behavior" do you mean "it's > > > > > > > > > > unspecified what happens" or literally the C/C++ "undefined > > > > > > > > > > behavior" where the compiler can assume it doesn't happen? > > > > > > > > > > > > > > > > > > > > I don't think there's any UB in the C/C++ "undefined > > > > > > > > > > behavior" sense, we're just dropping a C/C++ guarantee of > > > > > > > > > > unique pointer identity for certain globals. > > > > > > > > > > > > > > > > > > > > Yes I believe the warning would compromise the utility > > > > > > > > > > since the underlying request behind this is a case where > > > > > > > > > > the user explicitly wants to take the address of these > > > > > > > > > > globals for table driven parsing but does not care about > > > > > > > > > > unique global identity. i.e. it's fine if we have duplicate > > > > > > > > > > addresses in the table as long as each entry points to the > > > > > > > > > > proper data. > > > > > > > > > I think this IS causing undefined behavior, any program that > > > > > > > > > assumes the addresses aren't the same (such as inserting > > > > > > > > > addresses into a map, explicitly > > > > > > > > > destructing/initializing/etc), or are comparing addresses are > > > > > > > > > now exhibiting UB (in the purest of C++ senses). It isn't > > > > > > > > > the 'taking the address' that is UB, it is comparing them, > > > > > > > > > but unfortunately we don't have a good way to diagnose THAT. > > > > > > > > > I believe what Aaron is getting at is that the taking of the > > > > > > > > > addresses should be diagnosed, particularly if we end up > > > > > > > > > taking said address less-obviously. > > > > > > > > > > > > > > > > > > It DOES have to be a Static Analysis type diagnostic however, > > > > > > > > > since I don't think it is accurate enough. > > > > > > > > > > > > > > > > > perhaps I'm arguing too literally (or am just wrong), but even > > > > > > > > if the program behaves incorrectly due to comparisons of > > > > > > > > addresses now giving different results, that's not necessarily > > > > > > > > UB. even if a program were using pointers as a key into a map > > > > > > > > and two globals that previously were separate entries are now > > > > > > > > one, that's not necessarily UB, that's just a program behaving > > > > > > > > incorrectly. > > > > > > > > > > > > > > > > unless there's a specific C/C++ rule that you have in mind that > > > > > > > > I'm missing? > > > > > > > > > > > > > > > > I'm happy to add a warning that we can turn off internally if > > > > > > > > people agree that taking the address of an `unnamed_addr` > > > > > > > > global is dangerous in general, not sure if that should be > > > > > > > > default on or off > > > > > > > The simplest one I can think of is something like a : > > > > > > > > > > > > > > ```swap_containers(global1, global2); //swap that takes by ref, > > > > > > > but doesn't check addresses``` > > > > > > > > > > > > > > Warnings are obviously trivially disable-able, and I'd be fine > > > > > > > with making it a really fine-grained warning group, but I think > > > > > > > the dangers of this attribute are significant. MANY operations > > > > > > > can have a precondition of "my parameters are different objects" > > > > > > > that this can now cause you to trivially violate in a way that > > > > > > > was never a problem before (since we give them different names!). > > > > > > I think, to Aaron's earlier question, warning on taking the address > > > > > > of such a global kind of defeats the point of the attribute. If you > > > > > > *don't* take the address of the constant global, LLVM will already > > > > > > mark it with `unnamed_addr` or `local_unnamed_addr`, and the > > > > > > attribute is usually unnecessary. You would only reach for this > > > > > > attribute if you are already taking the address of the global. > > > > > > > > > > > > And, to try to elaborate more on the UB situation, I believe LLVM > > > > > > does perform optimizations like `&gv1 == &gv2` -> `false`: > > > > > > https://gcc.godbolt.org/z/3f5qd8vKr > > > > > > This does present a soundness issue, but it's already a bit of an > > > > > > existing issue, since there are other ways to make globals alias, > > > > > > such as aliases. > > > > > ah I misunderstood how ICF worked in regards to `local_unnamed_addr`. > > > > > right now we will ICF globals that don't have their address taken > > > > > across any translation unit. > > > > > > > > > > ``` > > > > > $ cat /tmp/a.c > > > > > extern const int h; > > > > > extern const int g; > > > > > const int h = 42; > > > > > const int g = 42; > > > > > > > > > > int f() { > > > > > return h; > > > > > } > > > > > > > > > > $ cat /tmp/b.c > > > > > extern const int g; > > > > > > > > > > int f(); > > > > > > > > > > int main() { > > > > > return g + f(); > > > > > } > > > > > > > > > > $ clang -o /tmp/a /tmp/a.c /tmp/b.c -O2 -fdata-sections > > > > > -ffunction-sections -fuse-ld=lld -Wl,--icf=safe > > > > > -Wl,--print-icf-sections -Wl,--verbose > > > > > ... > > > > > ld.lld: ICF needed 2 iterations > > > > > selected section > > > > > /usr/local/google/home/aeubanks/tmp/a-da430f.o:(.rodata.h) > > > > > removing identical section > > > > > /usr/local/google/home/aeubanks/tmp/a-da430f.o:(.rodata.g) > > > > > ... > > > > > ``` > > > > > > > > > > so yeah +1 to @rnk's comment. this change would only benefit globals > > > > > that have their address taken. > > > > > I think, to Aaron's earlier question, warning on taking the address > > > > > of such a global kind of defeats the point of the attribute. If you > > > > > *don't* take the address of the constant global, LLVM will already > > > > > mark it with unnamed_addr or local_unnamed_addr, and the attribute is > > > > > usually unnecessary. You would only reach for this attribute if you > > > > > are already taking the address of the global. > > > > > > > > Thanks for the information! That somehow makes me even *less* > > > > comfortable with the attribute because the only times you'd reach for > > > > it are precisely the times when it can bite you. Having some sort of > > > > static analysis check paired with it to help give it some guard rails > > > > for problematic uses would make me more comfortable. However, we > > > > certainly have other attributes that are "use at your own peril" > > > > without such guard rails, so I don't insist (but I still strongly > > > > encourage), but then I would insist on more comprehensive documentation > > > > calling out what misuse looks like and what potential problems come > > > > from misuse. > > > > > > > > > perhaps I'm arguing too literally (or am just wrong), but even if the > > > > > program behaves incorrectly due to comparisons of addresses now > > > > > giving different results, that's not necessarily UB. even if a > > > > > program were using pointers as a key into a map and two globals that > > > > > previously were separate entries are now one, that's not necessarily > > > > > UB, that's just a program behaving incorrectly. > > > > > > > > The reason why I believe it is UB is because everything else in the > > > > language model believes object addresses are distinct. So there's no > > > > way to determine what the impact is of two objects whose addresses > > > > which should be distinct but aren't, when that's observed by the > > > > program. It's undefined because we have no definition of what happens > > > > in this case. The program could be incorrect and just behave poorly, > > > > crash, cause data loss, etc. (As a concrete example, the `restrict` > > > > keyword effectively boils down to reasoning about whether two pointers > > > > overlap; if you pass the address of two distinct objects, they won't > > > > overlap but if they've been merged with this attribute, now they will > > > > overlap and the `restrict` requirements are broken.) > > > > > > > > Regardless, I wasn't trying to get us hung up on UB as a standards term > > > > of art. I am worried about the user experience with the attribute. > > > > Optimization attributes that silently break program correctness do not > > > > provide a particularly good user experience and so if there's a > > > > practical way we can help users out without negatively impacting > > > > correct uses of the attribute, I think we should implement it. C and > > > > C++ already get a bad rap for safety and security because tooling is > > > > unhelpful, so I think we need to do better than we've done in the past > > > > in terms of diagnostic behavior. > > > So, I think we disagree on the substance of the issue here, and perhaps > > > we should try to get more opinions on Discourse. The feature request > > > really is to have an optimization attribute to mark global constant data > > > as mergeable by the linker, and for the user to promise that the address > > > of the global is not significant (named). > > I'll post something to discourse > Sure, getting more feedback sounds reasonable to me. I should be clear -- I > think I'm raising the bar on this attribute compared to previous optimization > attributes, and that might be where some of this friction comes from. We have > plenty of other optimization-related attributes that are "use at your own > risk" without safety rails. However, I don't think that precedent is > reasonable any longer in the face of growing resistance to using C and C++ > for new projects specifically because tooling doesn't help catch safety and > security related concerns, which are often the result of optimizer > assumptions on incorrect code. So I'm asking for (what I think are) > reasonable diagnostics to help people catch misuse, not trying to ask for > heroics. I'm not seeing great binary size savings so I'm putting this off until somebody internally can give me a case where we can measure good binary size savings Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D158223/new/ https://reviews.llvm.org/D158223 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits