dblaikie added inline comments.

================
Comment at: clang/include/clang/AST/PrettyPrinter.h:307
+  /// decltype(s) will be printed as "S<Point{1,2}>" if enabled and as 
"S<{1,2}>" if disabled,
+  /// regardless if PrintCanonicalTypes is enabled.
+  unsigned AlwaysIncludeTypeForNonTypeTemplateArgument : 1;
----------------
DoDoENT wrote:
> dblaikie wrote:
> > DoDoENT wrote:
> > > DoDoENT wrote:
> > > > dblaikie wrote:
> > > > > aaron.ballman wrote:
> > > > > > DoDoENT wrote:
> > > > > > > aaron.ballman wrote:
> > > > > > > > dblaikie wrote:
> > > > > > > > > DoDoENT wrote:
> > > > > > > > > > dblaikie wrote:
> > > > > > > > > > > What does `PrintCanonicalTypes` have to do with this? 
> > > > > > > > > > > Does it overlap with this functionality in some way, but 
> > > > > > > > > > > doesn't provide the functionality you want in particular?
> > > > > > > > > > Thank you for the question. If you set the 
> > > > > > > > > > `PrintCanonicalTypes` to `false`, the `S<Point{1, 2}>` 
> > > > > > > > > > would be printed as `S<Point{1, 2}>` even without this 
> > > > > > > > > > patch. However, if you set it to `true`, it will be printed 
> > > > > > > > > > as `S<{1, 2}>`.
> > > > > > > > > > 
> > > > > > > > > > I don't fully understand why it does that, but it's quite 
> > > > > > > > > > annoying.
> > > > > > > > > > 
> > > > > > > > > > For a better example, please take a look at the 
> > > > > > > > > > `TemplateIdWithComplexFullTypeNTTP` unit tests that I've 
> > > > > > > > > > added: if `PrintCanonicalTypes` is set to `true`, the 
> > > > > > > > > > original print output of type is `NDArray<float, {{{0}}}, 
> > > > > > > > > > {{{0}}}, {{{0}}}>`, and if set to `false` (which is 
> > > > > > > > > > default), the output is `NDArray<float, Height{{{0}}}, 
> > > > > > > > > > Width{{{0}}}, Channels{{{0}}}>` - so the NTTP type is 
> > > > > > > > > > neither fully written nor fully omitted, which is weird.
> > > > > > > > > > 
> > > > > > > > > > As I said, I don't really understand the idea behind 
> > > > > > > > > > `PrintCanonicalTypes`, but when my new 
> > > > > > > > > > `AlwaysIncludeTypeForNonTypeTemplateArgument` is enabled, 
> > > > > > > > > > you will get the full type printed, regardless of value of 
> > > > > > > > > > `PrintCanonicalTypes` setting.
> > > > > > > > > > 
> > > > > > > > > Perhaps this might be more of a bug in PrintCanonicalTypes 
> > > > > > > > > than something to add a separate flag for.
> > > > > > > > > 
> > > > > > > > > @aaron.ballman D55552 for context here... 
> > > > > > > > > 
> > > > > > > > > Hmm, actually, just adding the top level `Height{{0}}, 
> > > > > > > > > Width{{0}}, Channels{{0}}` is sufficient to make this code 
> > > > > > > > > compile (whereas with the `{{{0}}}` it doesn't form a valid 
> > > > > > > > > identifier.
> > > > > > > > > 
> > > > > > > > > So what's your use case for needing more explicitness than 
> > > > > > > > > that top level? 
> > > > > > > > > Perhaps this might be more of a bug in PrintCanonicalTypes 
> > > > > > > > > than something to add a separate flag for.
> > > > > > > > >
> > > > > > > > > @aaron.ballman D55552 for context here...
> > > > > > > > 
> > > > > > > > I looked over D55552 again and haven't spotted anything with it 
> > > > > > > > that seems amiss; the change there is to grab the canonical 
> > > > > > > > type before trying to print it which is all the more I'd expect 
> > > > > > > > `PrintCanonicalTypes` to impact.
> > > > > > > > 
> > > > > > > > This looks like the behavior you'd get when you desugar the 
> > > > > > > > type. Check out the AST dump for `s`: 
> > > > > > > > https://godbolt.org/z/vxh5j6qWr
> > > > > > > > ```
> > > > > > > > `-VarDecl <line:3:1, col:20> col:20 s 'S<Point{1, 2}>':'S<{1, 
> > > > > > > > 2}>' callinit
> > > > > > > > ```
> > > > > > > > We generate that type information at 
> > > > > > > > https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/TextNodeDumper.cpp#L663
> > > > > > > >  for doing the AST dump, note how the second type printed is 
> > > > > > > > the desugared type and that matches what we're seeing from the 
> > > > > > > > pretty printer.
> > > > > > > > So what's your use case for needing more explicitness than that 
> > > > > > > > top level?
> > > > > > > 
> > > > > > > As I described in the [github 
> > > > > > > issue](https://github.com/llvm/llvm-project/issues/57562), I'm 
> > > > > > > trying to write a clang-based tool that will have different 
> > > > > > > behavior if the printed `{{{0}}}` is actually `Width` than if its 
> > > > > > > `Height` or anything else.
> > > > > > > 
> > > > > > > You can see the the issue in the AST dump for `bla`: 
> > > > > > > https://godbolt.org/z/fMr4f13o3
> > > > > > > 
> > > > > > > The type is
> > > > > > > ```
> > > > > > > `-VarDecl <line:20:1, col:21> col:21 bla 'NDArray<float, 
> > > > > > > W>':'NDArray<float, {{{0}}}>' callinit
> > > > > > >   `-CXXConstructExpr <col:21> 'NDArray<float, W>':'NDArray<float, 
> > > > > > > {{{0}}}>' 'void () noexcept'
> > > > > > > ```
> > > > > > > 
> > > > > > > so it's unknown whether `{{{0}}}` represent the `Width` or 
> > > > > > > `Height`. My patch makes it work exactly like GCC (see the 
> > > > > > > comparison of error message between [clang 15 and GCC 
> > > > > > > 12.1](https://godbolt.org/z/WenWe8caf).
> > > > > > > 
> > > > > > > > Perhaps this might be more of a bug in PrintCanonicalTypes than 
> > > > > > > > something to add a separate flag for.
> > > > > > > 
> > > > > > > This was also my first thought and the first version of my patch 
> > > > > > > (before even submitting it here) was to actually change the 
> > > > > > > behavior of `PrintCanonicalTypes`. However, that change made some 
> > > > > > > tests fail, as I described in the patch description:
> > > > > > > 
> > > > > > > - CodeGenCXX/debug-info-template.cpp
> > > > > > > - SemaCXX/constexpr-printing.cpp
> > > > > > > - SemaCXX/cxx2a-nttp-printing.cpp
> > > > > > > - SemaTemplate/temp_arg_string_printing.cpp
> > > > > > > 
> > > > > > > Of course, it's possible to simply update the tests, but I 
> > > > > > > actually don't fully understand what is the goal of 
> > > > > > > `PrintCanonicalTypes` and whether its current behavior is 
> > > > > > > actually desired or not, so I played it safe and introduced a new 
> > > > > > > policy that is disabled by default until I get more feedback from 
> > > > > > > more experienced LLVM developers.
> > > > > > > 
> > > > > > > The patch does solve my problem and since I'm building LLVM from 
> > > > > > > source anyway, I can have it enabled in my fork.
> > > > > > > 
> > > > > > > I just want to see if it would also be beneficial to be 
> > > > > > > introduced into the upstream LLVM.
> > > > > > > Of course, it's possible to simply update the tests, but I 
> > > > > > > actually don't fully understand what is the goal of 
> > > > > > > PrintCanonicalTypes and whether its current behavior is actually 
> > > > > > > desired or not, so I played it safe and introduced a new policy 
> > > > > > > that is disabled by default until I get more feedback from more 
> > > > > > > experienced LLVM developers.
> > > > > > 
> > > > > > `PrintCanonicalTypes` is what controls output like whether we print 
> > > > > > a typedef name (not the canonical type) or the final underlying 
> > > > > > type all typedefs involved (the canonical type). It won't have an 
> > > > > > impact on things like whether we print the name of a structure or 
> > > > > > not.
> > > > > > 
> > > > > > After looking into this, I think we do want changes here. I'm not 
> > > > > > 100% convinced we need a new policy member. I tried out printing 
> > > > > > the type names unconditionally and the results were a bit of a 
> > > > > > mixed bag (some diagnostics got more clear, other diagnostics 
> > > > > > didn't become more clear but also didn't become more confusing):
> > > > > > ```
> > > > > > error: 'note' diagnostics expected but not seen:
> > > > > >   File 
> > > > > > F:\source\llvm-project\clang\test\SemaTemplate\temp_arg_nontype_cxx20.cpp
> > > > > >  Line 189: in instantiation of template class 'Diags::X<{1, 2}>' 
> > > > > > requested here
> > > > > > error: 'note' diagnostics seen but not expected:
> > > > > >   File 
> > > > > > F:\source\llvm-project\clang\test\SemaTemplate\temp_arg_nontype_cxx20.cpp
> > > > > >  Line 189: in instantiation of template class 'Diags::X<Diags::A{1, 
> > > > > > 2}>' requested here
> > > > > > ```
> > > > > > seems like a clarifying change, while:
> > > > > > ```
> > > > > > error: 'error' diagnostics expected but not seen:
> > > > > >   File 
> > > > > > F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp 
> > > > > > Line 19: no member named 'display' in 'ASCII<{"this nontype 
> > > > > > template argument is [...]"}>'
> > > > > >   File 
> > > > > > F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp 
> > > > > > Line 25: no member named 'display' in 'ASCII<{{119, 97, 105, 116, 
> > > > > > 32, 97, 32, 115, 27, 99, ...}}>'
> > > > > >   File 
> > > > > > F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp 
> > > > > > Line 33: no member named 'display' in 'ASCII<{"what??!"}>'
> > > > > > error: 'error' diagnostics seen but not expected:
> > > > > >   File 
> > > > > > F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp 
> > > > > > Line 19: no member named 'display' in 'ASCII<Str<43>{"this nontype 
> > > > > > template argument is [...]"}>'
> > > > > >   File 
> > > > > > F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp 
> > > > > > Line 25: no member named 'display' in 'ASCII<Str<14>{{119, 97, 105, 
> > > > > > 116, 32, 97, 32, 115, 27, 99, ...}}>'
> > > > > >   File 
> > > > > > F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp 
> > > > > > Line 33: no member named 'display' in 'ASCII<Str<8>{"what??!"}>'
> > > > > > ```
> > > > > > seems like it's neither here nor there.
> > > > > > 
> > > > > > @dblaikie, do you have feelings on how to go with this?
> > > > > Yeah, I'm inclined to think that `Diags::X<{1, 2}>` is just too 
> > > > > simplified - it's unambiguous if the parameter isn't `auto`, but 
> > > > > isn't valid syntax (so the language still expects a type name there) 
> > > > > & so maybe we should do the same in diagnostics?
> > > > > 
> > > > > @aaron.ballman - though I'm still confused by the behavior-change 
> > > > > when `PrintCanonicalTypes = false` that causes the top level names to 
> > > > > be printed? Maybe that's just a weird case/red herring/bug in 
> > > > > `PrintCanonicalTypes = true` that skips those top level names and we 
> > > > > could print them unconditionally?
> > > > > 
> > > > > @DoDoENT - I was/am curious if/why you need more explicitness than 
> > > > > would be provided by `PrintCanonicalTypes = false` - if the output 
> > > > > was `NDArray<float, Height{{{0}}}, Width{{{0}}}, Channels{{{0}}}>` 
> > > > > would that be sufficient for your needs? (I think in that case the 
> > > > > name would be valid to use in code, which I think is a reasonable 
> > > > > basis to make the decision - but I'm not sure how to justify adding 
> > > > > all the intermediate names too)
> > > > > 
> > > > > `PrintCanonicalTypes` is what controls output like whether we print a 
> > > > > typedef name (not the canonical type) or the final underlying type 
> > > > > all typedefs involved (the canonical type). It won't have an impact 
> > > > > on things like whether we print the name of a structure or not.
> > > > 
> > > > Thank you for the explanation.
> > > > 
> > > > > I was/am curious if/why you need more explicitness than would be 
> > > > > provided by `PrintCanonicalTypes = false` - if the output was 
> > > > > `NDArray<float, Height{{{0}}}, Width{{{0}}}, Channels{{{0}}}>` would 
> > > > > that be sufficient for your needs? 
> > > > 
> > > > It would for now (at least until I actually start needing to get the 
> > > > types of inner braces), however, in a real-world scenario I actually 
> > > > get output `NDArray<float, {{{0}}}, {{{0}}}, {{{0}}}>`, regardless of 
> > > > the value of `PrintCanonicalTypes`. I've tried to create a minimum 
> > > > reproducer for that case, but, unfortunately, all simple cases resolve 
> > > > to actually display the names `Height`, `Width`, etc. The closes thing 
> > > > to the real-world scenario is the diagnostic message shown 
> > > > [here](https://godbolt.org/z/WenWe8caf).
> > > > 
> > > > However, in my real code, where there are multiple levels of type 
> > > > deduction happening, I always get the output without a type name, 
> > > > regardless of the value of `PrintCanonicalTypes`. This is why I took 
> > > > some time to debug what is actually happening within clang and created 
> > > > this patch which now works for me in all cases.
> > > > Maybe that's just a weird case/red herring/bug in `PrintCanonicalTypes 
> > > > = true` that skips those top level names and we could print them 
> > > > unconditionally?
> > > 
> > > From my debugging sessions, I saw that the difference happens during 
> > > printing whether the type in question is deduced type or known type. If 
> > > it's a known type, then in both cases the type gets printed (in a 
> > > clang-based tool, not in the diagnostic as shown on Godbolt). However, if 
> > > it's a deduced type, then the type gets or doesn't get printed depending 
> > > on whether the canonical type is requested. If canonical type is 
> > > requested from a deduced type, in AST it's actually represented as an 
> > > "expression statement", not a "type", and then the type printer tries to 
> > > print it as an "expression statement". This then leads to printing code 
> > > in APValue and TemplateBase thinking that it's actually printing a normal 
> > > C++ expression (as in the case of a clang-tidy or similar refactor tool), 
> > > and not as a non-type template parameter, yielding printout with omitted 
> > > types, as that is generally expected when writing code. 
> > > 
> > > However, if you are using the printer to get a full type name in a string 
> > > for different purposes, as I do, then you get incomplete information. So, 
> > > there may be cases when you actually need/want full-type info and cases 
> > > when you don't want it. Therefore, I think some new policy config should 
> > > be in place to let people choose the behavior.
> > > It would for now (at least until I actually start needing to get the 
> > > types of inner braces), 
> > 
> > This ^ is what I was getting at - do you have a use/need for the explicit 
> > nested type names? Given that those aren't ambiguous/the syntax is valid 
> > C++ - what's the use case/need for them to be explicitly written?
> > 
> > Because that presents some tension between the needs of diagnostic users 
> > (where I think adding this extra info that can be inferred from context (by 
> > both the user and the C++ language/compiler) and would potentially make 
> > diagnostics noisier/less helpful) and your use case (which I'm trying to 
> > understand).
> > 
> > If there are multiple levels of deduction/ambiguity, I'm all for having 
> > more explicit names as necessary to remove the ambiguity, so maybe small 
> > examples of that would be helpful too, to make sure the solution is as deep 
> > as it needs to be.
> > 
> > > however, in a real-world scenario I actually get output `NDArray<float, 
> > > {{{0}}}, {{{0}}}, {{{0}}}>`
> > 
> > Right, I'm happy to/leaning towards/trying to understand if we can consider 
> > that a bug in whatever way it shows up.
> > 
> > Hmm, seems this might be related to the FIXME here: 
> > https://github.com/llvm/llvm-project/blob/e07ead85a368173a56e96a21d6841aa497ad80f8/clang/lib/AST/TemplateBase.cpp#L424
> >  - which, right, is the code you're changing in this patch - but I think 
> > the FIXME hints at this not being something we necessarily want to have as 
> > opt-in, but as something that should be fixed in this code possibly without 
> > the need for a new printing policy.
> > 
> > Seems reasonable that this produces at least unambiguous names (so it 
> > should print out the name if it's in a deduced context) and I think it's 
> > probably good if it prints out valid names (so even includes the name in 
> > non-deduced contexts, because C++ requires that too when writing the name) 
> > to simplify things for users of diagnostics, and in your case.
> >  Given that those aren't ambiguous/the syntax is valid C++ - what's the use 
> > case/need for them to be explicitly written?
> 
> I'm writing a clang-based tool for internal use in our company that will need 
> to parse some c++ code from a machine-generated function, representing the 
> execution of computation graph (like those used in machine learning 
> inference), then extract the exact types of all intermediate results and feed 
> that into an algorithm that will optimize and possibly reorder the execution 
> of graph nodes, depending on the types of the intermediate results (after 
> optimization the types may change, but we know exactly how). Since we are 
> encoding a lot of compile-time information into the type names of the 
> intermediates, we need full type names here. As I'm a rookie with LLVM and 
> don't know better, I implemented the FrontendAction and RecursiveVisitor to 
> visit variables representing the variables I'm interested in and used 
> `getAsString` to obtain type names. Unfortunately, I was not getting full 
> type names and therefore I investigated further and that resulted with this 
> patch.
> 
> > Because that presents some tension between the needs of diagnostic users 
> > (where I think adding this extra info that can be inferred from context (by 
> > both the user and the C++ language/compiler) and would potentially make 
> > diagnostics noisier/less helpful) and your use case (which I'm trying to 
> > understand).
> 
> In that case, it makes sense to introduce a new policy. Maybe only a better 
> name for it should be invented...
> 
> > Hmm, seems this might be related to the FIXME here: 
> > https://github.com/llvm/llvm-project/blob/e07ead85a368173a56e96a21d6841aa497ad80f8/clang/lib/AST/TemplateBase.cpp#L424
> >  - which, right, is the code you're changing in this patch - but I think 
> > the FIXME hints at this not being something we necessarily want to have as 
> > opt-in, but as something that should be fixed in this code possibly without 
> > the need for a new printing policy.
> > 
> > Seems reasonable that this produces at least unambiguous names (so it 
> > should print out the name if it's in a deduced context) and I think it's 
> > probably good if it prints out valid names (so even includes the name in 
> > non-deduced contexts, because C++ requires that too when writing the name) 
> > to simplify things for users of diagnostics, and in your case.
> 
> If only a change in `TemplateBase.cpp` is applied, but not also in 
> `APValue.cpp`, then at least I get consistently printed out `Height{{{0}}}`, 
> regardless of the value of `PrintCanonicalTypes`, so change in this line may 
> indeed be a bugfix.
> 
> However, a change in `APValue.cpp` indeed adds noise, as you call it, to the 
> diagnostic messages, as it ensures that the full type name is written. 
> However, the same noise level is always generated by GCC and, in my opinion, 
> although noisier, the GCC's error message is more thorough. But I'm OK with 
> keeping that change behind a new policy.
> > Given that those aren't ambiguous/the syntax is valid C++ - what's the use 
> > case/need for them to be explicitly written?
> 
> I'm writing a clang-based tool for internal use in our company that will need 
> to parse some c++ code from a machine-generated function, representing the 
> execution of computation graph (like those used in machine learning 
> inference), then extract the exact types of all intermediate results and feed 
> that into an algorithm that will optimize and possibly reorder the execution 
> of graph nodes, depending on the types of the intermediate results (after 
> optimization the types may change, but we know exactly how). Since we are 
> encoding a lot of compile-time information into the type names of the 
> intermediates, we need full type names here. As I'm a rookie with LLVM and 
> don't know better, I implemented the FrontendAction and RecursiveVisitor to 
> visit variables representing the variables I'm interested in and used 
> getAsString to obtain type names. Unfortunately, I was not getting full type 
> names and therefore I investigated further and that resulted with this patch.

But if the type as a string is correct as far as C++ is concerned, is that not 
enough information (necessarily - the C++ compiler can determine all the type 
information from the type as written with only the top level types in the NTTP 
names)? If it's that the tool is trying to get the type information purely from 
the string representation, maybe the tool should be doing things differently - 
relying on Clang's AST/type APIs, rather than on inspecting the resulting 
string?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134453/new/

https://reviews.llvm.org/D134453

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to