DoDoENT added inline comments.

================
Comment at: clang/include/clang/AST/PrettyPrinter.h:307
+  /// decltype(s) will be printed as "S<Point{1,2}>" if enabled and as 
"S<{1,2}>" if disabled,
+  /// regardless if PrintCanonicalTypes is enabled.
+  unsigned AlwaysIncludeTypeForNonTypeTemplateArgument : 1;
----------------
dblaikie wrote:
> DoDoENT wrote:
> > dblaikie wrote:
> > > DoDoENT wrote:
> > > > DoDoENT wrote:
> > > > > dblaikie wrote:
> > > > > > aaron.ballman wrote:
> > > > > > > DoDoENT wrote:
> > > > > > > > aaron.ballman wrote:
> > > > > > > > > dblaikie wrote:
> > > > > > > > > > DoDoENT wrote:
> > > > > > > > > > > dblaikie wrote:
> > > > > > > > > > > > What does `PrintCanonicalTypes` have to do with this? 
> > > > > > > > > > > > Does it overlap with this functionality in some way, 
> > > > > > > > > > > > but doesn't provide the functionality you want in 
> > > > > > > > > > > > particular?
> > > > > > > > > > > Thank you for the question. If you set the 
> > > > > > > > > > > `PrintCanonicalTypes` to `false`, the `S<Point{1, 2}>` 
> > > > > > > > > > > would be printed as `S<Point{1, 2}>` even without this 
> > > > > > > > > > > patch. However, if you set it to `true`, it will be 
> > > > > > > > > > > printed as `S<{1, 2}>`.
> > > > > > > > > > > 
> > > > > > > > > > > I don't fully understand why it does that, but it's quite 
> > > > > > > > > > > annoying.
> > > > > > > > > > > 
> > > > > > > > > > > For a better example, please take a look at the 
> > > > > > > > > > > `TemplateIdWithComplexFullTypeNTTP` unit tests that I've 
> > > > > > > > > > > added: if `PrintCanonicalTypes` is set to `true`, the 
> > > > > > > > > > > original print output of type is `NDArray<float, {{{0}}}, 
> > > > > > > > > > > {{{0}}}, {{{0}}}>`, and if set to `false` (which is 
> > > > > > > > > > > default), the output is `NDArray<float, Height{{{0}}}, 
> > > > > > > > > > > Width{{{0}}}, Channels{{{0}}}>` - so the NTTP type is 
> > > > > > > > > > > neither fully written nor fully omitted, which is weird.
> > > > > > > > > > > 
> > > > > > > > > > > As I said, I don't really understand the idea behind 
> > > > > > > > > > > `PrintCanonicalTypes`, but when my new 
> > > > > > > > > > > `AlwaysIncludeTypeForNonTypeTemplateArgument` is enabled, 
> > > > > > > > > > > you will get the full type printed, regardless of value 
> > > > > > > > > > > of `PrintCanonicalTypes` setting.
> > > > > > > > > > > 
> > > > > > > > > > Perhaps this might be more of a bug in PrintCanonicalTypes 
> > > > > > > > > > than something to add a separate flag for.
> > > > > > > > > > 
> > > > > > > > > > @aaron.ballman D55552 for context here... 
> > > > > > > > > > 
> > > > > > > > > > Hmm, actually, just adding the top level `Height{{0}}, 
> > > > > > > > > > Width{{0}}, Channels{{0}}` is sufficient to make this code 
> > > > > > > > > > compile (whereas with the `{{{0}}}` it doesn't form a valid 
> > > > > > > > > > identifier.
> > > > > > > > > > 
> > > > > > > > > > So what's your use case for needing more explicitness than 
> > > > > > > > > > that top level? 
> > > > > > > > > > Perhaps this might be more of a bug in PrintCanonicalTypes 
> > > > > > > > > > than something to add a separate flag for.
> > > > > > > > > >
> > > > > > > > > > @aaron.ballman D55552 for context here...
> > > > > > > > > 
> > > > > > > > > I looked over D55552 again and haven't spotted anything with 
> > > > > > > > > it that seems amiss; the change there is to grab the 
> > > > > > > > > canonical type before trying to print it which is all the 
> > > > > > > > > more I'd expect `PrintCanonicalTypes` to impact.
> > > > > > > > > 
> > > > > > > > > This looks like the behavior you'd get when you desugar the 
> > > > > > > > > type. Check out the AST dump for `s`: 
> > > > > > > > > https://godbolt.org/z/vxh5j6qWr
> > > > > > > > > ```
> > > > > > > > > `-VarDecl <line:3:1, col:20> col:20 s 'S<Point{1, 2}>':'S<{1, 
> > > > > > > > > 2}>' callinit
> > > > > > > > > ```
> > > > > > > > > We generate that type information at 
> > > > > > > > > https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/TextNodeDumper.cpp#L663
> > > > > > > > >  for doing the AST dump, note how the second type printed is 
> > > > > > > > > the desugared type and that matches what we're seeing from 
> > > > > > > > > the pretty printer.
> > > > > > > > > So what's your use case for needing more explicitness than 
> > > > > > > > > that top level?
> > > > > > > > 
> > > > > > > > As I described in the [github 
> > > > > > > > issue](https://github.com/llvm/llvm-project/issues/57562), I'm 
> > > > > > > > trying to write a clang-based tool that will have different 
> > > > > > > > behavior if the printed `{{{0}}}` is actually `Width` than if 
> > > > > > > > its `Height` or anything else.
> > > > > > > > 
> > > > > > > > You can see the the issue in the AST dump for `bla`: 
> > > > > > > > https://godbolt.org/z/fMr4f13o3
> > > > > > > > 
> > > > > > > > The type is
> > > > > > > > ```
> > > > > > > > `-VarDecl <line:20:1, col:21> col:21 bla 'NDArray<float, 
> > > > > > > > W>':'NDArray<float, {{{0}}}>' callinit
> > > > > > > >   `-CXXConstructExpr <col:21> 'NDArray<float, 
> > > > > > > > W>':'NDArray<float, {{{0}}}>' 'void () noexcept'
> > > > > > > > ```
> > > > > > > > 
> > > > > > > > so it's unknown whether `{{{0}}}` represent the `Width` or 
> > > > > > > > `Height`. My patch makes it work exactly like GCC (see the 
> > > > > > > > comparison of error message between [clang 15 and GCC 
> > > > > > > > 12.1](https://godbolt.org/z/WenWe8caf).
> > > > > > > > 
> > > > > > > > > Perhaps this might be more of a bug in PrintCanonicalTypes 
> > > > > > > > > than something to add a separate flag for.
> > > > > > > > 
> > > > > > > > This was also my first thought and the first version of my 
> > > > > > > > patch (before even submitting it here) was to actually change 
> > > > > > > > the behavior of `PrintCanonicalTypes`. However, that change 
> > > > > > > > made some tests fail, as I described in the patch description:
> > > > > > > > 
> > > > > > > > - CodeGenCXX/debug-info-template.cpp
> > > > > > > > - SemaCXX/constexpr-printing.cpp
> > > > > > > > - SemaCXX/cxx2a-nttp-printing.cpp
> > > > > > > > - SemaTemplate/temp_arg_string_printing.cpp
> > > > > > > > 
> > > > > > > > Of course, it's possible to simply update the tests, but I 
> > > > > > > > actually don't fully understand what is the goal of 
> > > > > > > > `PrintCanonicalTypes` and whether its current behavior is 
> > > > > > > > actually desired or not, so I played it safe and introduced a 
> > > > > > > > new policy that is disabled by default until I get more 
> > > > > > > > feedback from more experienced LLVM developers.
> > > > > > > > 
> > > > > > > > The patch does solve my problem and since I'm building LLVM 
> > > > > > > > from source anyway, I can have it enabled in my fork.
> > > > > > > > 
> > > > > > > > I just want to see if it would also be beneficial to be 
> > > > > > > > introduced into the upstream LLVM.
> > > > > > > > Of course, it's possible to simply update the tests, but I 
> > > > > > > > actually don't fully understand what is the goal of 
> > > > > > > > PrintCanonicalTypes and whether its current behavior is 
> > > > > > > > actually desired or not, so I played it safe and introduced a 
> > > > > > > > new policy that is disabled by default until I get more 
> > > > > > > > feedback from more experienced LLVM developers.
> > > > > > > 
> > > > > > > `PrintCanonicalTypes` is what controls output like whether we 
> > > > > > > print a typedef name (not the canonical type) or the final 
> > > > > > > underlying type all typedefs involved (the canonical type). It 
> > > > > > > won't have an impact on things like whether we print the name of 
> > > > > > > a structure or not.
> > > > > > > 
> > > > > > > After looking into this, I think we do want changes here. I'm not 
> > > > > > > 100% convinced we need a new policy member. I tried out printing 
> > > > > > > the type names unconditionally and the results were a bit of a 
> > > > > > > mixed bag (some diagnostics got more clear, other diagnostics 
> > > > > > > didn't become more clear but also didn't become more confusing):
> > > > > > > ```
> > > > > > > error: 'note' diagnostics expected but not seen:
> > > > > > >   File 
> > > > > > > F:\source\llvm-project\clang\test\SemaTemplate\temp_arg_nontype_cxx20.cpp
> > > > > > >  Line 189: in instantiation of template class 'Diags::X<{1, 2}>' 
> > > > > > > requested here
> > > > > > > error: 'note' diagnostics seen but not expected:
> > > > > > >   File 
> > > > > > > F:\source\llvm-project\clang\test\SemaTemplate\temp_arg_nontype_cxx20.cpp
> > > > > > >  Line 189: in instantiation of template class 
> > > > > > > 'Diags::X<Diags::A{1, 2}>' requested here
> > > > > > > ```
> > > > > > > seems like a clarifying change, while:
> > > > > > > ```
> > > > > > > error: 'error' diagnostics expected but not seen:
> > > > > > >   File 
> > > > > > > F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp 
> > > > > > > Line 19: no member named 'display' in 'ASCII<{"this nontype 
> > > > > > > template argument is [...]"}>'
> > > > > > >   File 
> > > > > > > F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp 
> > > > > > > Line 25: no member named 'display' in 'ASCII<{{119, 97, 105, 116, 
> > > > > > > 32, 97, 32, 115, 27, 99, ...}}>'
> > > > > > >   File 
> > > > > > > F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp 
> > > > > > > Line 33: no member named 'display' in 'ASCII<{"what??!"}>'
> > > > > > > error: 'error' diagnostics seen but not expected:
> > > > > > >   File 
> > > > > > > F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp 
> > > > > > > Line 19: no member named 'display' in 'ASCII<Str<43>{"this 
> > > > > > > nontype template argument is [...]"}>'
> > > > > > >   File 
> > > > > > > F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp 
> > > > > > > Line 25: no member named 'display' in 'ASCII<Str<14>{{119, 97, 
> > > > > > > 105, 116, 32, 97, 32, 115, 27, 99, ...}}>'
> > > > > > >   File 
> > > > > > > F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp 
> > > > > > > Line 33: no member named 'display' in 'ASCII<Str<8>{"what??!"}>'
> > > > > > > ```
> > > > > > > seems like it's neither here nor there.
> > > > > > > 
> > > > > > > @dblaikie, do you have feelings on how to go with this?
> > > > > > Yeah, I'm inclined to think that `Diags::X<{1, 2}>` is just too 
> > > > > > simplified - it's unambiguous if the parameter isn't `auto`, but 
> > > > > > isn't valid syntax (so the language still expects a type name 
> > > > > > there) & so maybe we should do the same in diagnostics?
> > > > > > 
> > > > > > @aaron.ballman - though I'm still confused by the behavior-change 
> > > > > > when `PrintCanonicalTypes = false` that causes the top level names 
> > > > > > to be printed? Maybe that's just a weird case/red herring/bug in 
> > > > > > `PrintCanonicalTypes = true` that skips those top level names and 
> > > > > > we could print them unconditionally?
> > > > > > 
> > > > > > @DoDoENT - I was/am curious if/why you need more explicitness than 
> > > > > > would be provided by `PrintCanonicalTypes = false` - if the output 
> > > > > > was `NDArray<float, Height{{{0}}}, Width{{{0}}}, Channels{{{0}}}>` 
> > > > > > would that be sufficient for your needs? (I think in that case the 
> > > > > > name would be valid to use in code, which I think is a reasonable 
> > > > > > basis to make the decision - but I'm not sure how to justify adding 
> > > > > > all the intermediate names too)
> > > > > > 
> > > > > > `PrintCanonicalTypes` is what controls output like whether we print 
> > > > > > a typedef name (not the canonical type) or the final underlying 
> > > > > > type all typedefs involved (the canonical type). It won't have an 
> > > > > > impact on things like whether we print the name of a structure or 
> > > > > > not.
> > > > > 
> > > > > Thank you for the explanation.
> > > > > 
> > > > > > I was/am curious if/why you need more explicitness than would be 
> > > > > > provided by `PrintCanonicalTypes = false` - if the output was 
> > > > > > `NDArray<float, Height{{{0}}}, Width{{{0}}}, Channels{{{0}}}>` 
> > > > > > would that be sufficient for your needs? 
> > > > > 
> > > > > It would for now (at least until I actually start needing to get the 
> > > > > types of inner braces), however, in a real-world scenario I actually 
> > > > > get output `NDArray<float, {{{0}}}, {{{0}}}, {{{0}}}>`, regardless of 
> > > > > the value of `PrintCanonicalTypes`. I've tried to create a minimum 
> > > > > reproducer for that case, but, unfortunately, all simple cases 
> > > > > resolve to actually display the names `Height`, `Width`, etc. The 
> > > > > closes thing to the real-world scenario is the diagnostic message 
> > > > > shown [here](https://godbolt.org/z/WenWe8caf).
> > > > > 
> > > > > However, in my real code, where there are multiple levels of type 
> > > > > deduction happening, I always get the output without a type name, 
> > > > > regardless of the value of `PrintCanonicalTypes`. This is why I took 
> > > > > some time to debug what is actually happening within clang and 
> > > > > created this patch which now works for me in all cases.
> > > > > Maybe that's just a weird case/red herring/bug in 
> > > > > `PrintCanonicalTypes = true` that skips those top level names and we 
> > > > > could print them unconditionally?
> > > > 
> > > > From my debugging sessions, I saw that the difference happens during 
> > > > printing whether the type in question is deduced type or known type. If 
> > > > it's a known type, then in both cases the type gets printed (in a 
> > > > clang-based tool, not in the diagnostic as shown on Godbolt). However, 
> > > > if it's a deduced type, then the type gets or doesn't get printed 
> > > > depending on whether the canonical type is requested. If canonical type 
> > > > is requested from a deduced type, in AST it's actually represented as 
> > > > an "expression statement", not a "type", and then the type printer 
> > > > tries to print it as an "expression statement". This then leads to 
> > > > printing code in APValue and TemplateBase thinking that it's actually 
> > > > printing a normal C++ expression (as in the case of a clang-tidy or 
> > > > similar refactor tool), and not as a non-type template parameter, 
> > > > yielding printout with omitted types, as that is generally expected 
> > > > when writing code. 
> > > > 
> > > > However, if you are using the printer to get a full type name in a 
> > > > string for different purposes, as I do, then you get incomplete 
> > > > information. So, there may be cases when you actually need/want 
> > > > full-type info and cases when you don't want it. Therefore, I think 
> > > > some new policy config should be in place to let people choose the 
> > > > behavior.
> > > > It would for now (at least until I actually start needing to get the 
> > > > types of inner braces), 
> > > 
> > > This ^ is what I was getting at - do you have a use/need for the explicit 
> > > nested type names? Given that those aren't ambiguous/the syntax is valid 
> > > C++ - what's the use case/need for them to be explicitly written?
> > > 
> > > Because that presents some tension between the needs of diagnostic users 
> > > (where I think adding this extra info that can be inferred from context 
> > > (by both the user and the C++ language/compiler) and would potentially 
> > > make diagnostics noisier/less helpful) and your use case (which I'm 
> > > trying to understand).
> > > 
> > > If there are multiple levels of deduction/ambiguity, I'm all for having 
> > > more explicit names as necessary to remove the ambiguity, so maybe small 
> > > examples of that would be helpful too, to make sure the solution is as 
> > > deep as it needs to be.
> > > 
> > > > however, in a real-world scenario I actually get output `NDArray<float, 
> > > > {{{0}}}, {{{0}}}, {{{0}}}>`
> > > 
> > > Right, I'm happy to/leaning towards/trying to understand if we can 
> > > consider that a bug in whatever way it shows up.
> > > 
> > > Hmm, seems this might be related to the FIXME here: 
> > > https://github.com/llvm/llvm-project/blob/e07ead85a368173a56e96a21d6841aa497ad80f8/clang/lib/AST/TemplateBase.cpp#L424
> > >  - which, right, is the code you're changing in this patch - but I think 
> > > the FIXME hints at this not being something we necessarily want to have 
> > > as opt-in, but as something that should be fixed in this code possibly 
> > > without the need for a new printing policy.
> > > 
> > > Seems reasonable that this produces at least unambiguous names (so it 
> > > should print out the name if it's in a deduced context) and I think it's 
> > > probably good if it prints out valid names (so even includes the name in 
> > > non-deduced contexts, because C++ requires that too when writing the 
> > > name) to simplify things for users of diagnostics, and in your case.
> > >  Given that those aren't ambiguous/the syntax is valid C++ - what's the 
> > > use case/need for them to be explicitly written?
> > 
> > I'm writing a clang-based tool for internal use in our company that will 
> > need to parse some c++ code from a machine-generated function, representing 
> > the execution of computation graph (like those used in machine learning 
> > inference), then extract the exact types of all intermediate results and 
> > feed that into an algorithm that will optimize and possibly reorder the 
> > execution of graph nodes, depending on the types of the intermediate 
> > results (after optimization the types may change, but we know exactly how). 
> > Since we are encoding a lot of compile-time information into the type names 
> > of the intermediates, we need full type names here. As I'm a rookie with 
> > LLVM and don't know better, I implemented the FrontendAction and 
> > RecursiveVisitor to visit variables representing the variables I'm 
> > interested in and used `getAsString` to obtain type names. Unfortunately, I 
> > was not getting full type names and therefore I investigated further and 
> > that resulted with this patch.
> > 
> > > Because that presents some tension between the needs of diagnostic users 
> > > (where I think adding this extra info that can be inferred from context 
> > > (by both the user and the C++ language/compiler) and would potentially 
> > > make diagnostics noisier/less helpful) and your use case (which I'm 
> > > trying to understand).
> > 
> > In that case, it makes sense to introduce a new policy. Maybe only a better 
> > name for it should be invented...
> > 
> > > Hmm, seems this might be related to the FIXME here: 
> > > https://github.com/llvm/llvm-project/blob/e07ead85a368173a56e96a21d6841aa497ad80f8/clang/lib/AST/TemplateBase.cpp#L424
> > >  - which, right, is the code you're changing in this patch - but I think 
> > > the FIXME hints at this not being something we necessarily want to have 
> > > as opt-in, but as something that should be fixed in this code possibly 
> > > without the need for a new printing policy.
> > > 
> > > Seems reasonable that this produces at least unambiguous names (so it 
> > > should print out the name if it's in a deduced context) and I think it's 
> > > probably good if it prints out valid names (so even includes the name in 
> > > non-deduced contexts, because C++ requires that too when writing the 
> > > name) to simplify things for users of diagnostics, and in your case.
> > 
> > If only a change in `TemplateBase.cpp` is applied, but not also in 
> > `APValue.cpp`, then at least I get consistently printed out 
> > `Height{{{0}}}`, regardless of the value of `PrintCanonicalTypes`, so 
> > change in this line may indeed be a bugfix.
> > 
> > However, a change in `APValue.cpp` indeed adds noise, as you call it, to 
> > the diagnostic messages, as it ensures that the full type name is written. 
> > However, the same noise level is always generated by GCC and, in my 
> > opinion, although noisier, the GCC's error message is more thorough. But 
> > I'm OK with keeping that change behind a new policy.
> > > Given that those aren't ambiguous/the syntax is valid C++ - what's the 
> > > use case/need for them to be explicitly written?
> > 
> > I'm writing a clang-based tool for internal use in our company that will 
> > need to parse some c++ code from a machine-generated function, representing 
> > the execution of computation graph (like those used in machine learning 
> > inference), then extract the exact types of all intermediate results and 
> > feed that into an algorithm that will optimize and possibly reorder the 
> > execution of graph nodes, depending on the types of the intermediate 
> > results (after optimization the types may change, but we know exactly how). 
> > Since we are encoding a lot of compile-time information into the type names 
> > of the intermediates, we need full type names here. As I'm a rookie with 
> > LLVM and don't know better, I implemented the FrontendAction and 
> > RecursiveVisitor to visit variables representing the variables I'm 
> > interested in and used getAsString to obtain type names. Unfortunately, I 
> > was not getting full type names and therefore I investigated further and 
> > that resulted with this patch.
> 
> But if the type as a string is correct as far as C++ is concerned, is that 
> not enough information (necessarily - the C++ compiler can determine all the 
> type information from the type as written with only the top level types in 
> the NTTP names)? If it's that the tool is trying to get the type information 
> purely from the string representation, maybe the tool should be doing things 
> differently - relying on Clang's AST/type APIs, rather than on inspecting the 
> resulting string?
> If it's that the tool is trying to get the type information purely from the 
> string representation, maybe the tool should be doing things differently - 
> relying on Clang's AST/type APIs, rather than on inspecting the resulting 
> string?

I completely agree, however, the documentation on type API is not very good, 
and couldn't find a better way to get a string representation of the type using 
this API, besides manually traversing the AST and concatenating bits and pieces 
into a string which is essentially the same thing that TypePrinter does...

On the other hand, inventing new data structures to hold the type information 
or directly saving pointers to parts of the AST does not sound appealing to me, 
at least not for the current level of complexity I'm aiming for.

> But if the type as a string is correct as far as C++ is concerned, is that 
> not enough information (necessarily - the C++ compiler can determine all the 
> type information from the type as written with only the top level types in 
> the NTTP names)?

I think that the essential problem here is the //**as far as C++ is 
concerned**//? What about the developer? Developers may have more benefit from 
getting a full type name in their diagnostic than just something that is 
correct as far as C++ is concerned.

Consider [this error message, which is completely 
useless](https://godbolt.org/z/c6W6oMYrh) and compare it to the much more 
informative errors by [GCC](https://godbolt.org/z/Ge89TePjq) and 
[MSVC](https://godbolt.org/z/aKr3T7nfz) that immediately tell you what's wrong 
with your code.

So, why shouldn't Clang behave the same way as well?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134453/new/

https://reviews.llvm.org/D134453

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to