[PATCH] D66751: [clangd] Add targetDecl(), which determines what declaration an AST node refers to.

Sam McCall via Phabricator via cfe-commits Tue, 27 Aug 2019 06:37:46 -0700

sammccall added a comment.

First a wall of text to reply to the general comments. Let's chat if this 
doesn't make sense.

In D66751#1646545 <https://reviews.llvm.org/D66751#1646545>, @ilya-biryukov 
wrote:

> Mostly LGTM, thanks! Just a few clarifying questions and suggestions
>
> This API allows us to get a target declaration for a reference, but does not 
> help with finding the source locations for those references. Do you plan to 
> put this into the API? Have this as a separate function?

Yes. I actually ripped this out of the patch, because it wasn't related enough 
and the patch is large.

To restate the relationship between these functions:
Many operations ultimately want to join source location and decl, e.g. gtd is 
location -> node -> decl, xrefs is decl -> node -> location, indexing is node 
-> (location, decl).
So providing node->decl and node->location primitives is sufficient. If you 
want to go in the other direction (decl->node or location->node) then of course 
you need to traverse and/or build an index.

With that said, there are several legitimate node->location models, and I think 
this is a good reason to to keep that decoupled from node->decl (testability is 
another). In particular:

the "flat" model
================

says that each node may have a token which is its "handle", and each token can 
be the handle for (refer to) only one node.

  int a = X::y;
  iii a   X  y

What is the range for a node? Its handle token.
What is the node for a range? The unique handle token covered by the range.

Strengths: accurate, precise triggering on node names. simplicity (conforms to 
user's view of the code)
Weaknesses: multi-token ranges (selections), implicit nodes (they don't get to 
have locations), requires specific per-node implementation
Use case: go-to-definition/xrefs triggering, xrefs results (nodes referenced by 
name)

the "hierarchical" model
========================

says that each node owns a range of tokens, and these nest.

  int a = X::y;
          X
          XXX
  iii     yyyy
  aaaaaaaaaaaa

What is the range for a node? Its full range.
What is the node for a range? The innermost node that entirely contains the 
selection.

Strengths: exposing underlying structure/hierarchy of the code, can be (mostly) 
implemented generically
Weaknesses: loose triggering that doesn't distinguish e.g. node names from 
qualifiers
Use case: extract variable, structured selection, xrefs results (nodes 
referenced implicitly)

---

I think we're going to need/want both. The hierarchical model corresponds to 
SelectionTree. The flat model is the thing we need to build. libIndex looks 
more like the flat model than the hierarchical one, but has elements of both I 
think.
(If you think the above explanation is useful, we can check something like it 
in somewhere)

> E.g. finding a source location of field for DeclRefExpr produced for 
> MyBase::field seems to be the same amount of work (in general case) as 
> finding the Decl* of field.

For the hierarchical model it's easy, this is one of the few methods 
DynTypeNode actually has :-)
For the flat model: it's not trivial, but is less work because while you do 
have to dispatch over all node types, you don't have general multi-hop/graph 
cases (I think).
Usually the "handle" tokens are well-supported by the AST because they're 
precisely the ones that diagnostics want to report. So in the DeclRefExpr case, 
DeclRefExpr::getLoc() returns the right thing.

In D66751#1646675 <https://reviews.llvm.org/D66751#1646675>, @kadircet wrote:

> I agree with Ilya's concerns on SourceLocations, even though most of the time 
> user will have the source locations available in the dyntyped node, they 
> might need some traversals to fetch the exact range especially in cases of 
> nested name specifiers.

I think if you want the (full) range, this is easy as mentioned above: 
getSourceRange() (available on DynTypedNode, but also on each specific node 
type).
If you want just the "handle" token for the flat model, NNS isn't a 
particularly hard case: it's basically a linked list. Each node claims one 
component of the NNS as its "handle". If you want to treat NNS as part of the 
parent (so pointing at X in X::y counts as a ref to Y) then I think you're 
better off with the hierarchical model.

> It would be nice to provide a referencing range in addition to the decls.

Concretely, can you give an example of which node and what you'd want it to 
point to? If it's just the single unqualified-name token, I agree and would 
like to add it (as a separate patch). If it's the full range, I think that's 
getSourceRange(). Is it something else?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66751/new/

https://reviews.llvm.org/D66751

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D66751: [clangd] Add targetDecl(), which determines what declaration an AST node refers to.

Reply via email to