We currently capture some source location information in the frontends, but there are many kinds of source entity for which we *don't* retain the location information after the initial parse.
For example, in the C/C++ frontends:

* we don't capture the locations of the individual parameters within e.g. an extern function declaration, so we can't underline the pertinent param when there's a mismatching type in a call to that function decl.

* we don't capture the locations of attributes of a function, so we can't underline these if they're wrong (e.g. a "noreturn" on a function that does in fact return).

* we don't retain the locations of things like close parens and semicolons after parsing, so we can't offer fix-it hints for adding new attributes, or, say, the C++11 "override" feature.

* we can't at present implement many kinds of useful "cousins" of a compiler on top of the GCC codebase (e.g. code refactoring tools, code reformatting tools, IDE support daemons, etc), since much of the useful location information is discarded at parse time.

This patch kit implements:

(a) a new, optional representation of this location information, enabled by a command-line flag

(b) improvements to various diagnostics to use this location information if it's present, falling back to the status quo (less accurate) source locations otherwise

(c) a gcc-based implementation of Microsoft's Language Server Protocol,
  https://github.com/Microsoft/language-server-protocol
allowing IDEs to connect to a gcc-based LSP server, making RPC calls to query it for things like "where is this struct declared?". This last part is very much just a proof-of-concept.

================================
(a) The new location information
================================

Our existing "tree" type represents a node within an abstract syntax tree, but this is sometimes too abstract: sometimes we want the locations of the clauses and tokens that were abstracted away by the frontends.

In theory we could generate the full parse tree ("concrete syntax tree"), showing every production followed to parse the input, but it is likely to be unwieldy: large and difficult to navigate.
(aside: I found myself re-reading the Dragon book to refresh my mind on exactly what an AST vs a CST is; I also found this blog post to be very useful: http://eli.thegreenplace.net/2009/02/16/abstract-vs-concrete-syntax-trees)

So the patch kit implements a middle ground: an additional tree of parse information, much more concrete than our "tree" type, but not quite the full parse tree.

My working title for this system is "BLT" (and hence "class blt_node"). I could claim that this is an acronym for "bonus location tree" (but it's actually a reference to a sandwich). It needs a name, and that name needs to not clash with anything else in the source tree. "Parse Tree" would be "PT", which clashes with "points-to", and "Concrete Syntax Tree" would be "CST", which clashes with our abbreviation for "constant". ("BLT" popped into my mind somewhere between "AST" and "CST"; ideas for better names welcome.)

blt_nodes form a tree-like structure; a blt_node has a "kind", identifying the terminal or nonterminal it corresponds to (e.g. BLT_TRANSLATION_UNIT or BLT_DECLARATION_SPECIFIERS). The kind is just an enum of IDs, but it allows for language-specific traversals without introducing significant language-specific features in the shared "gcc" dir.

There is a partial mapping between "tree" and blt_node: a blt_node can reference a tree, and a tree can reference a blt_node, though typically the mapping is very sparse: most nodes and trees have no counterpart. This allows us to go from e.g. a function_decl in the "tree" world and navigate to the pertinent parts of the syntax that was used to declare it.

All of this is enabled by a new "-fblt" command-line option; in the absence of -fblt, almost all of it becomes a no-op (or close to one), and the relevant diagnostics fall back to using less accurate location information.

So it's a kind of optional, "on-the-side" record of how we parsed the source, with a sparse relationship to our tree type. The patch kit implements it for the C and C++ frontends.
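To make the shape of this concrete, here is a minimal C++ sketch of a blt_node-like structure: a kind enum, child nodes carrying source ranges, an optional link to a tree, and a sparse reverse map from trees to nodes. It uses these to find the range of a function decl's Nth parameter, which is the kind of query the parameter-highlighting diagnostics need. This is not the patch kit's code. `source_range` and `fake_tree` are stand-ins for GCC's location_t ranges and "tree". All enum values other than BLT_TRANSLATION_UNIT and BLT_DECLARATION_SPECIFIERS are hypothetical, as are the helper functions. The node layout is simplified (parameters hang directly off the declarator). Storing the tree-to-node link in a side hash map is an assumption, chosen so that trees without a node cost nothing.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for GCC's location_t ranges and "tree" type.
struct source_range { int line, start_col, end_col; };
struct fake_tree { const char *name; };

// Hypothetical subset of node kinds; only the first two names
// (BLT_TRANSLATION_UNIT, BLT_DECLARATION_SPECIFIERS) come from the text.
enum blt_kind {
  BLT_TRANSLATION_UNIT,
  BLT_DECLARATION_SPECIFIERS,
  BLT_DECLARATION,
  BLT_DIRECT_DECLARATOR,
  BLT_PARAMETER_DECLARATION
};

struct blt_node {
  blt_kind kind;
  source_range range;
  fake_tree *tree = nullptr;  // optional link into the "tree" world
  blt_node *parent = nullptr;
  std::vector<std::unique_ptr<blt_node>> children;

  blt_node (blt_kind k, source_range r) : kind (k), range (r) {}

  // Append a child and return a non-owning pointer to it.
  blt_node *add_child (blt_kind k, source_range r)
  {
    children.push_back (std::make_unique<blt_node> (k, r));
    children.back ()->parent = this;
    return children.back ().get ();
  }

  // Return the Nth (0-based) child of kind K, or nullptr if there is none.
  blt_node *get_child_of_kind (blt_kind k, unsigned n) const
  {
    for (const auto &child : children)
      if (child->kind == k && n-- == 0)
        return child.get ();
    return nullptr;
  }
};

// Sparse reverse mapping: only trees we chose to record appear here.
static std::unordered_map<const fake_tree *, blt_node *> tree_to_blt;

void blt_associate (fake_tree *t, blt_node *node)
{
  assert (t && node);
  node->tree = t;
  tree_to_blt[t] = node;
}

blt_node *blt_get_node (const fake_tree *t)
{
  auto it = tree_to_blt.find (t);
  return it == tree_to_blt.end () ? nullptr : it->second;
}

// Find the source range of parameter PARAM_IDX of FNDECL.  Returns false
// if the information wasn't recorded, so the caller can fall back to the
// decl's own (less precise) location.
bool get_param_range (const fake_tree *fndecl, unsigned param_idx,
                      source_range *out)
{
  blt_node *decl = blt_get_node (fndecl);
  if (!decl)
    return false;
  blt_node *declarator = decl->get_child_of_kind (BLT_DIRECT_DECLARATOR, 0);
  if (!declarator)
    return false;
  blt_node *param
    = declarator->get_child_of_kind (BLT_PARAMETER_DECLARATION, param_idx);
  if (!param)
    return false;
  *out = param->range;
  return true;
}
```

For `extern int f (int a, const char *b);` on line 1, the second parameter `const char *b` would be recorded as columns 22-34. When the lookup fails (for instance because nothing was recorded without -fblt), the caller keeps the existing fallback behaviour described above.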
An example of a BLT dump for a C file can be seen here:
  https://dmalcolm.fedorapeople.org/gcc/2017-07-24/fdump-blt.html
It shows the tree structure using indentation (and colorization); source locations are printed, and, for each node whose location differs from its parent's, the pertinent source range is printed and underlined inline.

(BTW, does the colorization of dumps look useful for other dump formats? Similarly, what about the ASCII art for printing hierarchies?)

=====================
(b) Examples of usage
=====================

Patches 6-10 in the kit update various diagnostics to use the improved location information where available:

* C and C++: highlighting the pertinent parameter of a function decl when there's a mismatched type in a call

* C and C++: highlighting the return type in the function defn when complaining about a mismatch in the body (e.g. highlighting the "void" when we see a "return EXPR;" within a void function).

* C++: adding a fix-it hint to -Wsuggest-override

I have plenty of ideas for other uses of this infrastructure (which aren't implemented yet), e.g.:

* C++: highlight the "const" token (or suggest a fix-it hint) when you have a missing "const" on the *definition* of a member function that was declared as "const" (I make this mistake all the time).

* C++: add a fix-it hint to -Wsuggest-final-methods

* highlight bogus attributes

* add fix-it hints suggesting missing attributes

...etc, plus those "cousins of a compiler" ideas mentioned above.

Any other ideas?

============================
(c) Language Server Protocol
============================

The later parts of the patch kit implement a proof-of-concept LSP server, which makes use of the extended location information and exposes it to IDEs.
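As a rough illustration of how a server could use a tree like this to answer a position-based query (such as LSP's textDocument/definition request), here is a minimal sketch that descends to the innermost node whose source range contains the cursor. Everything in it (`position`, `source_range`, `node`, `from_lsp`, `find_innermost`) is hypothetical and is not taken from the patch kit. It assumes that child ranges nest within their parents and that siblings don't overlap. LSP positions are zero-based line/character pairs, whereas GCC's lines and columns are one-based. The sketch therefore converts between them, assuming ASCII source so that character offsets equal byte columns.

```cpp
#include <memory>
#include <vector>

// Hypothetical, simplified types; lines and columns are 1-based as in GCC.
struct position { int line, col; };

struct source_range {
  position start, finish;  // both inclusive

  bool contains (position p) const
  {
    // True if A is at or before B in the file.
    auto at_or_before = [] (position a, position b) {
      return a.line < b.line || (a.line == b.line && a.col <= b.col);
    };
    return at_or_before (start, p) && at_or_before (p, finish);
  }
};

struct node {
  const char *kind;
  source_range range;
  std::vector<std::unique_ptr<node>> children;

  node (const char *k, source_range r) : kind (k), range (r) {}

  node *add (const char *k, source_range r)
  {
    children.push_back (std::make_unique<node> (k, r));
    return children.back ().get ();
  }
};

// Convert a 0-based LSP line/character pair to a 1-based position,
// assuming ASCII source (so one character == one byte column).
position from_lsp (int line, int character)
{
  return position {line + 1, character + 1};
}

// Return the innermost node under ROOT whose range contains P, or nullptr
// if P is outside ROOT.  Since siblings don't overlap, at most one child
// can contain P, so the first hit is the answer.
const node *find_innermost (const node *root, position p)
{
  if (!root->range.contains (p))
    return nullptr;
  for (const auto &child : root->children)
    if (const node *hit = find_innermost (child.get (), p))
      return hit;
  return root;
}
```

A real server would then follow the node's link back into the "tree" world (e.g. to a decl) and report that decl's location. That step is omitted here.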
LSP is an RPC protocol layered on top of JSON-RPC (and hence JSON and HTTP):
  https://github.com/Microsoft/language-server-protocol
so the patch kit implements a set of classes to support this (including a barebones HTTP server running inside cc1), and a toy IDE written in PyGTK to test it.

=======
Caveats
=======

* There are plenty of FIXMEs and TODOs in the patch kit.

* I've entirely ignored tentative parsing in the C++ frontend for now.

* I haven't attempted to optimize it at all yet (so no performance measurements yet).

* How much of the syntax tree ought to be captured? I've focussed on the stuff outside of function bodies, since we don't currently capture that well, but to do "proper" IDE support we'd want to capture things more deeply. (I experimented with using it to fix some of our missing location information for things like uses of constants and variables as arguments at callsites, but it quickly turned into a much more invasive patch.)

* The LSP implementation is just a proof-of-concept, to further motivate capturing the extra data. Turning it into a "proper" LSP server implementation would be a *lot* more work, and I'm unlikely to actually do that (but maybe someone on the list wants to take this on?)

I've successfully bootstrapped & regrtested the combination of the patches on x86_64-pc-linux-gnu. The kit takes -fself-test from 39458 passes to 41574, adds 30 PASS results to gcc.sum, and adds 182 PASS results to g++.sum.

Thoughts?
Dave

David Malcolm (17):
  Add param-type-mismatch.c/C testcases as a baseline
  diagnostics: support prefixes within diagnostic_show_locus
  Core of BLT implementation
  C frontend: capture BLT information
  C++ frontend: capture BLT information
  C: use BLT to highlight parameter of callee decl for mismatching types
  C++: use BLT to highlight parameter of callee decl for mismatching types
  C: highlight return types when complaining about mismatches
  C++: highlight return types when complaining about mismatches
  C++: provide fix-it hints in -Wsuggest-override
  Add JSON implementation
  Add server.h and server.c
  Add http-server.h and http-server.c
  Add implementation of JSON-RPC
  Language Server Protocol: add lsp::server abstract base class
  Language Server Protocol: proof-of-concept GCC implementation
  Language Server Protocol: work-in-progess on testsuite

 gcc/Makefile.in | 7 +
 gcc/blt.c | 768 ++++++++
 gcc/blt.def | 87 +
 gcc/blt.h | 147 ++
 gcc/c-family/c-opts.c | 2 +-
 gcc/c-family/c.opt | 8 +
 gcc/c/c-decl.c | 13 +-
 gcc/c/c-parser.c | 241 ++-
 gcc/c/c-tree.h | 6 +-
 gcc/c/c-typeck.c | 120 +-
 gcc/common.opt | 4 +
 gcc/cp/call.c | 79 +-
 gcc/cp/class.c | 23 +-
 gcc/cp/cp-tree.h | 7 +
 gcc/cp/decl.c | 32 +-
 gcc/cp/parser.c | 369 +++-
 gcc/cp/parser.h | 7 +
 gcc/cp/pt.c | 8 +
 gcc/cp/typeck.c | 70 +-
 gcc/diagnostic-show-locus.c | 94 +-
 gcc/diagnostic.c | 5 +-
 gcc/http-server.c | 358 ++++
 gcc/http-server.h | 101 ++
 gcc/json-rpc.c | 486 +++++
 gcc/json-rpc.h | 94 +
 gcc/json.c | 1914 ++++++++++++++++++++
 gcc/json.h | 214 +++
 gcc/lsp-main.c | 168 ++
 gcc/lsp-main.h | 25 +
 gcc/lsp.c | 291 +++
 gcc/lsp.h | 210 +++
 gcc/selftest-run-tests.c | 5 +
 gcc/selftest.h | 5 +
 gcc/server.c | 152 ++
 gcc/server.h | 46 +
 gcc/testsuite/g++.dg/bad-return-type.C | 135 ++
 .../g++.dg/diagnostic/param-type-mismatch.C | 159 ++
 gcc/testsuite/g++.dg/warn/Wsuggest-override.C | 12 +-
 gcc/testsuite/gcc.dg/bad-return-type.c | 67 +
 gcc/testsuite/gcc.dg/lsp/lsp.py | 125 ++
 gcc/testsuite/gcc.dg/lsp/test.c | 12 +
 gcc/testsuite/gcc.dg/lsp/test.py | 28 +
 gcc/testsuite/gcc.dg/lsp/toy-ide.py | 111 ++
 gcc/testsuite/gcc.dg/param-type-mismatch.c | 60 +
 .../plugin/diagnostic_plugin_test_show_locus.c | 1 +
 gcc/toplev.c | 4 +
 46 files changed, 6772 insertions(+), 108 deletions(-)
 create mode 100644 gcc/blt.c
 create mode 100644 gcc/blt.def
 create mode 100644 gcc/blt.h
 create mode 100644 gcc/http-server.c
 create mode 100644 gcc/http-server.h
 create mode 100644 gcc/json-rpc.c
 create mode 100644 gcc/json-rpc.h
 create mode 100644 gcc/json.c
 create mode 100644 gcc/json.h
 create mode 100644 gcc/lsp-main.c
 create mode 100644 gcc/lsp-main.h
 create mode 100644 gcc/lsp.c
 create mode 100644 gcc/lsp.h
 create mode 100644 gcc/server.c
 create mode 100644 gcc/server.h
 create mode 100644 gcc/testsuite/g++.dg/bad-return-type.C
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/param-type-mismatch.C
 create mode 100644 gcc/testsuite/gcc.dg/bad-return-type.c
 create mode 100644 gcc/testsuite/gcc.dg/lsp/lsp.py
 create mode 100644 gcc/testsuite/gcc.dg/lsp/test.c
 create mode 100644 gcc/testsuite/gcc.dg/lsp/test.py
 create mode 100644 gcc/testsuite/gcc.dg/lsp/toy-ide.py
 create mode 100644 gcc/testsuite/gcc.dg/param-type-mismatch.c

--
1.8.5.3