On Fri, 2017-05-12 at 12:47 -0600, Jeff Law wrote: > On 05/02/2017 01:08 PM, David Malcolm wrote: > > Currently the C/C++ frontends discard comments when parsing. > > It's possible to set up libcpp to capture comments as tokens, > > by setting CPP_OPTION (pfile, discard_comments) to false), > > and this can be enabled using the -C command line option (see > > also -CC), but c-family/c-lex.c then discards any CPP_COMMENT > > tokens it sees, so they're not seen by the frontend parser. > > > > The following patch adds an (optional) callback to libcpp > > for handling comments, giving the comment content, and the > > location it was seen at. This approach allows arbitrary > > logic to be wired up to comments, and avoids having to > > copy the comment content to a new buffer (which the CPP_COMMENT > > approach does). > > > > This could be used by plugins to chain up on the callback > > e.g. to parse specially-formatted comments, e.g. for > > documentation generation, or e.g. for GObject introspection > > annotations [1]. > > > > As a proof of concept, the patch uses this to add a spellchecker > > for comments. It uses the Enchant meta-library: > > https://abiword.github.io/enchant/ > > (essentially a wrapper around 8 different spellchecking libraries). > > I didn't bother with the autotool detection for enchant, and > > just hacked it in for now. > > > > Example output: > > > > test.c:3:46: warning: spellcheck_word: "evaulate" > > When NONCONST_PRED is false the code will evaulate to constant > > and > > ^~~~~~~~ > > test.c:3:46: note: suggestion: "evaluate" > > When NONCONST_PRED is false the code will evaulate to constant > > and > > ^~~~~~~~ > > evaluate > > test.c:3:46: note: suggestion: "ululate" > > When NONCONST_PRED is false the code will evaulate to constant > > and > > ^~~~~~~~ > > ululate > > test.c:3:46: note: suggestion: "elevate" > > When NONCONST_PRED is false the code will evaulate to constant > > and > > ^~~~~~~~ > > elevate > > > > License-wise, Enchant is LGPL 2.1 "or (at your option) any > > later version." with a special exception to allow non-LGPL > > spellchecking providers (e.g. to allow linking against an > > OS-provided spellchecker). > > > > Various FIXMEs are present (e.g. hardcoded "en_US" for the > > language to spellcheck against). > > Also, the spellchecker has a lot of false positives e.g. > > it doesn't grok URLs (and thus complains when it seens them); > > similar for DejaGnu directives etc. > > > > Does enchant seem like a reasonable dependency for the compiler? > > (it pulls in libpthread.so.0, libglib-2.0.so.0, libgmodule > > -2.0.so.0). > > Or would this be better pursued as a plugin? (if so, I'd > > prefer the plugin to live in the source tree as an example, > > rather than out-of-tree). > > > > Unrelated to spellchecking, I also added two new options: > > -Wfixme and -Wtodo, for warning when comments containing > > "FIXME" or "TODO" are encountered. > > I use such comments a lot during development. I thought some > > people might want a warning about them (I tend to just use grep > > though). [TODO: document these in invoke.texi, add test cases] > > > > Thoughts? Does any of this sound useful? > > > > [not yet bootstrapped; as noted above, I haven't yet done > > the autoconf stuff for handling Enchant] > > > > [1] > > https://wiki.gnome.org/Projects/GObjectIntrospection/Annotations > > > > gcc/ChangeLog: > > * Makefile.in (LIBS): Hack in -lenchant for now. > > (OBJS): Add spellcheck-enchant.o. > > * common.opt (Wfixme): New option. > > (Wtodo): New option. > > * spellcheck-enchant.c: New file. > > * spellcheck-enchant.h: New file. > > > > gcc/c-family/ChangeLog: > > * c-lex.c: Include spellcheck-enchant.h. > > (init_c_lex): Wire up spellcheck_enchant_check_comment to the > > comment callback. > > * c-opts.c: Include spellcheck-enchant.h. > > (c_common_post_options): Call spellcheck_enchant_init. > > (c_common_finish): Call spellcheck_enchant_finish. > > > > libcpp/ChangeLog: > > * include/cpplib.h (struct cpp_callbacks): Add "comment" > > callback. > > * lex.c (_cpp_lex_direct): Call the comment callback if non > > -NULL. > enchant seems a bit out of the sweet spot, particular just to catch > mis-spellings in comments. But it might make an interesting plugin. > > IIRC from our meeting earlier this week, you had another use case > that > might have been more compelling, but I can't remember what it was.
(FWIW I think we were chatting about the GObjectIntrospection annotations use-case) > I do like the ability to at least capture the comments better and > while > we don't have a strong need for that capability now, we might in the > future. > > Jeff I stripped out all of the spellchecking stuff, and restricted the scope of the patch to just adding a callback for plugins to hook into if they're interested in handling comments. I added an example of such a plugin in the testsuite. Successfully bootstrapped®rtested on x86_64-pc-linux-gnu. Committed to trunk as r248901. gcc/testsuite/ChangeLog: * g++.dg/plugin/comment_plugin.c: New test plugin. * g++.dg/plugin/comments-1.C: New test file. * g++.dg/plugin/plugin.exp (plugin_test_list): Add the above. libcpp/ChangeLog: * include/cpplib.h (struct cpp_callbacks): Add "comment" callback. * lex.c (_cpp_lex_direct): Call the comment callback if non-NULL. --- gcc/testsuite/g++.dg/plugin/comment_plugin.c | 63 ++++++++++++++++++++++++++++ gcc/testsuite/g++.dg/plugin/comments-1.C | 49 ++++++++++++++++++++++ gcc/testsuite/g++.dg/plugin/plugin.exp | 1 + libcpp/include/cpplib.h | 9 ++++ libcpp/lex.c | 7 ++++ 5 files changed, 129 insertions(+) create mode 100644 gcc/testsuite/g++.dg/plugin/comment_plugin.c create mode 100644 gcc/testsuite/g++.dg/plugin/comments-1.C diff --git a/gcc/testsuite/g++.dg/plugin/comment_plugin.c b/gcc/testsuite/g++.dg/plugin/comment_plugin.c new file mode 100644 index 0000000..c3b08e3 --- /dev/null +++ b/gcc/testsuite/g++.dg/plugin/comment_plugin.c @@ -0,0 +1,63 @@ +/* Test of cpp_callbacks::comments. */ + +#include "gcc-plugin.h" +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "cpplib.h" +#include "diagnostic.h" +#include "c-family/c-pragma.h" + +int plugin_is_GPL_compatible; + +/* Test callback for cpp_callbacks::comments. */ + +void +my_comment_cb (cpp_reader *, source_location loc, + const unsigned char *content, size_t len) +{ + if (in_system_header_at (loc)) + return; + + /* CONTENT contains the opening slash-star (or slash-slash), + and for C-style comments contains the closing star-slash. */ + gcc_assert (len >= 2); + gcc_assert (content[0] == '/'); + gcc_assert (content[1] == '*' || content[1] == '/'); + bool c_style = (content[1] == '*'); + if (c_style) + { + gcc_assert (content[len - 2] == '*'); + gcc_assert (content[len - 1] == '/'); + } + + if (c_style) + inform (loc, "got C-style comment; length=%i", len); + else + inform (loc, "got C++-style comment; length=%i", len); + + /* Print the content of the comment. + For a C-style comment, the buffer CONTENT contains the opening + slash-star and closing star-slash, so we can't directly verify + it in the DejaGnu test without adding another comment, which + would trigger this callback again. + Hence we skip the syntactically-significant parts of the comment + when printing it. */ + fprintf (stderr, "stripped content of comment: >"); + /* Avoid printing trailing star-slash. */ + if (c_style) + len -= 2; + for (size_t i = 2; i < len; i++) + fputc (content[i], stderr); + fprintf (stderr, "<\n"); +} + +int +plugin_init (struct plugin_name_args *plugin_info, + struct plugin_gcc_version *version) +{ + cpp_callbacks *cb = cpp_get_callbacks (parse_in); + cb->comment = my_comment_cb; + + return 0; +} diff --git a/gcc/testsuite/g++.dg/plugin/comments-1.C b/gcc/testsuite/g++.dg/plugin/comments-1.C new file mode 100644 index 0000000..0821b14 --- /dev/null +++ b/gcc/testsuite/g++.dg/plugin/comments-1.C @@ -0,0 +1,49 @@ +/* Example of a one-line C-style comment. */ +#if 0 +{ dg-message "1: got C-style comment; length=45" "" { target *-*-* } .-2 } +{ dg-begin-multiline-output "" } +stripped content of comment: > Example of a one-line C-style comment. < +{ dg-end-multiline-output "" } +#endif + + /*Another example of a one-line C-style comment.*/ +#if 0 +{ dg-message "6: got C-style comment; length=50" "" { target *-*-* } .-2 } +{ dg-begin-multiline-output "" } +stripped content of comment: >Another example of a one-line C-style comment.< +{ dg-end-multiline-output "" } +#endif + +/**/ +#if 0 +{ dg-message "1: got C-style comment; length=4" "" { target *-*-* } .-2 } +{ dg-begin-multiline-output "" } +stripped content of comment: >< +{ dg-end-multiline-output "" } +#endif + +/* Example of a + multi-line C-style comment. */ +#if 0 +{ dg-message "1: got C-style comment; length=50" "" { target *-*-* } .-3 } +{ dg-begin-multiline-output "" } +stripped content of comment: > Example of a + multi-line C-style comment. < +{ dg-end-multiline-output "" } +#endif + +// Example of a C++-style comment +#if 0 +{ dg-message "1: got C\\+\\+-style comment; length=33" "" { target *-*-* } .-2 } +{ dg-begin-multiline-output "" } +stripped content of comment: > Example of a C++-style comment< +{ dg-end-multiline-output "" } +#endif + +// +#if 0 +{ dg-message "1: got C\\+\\+-style comment; length=2" "" { target *-*-* } .-2 } +{ dg-begin-multiline-output "" } +stripped content of comment: >< +{ dg-end-multiline-output "" } +#endif diff --git a/gcc/testsuite/g++.dg/plugin/plugin.exp b/gcc/testsuite/g++.dg/plugin/plugin.exp index 94ebe93..e40cba3 100644 --- a/gcc/testsuite/g++.dg/plugin/plugin.exp +++ b/gcc/testsuite/g++.dg/plugin/plugin.exp @@ -68,6 +68,7 @@ set plugin_test_list [list \ { show_template_tree_color_plugin.c \ show-template-tree-color.C \ show-template-tree-color-no-elide-type.C } \ + { comment_plugin.c comments-1.C } \ ] foreach plugin_test $plugin_test_list { diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h index b843992..66ef4d6 100644 --- a/libcpp/include/cpplib.h +++ b/libcpp/include/cpplib.h @@ -609,6 +609,15 @@ struct cpp_callbacks /* Callback for providing suggestions for misspelled directives. */ const char *(*get_suggestion) (cpp_reader *, const char *, const char *const *); + + /* Callback for when a comment is encountered, giving the location + of the opening slash, a pointer to the content (which is not + necessarily 0-terminated), and the length of the content. + The content contains the opening slash-star (or slash-slash), + and for C-style comments contains the closing star-slash. For + C++-style comments it does not include the terminating newline. */ + void (*comment) (cpp_reader *, source_location, const unsigned char *, + size_t); }; #ifdef VMS diff --git a/libcpp/lex.c b/libcpp/lex.c index 9edd2a6..40ff801 100644 --- a/libcpp/lex.c +++ b/libcpp/lex.c @@ -2889,6 +2889,13 @@ _cpp_lex_direct (cpp_reader *pfile) if (fallthrough_comment_p (pfile, comment_start)) fallthrough_comment = true; + if (pfile->cb.comment) + { + size_t len = pfile->buffer->cur - comment_start; + pfile->cb.comment (pfile, result->src_loc, comment_start - 1, + len + 1); + } + if (!pfile->state.save_comments) { result->flags |= PREV_WHITE; -- 1.8.5.3