On Tue, 2017-05-02 at 14:11 -0700, Mike Stump wrote: > On May 2, 2017, at 12:08 PM, David Malcolm <dmalc...@redhat.com> > wrote: > > > > As a proof of concept, the patch uses this to add a spellchecker > > for comments. > > :-) > > > I didn't bother with the autotool detection for enchant, and > > just hacked it in for now. > > Only other comment would be, rather then requiring it, would be nice > to make it optional. I can host gcc in an environment that is a bare > metal target on newlib. autoconf can smell it, and tell that a ton > of things are missing.
Given that Enchant seems a stretch as a hard dependency, it seems better to either: (a) have this code live as a plugin, or (b) attempt to dynamically load libenchant on-demand (a "soft" dependency ?) This patch implements (a): the plugin approach. We don't currently install any "official" plugins; ideally, if a plugin, I'd like it to be one of the things that "make install" installs: it add dependencies that are too much to impose on everyone's cc1 et al (hence a plugin), but which a significant number of people might want to opt in to (hence in-tree and "supported", for some meaning of that term). We don't have any mechanism or policies to allow that yet, so in the meantime, here's a revised version of the spellchecker code which adds it to the *testsuite*, as an example plugin. (specifically the C++ testsuite, to support both C and C++ style comments without needing to silence the compat flag). Changes: * moved it all to a plugin, within the testsuite * added test coverage * added option -fplugin-arg-spellcheck-show-provider * added option -fplugin-arg-spellcheck-language * fixed bug with misspelling at very end of a comment * fixed bug with apostrophes breaking words * fixed quoting of suggestions * don't spellcheck within system headers * don't spellcheck DejaGnu directives * don't spellcheck URLs * removed the -Wfixme and -Wtodo idea libenchant uses pkg-config to expose the flags needed to build against it, so the patch adds a new "dg-pkgconfig" directive to plugin-support.exp. This attempts to use pkg-config to locate the appropriate flags for building against libenchant. If it fails, the test is marked as unsupported. Successfully bootstrapped®rtested on x86_64-pc-linux-gnu. OK for trunk? gcc/testsuite/ChangeLog: * g++.dg/plugin/plugin.exp (plugin_test_list): Add the following... * g++.dg/plugin/spellcheck-1.C: New test case. * g++.dg/plugin/spellcheck-2.C: New test case. * g++.dg/plugin/spellcheck-3.C: New test case. * g++.dg/plugin/spellcheck-4.C: New test case. * g++.dg/plugin/spellcheck-5.C: New test case. * g++.dg/plugin/spellcheck-6.C: New test case. * g++.dg/plugin/spellcheck.c: New test plugin. * lib/plugin-support.exp (plugin-get-options): Add handling for "dg-pkgconfig" to the special-cased directive handling. Verbosely log the resulting value of dg-extra-tool-flags. (plugin-test-execute): Handle "unsupported" errors within plugin-get-options by marking the testcase as unsupported and immediately returning. --- gcc/testsuite/g++.dg/plugin/plugin.exp | 7 + gcc/testsuite/g++.dg/plugin/spellcheck-1.C | 14 ++ gcc/testsuite/g++.dg/plugin/spellcheck-2.C | 13 ++ gcc/testsuite/g++.dg/plugin/spellcheck-3.C | 2 + gcc/testsuite/g++.dg/plugin/spellcheck-4.C | 19 ++ gcc/testsuite/g++.dg/plugin/spellcheck-5.C | 4 + gcc/testsuite/g++.dg/plugin/spellcheck-6.C | 5 + gcc/testsuite/g++.dg/plugin/spellcheck.c | 361 +++++++++++++++++++++++++++++ gcc/testsuite/lib/plugin-support.exp | 32 ++- 9 files changed, 456 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck-1.C create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck-2.C create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck-3.C create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck-4.C create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck-5.C create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck-6.C create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck.c diff --git a/gcc/testsuite/g++.dg/plugin/plugin.exp b/gcc/testsuite/g++.dg/plugin/plugin.exp index e40cba3..e3f030f 100644 --- a/gcc/testsuite/g++.dg/plugin/plugin.exp +++ b/gcc/testsuite/g++.dg/plugin/plugin.exp @@ -69,6 +69,13 @@ set plugin_test_list [list \ show-template-tree-color.C \ show-template-tree-color-no-elide-type.C } \ { comment_plugin.c comments-1.C } \ + { spellcheck.c \ + spellcheck-1.C \ + spellcheck-2.C \ + spellcheck-3.C \ + spellcheck-4.C \ + spellcheck-5.C \ + spellcheck-6.C } \ ] foreach plugin_test $plugin_test_list { diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck-1.C b/gcc/testsuite/g++.dg/plugin/spellcheck-1.C new file mode 100644 index 0000000..242ab13 --- /dev/null +++ b/gcc/testsuite/g++.dg/plugin/spellcheck-1.C @@ -0,0 +1,14 @@ +/* When NONCONST_PRED is false the code will evaulate to constant. */ +/* { dg-warning "46: possible spelling mistake in comment: 'evaulate'" "" { target *-*-* } .-1 } */ + +// Example of a boogus C++-style comment +/* { dg-warning "17: possible spelling mistake in comment: 'boogus'" "" { target *-*-* } .-1 } */ + +/* Exercise the code path for the final word in a commment*/ +/* { dg-warning "51: possible spelling mistake in comment: 'commment'" "" { target *-*-* } .-1 } */ + +// Exercise the code path for the final word in a C++ commment +/* { dg-warning "55: possible spelling mistake in comment: 'commment'" "" { target *-*-* } .-1 } */ + +/* Examples of words that are correctly spelled, containing quotes: + don't doesn't is isn't. */ diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck-2.C b/gcc/testsuite/g++.dg/plugin/spellcheck-2.C new file mode 100644 index 0000000..8f28f62 --- /dev/null +++ b/gcc/testsuite/g++.dg/plugin/spellcheck-2.C @@ -0,0 +1,13 @@ +/* { dg-options "-fdiagnostics-show-caret" } */ + +/* Verify that misspelled words are underlined correctly. We need + a word that the checker won't offer a suggestion for, to avoid + having to specify checker output within the test case. + Hence the use of splendiferouslitudinalistic here. */ +#if 0 +{ dg-warning "possible spelling mistake in comment: 'splendiferouslitudinalistic'" "" { target *-*-* } .-2 } +{ dg-begin-multiline-output "" } + Hence the use of splendiferouslitudinalistic here. + ^~~~~~~~~~~~~~~~~~~~~~~~~~~ +{ dg-end-multiline-output "" } +#endif diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck-3.C b/gcc/testsuite/g++.dg/plugin/spellcheck-3.C new file mode 100644 index 0000000..c497988 --- /dev/null +++ b/gcc/testsuite/g++.dg/plugin/spellcheck-3.C @@ -0,0 +1,2 @@ +// { dg-options "-fplugin-arg-spellcheck-show-provider" } +// { dg-message "spellcheck provider: '.*' description: '.*' file: '.*'" "" { target *-*-* } 0 } diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck-4.C b/gcc/testsuite/g++.dg/plugin/spellcheck-4.C new file mode 100644 index 0000000..2038f10 --- /dev/null +++ b/gcc/testsuite/g++.dg/plugin/spellcheck-4.C @@ -0,0 +1,19 @@ +/* Verify that we don't complain about our legal boilerplate. */ + +/* Copyright (C) 2017 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck-5.C b/gcc/testsuite/g++.dg/plugin/spellcheck-5.C new file mode 100644 index 0000000..c7f4714 --- /dev/null +++ b/gcc/testsuite/g++.dg/plugin/spellcheck-5.C @@ -0,0 +1,4 @@ +// { dg-options "-fplugin-arg-spellcheck-language=this_is_not_valid" } +// { dg-error "unknown language tag: 'this_is_not_valid'" "" { target *-*-* } 0 } + +/* A test comment. */ diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck-6.C b/gcc/testsuite/g++.dg/plugin/spellcheck-6.C new file mode 100644 index 0000000..5a95881 --- /dev/null +++ b/gcc/testsuite/g++.dg/plugin/spellcheck-6.C @@ -0,0 +1,5 @@ +// Example of specifying language +// { dg-options "-fplugin-arg-spellcheck-language=en_GB" } +// Is it color or colour? +// { dg-warning "possible spelling mistake in comment: 'color'" "" { target *-*-* } .-1 } +// { dg-message "suggestion: 'colour'" "" { target *-*-* } .-2 } diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck.c b/gcc/testsuite/g++.dg/plugin/spellcheck.c new file mode 100644 index 0000000..6851637 --- /dev/null +++ b/gcc/testsuite/g++.dg/plugin/spellcheck.c @@ -0,0 +1,361 @@ +/* Spellchecking using the Enchant "meta-library". + Copyright (C) 2017 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +/* This plugin links against libenchant. + Use pkg-config to locate compilation and linkage flags for + libenchant, treating the testcase as unsupported if pkg-config or + enchant are not available. */ + +/* { dg-pkgconfig "enchant" } */ + +#include "gcc-plugin.h" +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "diagnostic.h" +#include "options.h" +#include "c-family/c-pragma.h" + +/* The dg-pkgconfig above should have provided the necessary + -I flags to locate enchant.h (e.g. "-I/usr/include/enchant"). */ + +#include <enchant.h> + +/* Enchant is LGPL 2.1 "or (at your option) any later version." with + a special exception to allow non-LGPL spellchecking providers. */ + +int plugin_is_GPL_compatible; + +/* Class wrapping the result of enchant_dict_suggest, ensuring that + the suggestions are cleaned up. */ + +class suggestions +{ + public: + suggestions (EnchantDict *dict, const char *const word, ssize_t len) + : m_dict (dict), m_suggestions (NULL) + { + m_suggestions = enchant_dict_suggest (m_dict, word, len, &m_num); + } + ~suggestions () + { + enchant_dict_free_string_list (m_dict, m_suggestions); + } + + size_t get_count () const { return m_num; } + const char *get_suggestion (size_t index) const + { + return m_suggestions[index]; + } + + private: + EnchantDict *m_dict; + size_t m_num; + char **m_suggestions; +}; + +/* Callback for describing metadata about the Enchant provider in use. */ + +static void +describe_broker_cb (const char * const provider_name, + const char * const provider_desc, + const char * const provider_dll_file, + void *) +{ + inform (UNKNOWN_LOCATION, + "spellcheck provider: %qs description: %qs file: %qs", + provider_name, provider_desc, provider_dll_file); +} + +/* A class for wrapping access to the Enchant API (both the broker + and the dict). */ + +class checker +{ + public: + checker (const char *language_tag, bool show_provider); + ~checker (); + + bool misspelled_p (const char *const word, ssize_t len); + suggestions get_suggestions (const char *const word, ssize_t len); + + static checker *singleton; + + private: + EnchantBroker *m_broker; + EnchantDict *m_dict; +}; + +/* Callback for describing the available languages. */ + +static void +describe_dict_cb (const char * const lang_tag, + const char * const provider_name, + const char * const provider_desc, + const char * const provider_file, + void *) +{ + inform (UNKNOWN_LOCATION, + "available language: %qs" + " (provider: %qs description: %qs file: %qs)", + lang_tag, provider_name, provider_desc, provider_file); +} + +/* checker's ctor. */ + +checker::checker (const char *language_tag, bool show_provider) +: m_broker (NULL), m_dict (NULL) +{ + m_broker = enchant_broker_init (); + if (!m_broker) + { + error_at (UNKNOWN_LOCATION, "enchant_broker_init failed"); + return; + } + m_dict = enchant_broker_request_dict (m_broker, language_tag); + if (!m_dict) + { + error_at (UNKNOWN_LOCATION, "unknown language tag: %qs", language_tag); + enchant_broker_list_dicts (m_broker, describe_dict_cb, NULL); + } + + if (show_provider) + enchant_broker_describe (m_broker, describe_broker_cb, NULL); + + // TODO: will probably want to support local wordlists also? +} + +/* checker's dtor. */ + +checker::~checker () +{ + if (m_dict) + enchant_broker_free_dict (m_broker, m_dict); + if (m_broker) + enchant_broker_free (m_broker); +} + +/* Return true if WORD of length LEN appears to be misspelled. */ + +bool +checker::misspelled_p (const char *const word, ssize_t len) +{ + if (!m_dict) + return false; + + int result = enchant_dict_check (m_dict, word, len); + return result != 0; +} + +/* Get suggestions for misspelled WORD of length LEN. */ + +suggestions +checker::get_suggestions (const char *const word, ssize_t len) +{ + gcc_assert (m_dict); + return suggestions (m_dict, word, len); +} + +/* Singleton instance of checker. */ + +checker *checker::singleton = NULL; + +/* Generate a location for a word of length LEN within ORD_MAP at the + given LINE AND COLUMN. */ + +static location_t +make_word_location (const line_map_ordinary *ord_map, int line, int column, + size_t len) +{ + location_t start_loc + = linemap_position_for_line_and_column (line_table, ord_map, line, column); + location_t end_loc + = linemap_position_for_line_and_column (line_table, ord_map, line, + column + len - 1); + return make_location (start_loc, start_loc, end_loc); +} + +/* Spellcheck the word at WORD of length LEN, with a location + within ORD_MAP at the given LINE AND COLUMN. + Issue warnings and suggestions for possibly misspelled words. */ + +static void +spellcheck_word (const unsigned char *word, size_t len, + const line_map_ordinary *ord_map, int line, int column) +{ + /* Don't bother spellchecking words with underscores or any upper case, + as these may refer to code entities. + TODO: should we also merge in names from code? */ + for (size_t i = 0; i < len; i++) + { + if (word[i] == '_' || ISUPPER (word[i])) + return; + } + + if (!checker::singleton->misspelled_p ((const char *)word, len)) + return; + + location_t word_loc = make_word_location (ord_map, line, column, len); + + if (warning_at (word_loc, 0, "possible spelling mistake in comment: %q.*s", (int)len, word)) + { + suggestions s + = checker::singleton->get_suggestions ((const char *)word, len); + for (size_t i = 0; i < s.get_count (); i++) + { + rich_location richloc (line_table, word_loc); + richloc.add_fixit_replace (s.get_suggestion (i)); + inform_at_rich_loc (&richloc, "suggestion: %qs", + s.get_suggestion (i)); + } + } +} + +/* Callback for cpp_callbacks::comments for spellchecking comments + using Enchant. */ + +static void +spellcheck_enchant_check_comment (cpp_reader *, source_location loc, + const unsigned char *content, size_t len) +{ + /* Don't spellcheck within system headers. */ + if (in_system_header_at (loc)) + return; + + /* Split into words, tracking the location, and call spellcheck_word + on each word. */ + + /* CONTENT contains the opening slash-star (or slash-slash), + and for C-style comments contains the closing star-slash. */ + gcc_assert (len >= 2); + gcc_assert (content[0] == '/'); + gcc_assert (content[1] == '*' || content[1] == '/'); + bool c_style = (content[1] == '*'); + if (c_style) + { + gcc_assert (content[len - 2] == '*'); + gcc_assert (content[len - 1] == '/'); + } + const unsigned char *iter = content + 2; + const unsigned char *after_last_char = content + len; + if (c_style) + after_last_char -= 2; + const unsigned char *word_start = NULL; + + const struct line_map *map = linemap_lookup (line_table, loc); + const line_map_ordinary *ord_map = linemap_check_ordinary (map); + + int column = LOCATION_COLUMN (loc) + 2; + int line = LOCATION_LINE (loc); + int word_start_column = column; + int word_start_line = line; + + bool within_url = false; + while (iter < after_last_char) + { + unsigned char ch = *iter; + if (ISALPHA (ch) || ch == '_' || ch == '\'') + { + if (!word_start) + { + /* We have the start of a word. */ + word_start = iter; + word_start_column = column; + word_start_line = line; + } + } + else + { + if (word_start) + { + /* We have a word ending. */ + size_t word_len = iter - word_start; + /* Don't check any comments that appear to contain DejaGnu + directives. */ + if (word_len >= 2) + if (0 == strncmp ((const char *)word_start, "dg", 2)) + return; + + /* Detect things that look like URLs. */ + /* "www." */ + if (word_len == 3 && ch == '.') + if (0 == strncmp ((const char *)word_start, "www", 3)) + within_url = true; + + /* "http:" */ + if (word_len == 4 && ch == ':') + if (0 == strncmp ((const char *)word_start, "http", 4)) + within_url = true; + + if (!within_url) + spellcheck_word (word_start, word_len, ord_map, + word_start_line, word_start_column); + word_start = NULL; + } + } + if (ISSPACE (ch)) + within_url = false; + if (ch == '\n') + { + column = 1; + line++; + } + else + column++; + iter++; + } + if (word_start) + if (!within_url) + spellcheck_word (word_start, iter - word_start, ord_map, + word_start_line, word_start_column); +} + +int +plugin_init (struct plugin_name_args *plugin_info, + struct plugin_gcc_version *version) +{ + const char *plugin_name = plugin_info->base_name; + int argc = plugin_info->argc; + struct plugin_argument *argv = plugin_info->argv; + + bool show_provider = false; + const char *language = "en_US"; + + /* Process the plugin arguments. */ + for (int i = 0; i < argc; ++i) + { + if (0 == strcmp (argv[i].key, "show-provider")) + show_provider = true; + else if (0 == strcmp (argv[i].key, "language")) + language = xstrdup (argv[i].value); + else + warning_at (UNKNOWN_LOCATION, 0, + "plugin %qs: unrecognized argument %qs ignored", + plugin_name, argv[i].key); + } + + checker::singleton = new checker (language, show_provider); + + cpp_callbacks *cb = cpp_get_callbacks (parse_in); + cb->comment = spellcheck_enchant_check_comment; + + // TODO: register a cleanup hook, to delete checker::singleton + + return 0; +} diff --git a/gcc/testsuite/lib/plugin-support.exp b/gcc/testsuite/lib/plugin-support.exp index 0754e84..14cc628 100644 --- a/gcc/testsuite/lib/plugin-support.exp +++ b/gcc/testsuite/lib/plugin-support.exp @@ -42,12 +42,30 @@ proc plugin-get-options { src } { unresolved "$src: $errmsg for \"$op\"" return } + } elseif { ![string compare "dg-pkgconfig" $cmd] } { + # Attempt to run "pkg-config --cflags --libs $arg" + # If successful, add stdout to options. + # If a failure occurs (e.g. pkg-config is not installed, + # or $arg is not found), the test case is unsupported; + # generate an "unsupported" error for the caller to catch. + set arg [lindex $op 2] + verbose "dg-pkgconfig: $arg" 2 + #set cmdline "pkg-config --cflags --libs $arg" + if {[catch {exec pkg-config --cflags --libs $arg} result] == 0} { + verbose "pkg-config succeeded: $result" 2 + append dg-extra-tool-flags $result + } else { + verbose "pkg-config failed: $result" 2 + error "unsupported" + } } else { # Ignore unrecognized dg- commands, but warn about them. warning "plugin.exp does not support $cmd" } } + verbose "dg-extra-tool-flags: ${dg-extra-tool-flags}" 2 + # Return flags to use for compiling the plugin source file return ${dg-extra-tool-flags} } @@ -74,7 +92,19 @@ proc plugin-test-execute { plugin_src plugin_tests } { verbose "Test the plugin $testcase" 1 # Build the plugin itself - set extra_flags [plugin-get-options $plugin_src] + + # Attempt to get extra_flags for building the plugin. + if [catch {set extra_flags [plugin-get-options $plugin_src]} reason] { + set saved_info $::errorInfo + if { $reason == "unsupported" } { + # If the plugin is unsupported in this configuration, + # then we're done. + unsupported "$testcase" + return; + } + # Otherwise re-emit the error + error $reason $saved_info + } # Note that the plugin test support currently only works when the GCC # build tree is available. (We make sure that is the case in plugin.exp.) -- 1.8.5.3