On Tue, 2017-05-02 at 14:11 -0700, Mike Stump wrote:
> On May 2, 2017, at 12:08 PM, David Malcolm <dmalc...@redhat.com>
> wrote:
> > 
> > As a proof of concept, the patch uses this to add a spellchecker
> > for comments.
> 
> :-)
> 
> > I didn't bother with the autotool detection for enchant, and
> > just hacked it in for now.
> 
> Only other comment would be, rather then requiring it, would be nice
> to make it optional.  I can host gcc in an environment that is a bare
> metal target on newlib.  autoconf can smell it, and tell that a ton
> of things are missing.

Given that Enchant seems a stretch as a hard dependency, it seems
better to either:
(a) have this code live as a plugin, or
(b) attempt to dynamically load libenchant on-demand (a "soft"
dependency ?)

This patch implements (a): the plugin approach.

We don't currently install any "official" plugins; ideally, if a
plugin, I'd like it to be one of the things that "make install"
installs: it add dependencies that are too much to impose on everyone's
cc1 et al (hence a plugin), but which a significant number of people
might want to opt in to (hence in-tree and "supported", for some meaning
of that term).

We don't have any mechanism or policies to allow that yet, so in the
meantime, here's a revised version of the spellchecker code which adds
it to the *testsuite*, as an example plugin.  (specifically the C++
testsuite, to support both C and C++ style comments without needing
to silence the compat flag).

Changes:
  * moved it all to a plugin, within the testsuite
  * added test coverage
  * added option -fplugin-arg-spellcheck-show-provider
  * added option -fplugin-arg-spellcheck-language
  * fixed bug with misspelling at very end of a comment
  * fixed bug with apostrophes breaking words
  * fixed quoting of suggestions
  * don't spellcheck within system headers
  * don't spellcheck DejaGnu directives
  * don't spellcheck URLs
  * removed the -Wfixme and -Wtodo idea

libenchant uses pkg-config to expose the flags needed to build
against it, so the patch adds a new "dg-pkgconfig" directive to
plugin-support.exp.  This attempts to use pkg-config to locate
the appropriate flags for building against libenchant.
If it fails, the test is marked as unsupported.

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/testsuite/ChangeLog:
        * g++.dg/plugin/plugin.exp (plugin_test_list): Add the
        following...
        * g++.dg/plugin/spellcheck-1.C: New test case.
        * g++.dg/plugin/spellcheck-2.C: New test case.
        * g++.dg/plugin/spellcheck-3.C: New test case.
        * g++.dg/plugin/spellcheck-4.C: New test case.
        * g++.dg/plugin/spellcheck-5.C: New test case.
        * g++.dg/plugin/spellcheck-6.C: New test case.
        * g++.dg/plugin/spellcheck.c: New test plugin.
        * lib/plugin-support.exp (plugin-get-options): Add handling
        for "dg-pkgconfig" to the special-cased directive handling.
        Verbosely log the resulting value of dg-extra-tool-flags.
        (plugin-test-execute): Handle "unsupported" errors within
        plugin-get-options by marking the testcase as unsupported and
        immediately returning.
---
 gcc/testsuite/g++.dg/plugin/plugin.exp     |   7 +
 gcc/testsuite/g++.dg/plugin/spellcheck-1.C |  14 ++
 gcc/testsuite/g++.dg/plugin/spellcheck-2.C |  13 ++
 gcc/testsuite/g++.dg/plugin/spellcheck-3.C |   2 +
 gcc/testsuite/g++.dg/plugin/spellcheck-4.C |  19 ++
 gcc/testsuite/g++.dg/plugin/spellcheck-5.C |   4 +
 gcc/testsuite/g++.dg/plugin/spellcheck-6.C |   5 +
 gcc/testsuite/g++.dg/plugin/spellcheck.c   | 361 +++++++++++++++++++++++++++++
 gcc/testsuite/lib/plugin-support.exp       |  32 ++-
 9 files changed, 456 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck-1.C
 create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck-2.C
 create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck-3.C
 create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck-4.C
 create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck-5.C
 create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck-6.C
 create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck.c

diff --git a/gcc/testsuite/g++.dg/plugin/plugin.exp 
b/gcc/testsuite/g++.dg/plugin/plugin.exp
index e40cba3..e3f030f 100644
--- a/gcc/testsuite/g++.dg/plugin/plugin.exp
+++ b/gcc/testsuite/g++.dg/plugin/plugin.exp
@@ -69,6 +69,13 @@ set plugin_test_list [list \
          show-template-tree-color.C \
          show-template-tree-color-no-elide-type.C } \
     { comment_plugin.c comments-1.C } \
+    { spellcheck.c \
+         spellcheck-1.C \
+         spellcheck-2.C \
+         spellcheck-3.C \
+         spellcheck-4.C \
+         spellcheck-5.C \
+         spellcheck-6.C } \
 ]
 
 foreach plugin_test $plugin_test_list {
diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck-1.C 
b/gcc/testsuite/g++.dg/plugin/spellcheck-1.C
new file mode 100644
index 0000000..242ab13
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/spellcheck-1.C
@@ -0,0 +1,14 @@
+/* When NONCONST_PRED is false the code will evaulate to constant.  */
+/* { dg-warning "46: possible spelling mistake in comment: 'evaulate'" "" { 
target *-*-* } .-1 } */
+
+// Example of a boogus C++-style comment
+/* { dg-warning "17: possible spelling mistake in comment: 'boogus'" "" { 
target *-*-* } .-1 } */
+
+/* Exercise the code path for the final word in a commment*/
+/* { dg-warning "51: possible spelling mistake in comment: 'commment'" "" { 
target *-*-* } .-1 } */
+
+// Exercise the code path for the final word in a C++ commment
+/* { dg-warning "55: possible spelling mistake in comment: 'commment'" "" { 
target *-*-* } .-1 } */
+
+/* Examples of words that are correctly spelled, containing quotes:
+   don't doesn't is isn't.  */
diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck-2.C 
b/gcc/testsuite/g++.dg/plugin/spellcheck-2.C
new file mode 100644
index 0000000..8f28f62
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/spellcheck-2.C
@@ -0,0 +1,13 @@
+/* { dg-options "-fdiagnostics-show-caret" } */
+
+/* Verify that misspelled words are underlined correctly.  We need
+   a word that the checker won't offer a suggestion for, to avoid
+   having to specify checker output within the test case.
+   Hence the use of splendiferouslitudinalistic here.  */
+#if 0
+{ dg-warning "possible spelling mistake in comment: 
'splendiferouslitudinalistic'" "" { target *-*-* } .-2 }
+{ dg-begin-multiline-output "" }
+    Hence the use of splendiferouslitudinalistic here.
+                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~
+{ dg-end-multiline-output "" }
+#endif
diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck-3.C 
b/gcc/testsuite/g++.dg/plugin/spellcheck-3.C
new file mode 100644
index 0000000..c497988
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/spellcheck-3.C
@@ -0,0 +1,2 @@
+// { dg-options "-fplugin-arg-spellcheck-show-provider" }
+// { dg-message "spellcheck provider: '.*'  description: '.*'  file: '.*'" "" 
{ target *-*-* } 0 }
diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck-4.C 
b/gcc/testsuite/g++.dg/plugin/spellcheck-4.C
new file mode 100644
index 0000000..2038f10
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/spellcheck-4.C
@@ -0,0 +1,19 @@
+/* Verify that we don't complain about our legal boilerplate.  */
+
+/* Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck-5.C 
b/gcc/testsuite/g++.dg/plugin/spellcheck-5.C
new file mode 100644
index 0000000..c7f4714
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/spellcheck-5.C
@@ -0,0 +1,4 @@
+// { dg-options "-fplugin-arg-spellcheck-language=this_is_not_valid" }
+// { dg-error "unknown language tag: 'this_is_not_valid'" "" { target *-*-* } 
0 }
+
+/* A test comment.  */
diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck-6.C 
b/gcc/testsuite/g++.dg/plugin/spellcheck-6.C
new file mode 100644
index 0000000..5a95881
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/spellcheck-6.C
@@ -0,0 +1,5 @@
+// Example of specifying language
+// { dg-options "-fplugin-arg-spellcheck-language=en_GB" }
+// Is it color or colour?
+// { dg-warning "possible spelling mistake in comment: 'color'" "" { target 
*-*-* } .-1 }
+// { dg-message "suggestion: 'colour'" "" { target *-*-* } .-2 }
diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck.c 
b/gcc/testsuite/g++.dg/plugin/spellcheck.c
new file mode 100644
index 0000000..6851637
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/spellcheck.c
@@ -0,0 +1,361 @@
+/* Spellchecking using the Enchant "meta-library".
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* This plugin links against libenchant.
+   Use pkg-config to locate compilation and linkage flags for
+   libenchant, treating the testcase as unsupported if pkg-config or
+   enchant are not available.  */
+
+/* { dg-pkgconfig "enchant" } */
+
+#include "gcc-plugin.h"
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "diagnostic.h"
+#include "options.h"
+#include "c-family/c-pragma.h"
+
+/* The dg-pkgconfig above should have provided the necessary
+   -I flags to locate enchant.h (e.g. "-I/usr/include/enchant").  */
+
+#include <enchant.h>
+
+/* Enchant is LGPL 2.1 "or (at your option) any later version." with
+   a special exception to allow non-LGPL spellchecking providers.  */
+
+int plugin_is_GPL_compatible;
+
+/* Class wrapping the result of enchant_dict_suggest, ensuring that
+   the suggestions are cleaned up.  */
+
+class suggestions
+{
+ public:
+  suggestions (EnchantDict *dict, const char *const word, ssize_t len)
+    : m_dict (dict), m_suggestions (NULL)
+    {
+      m_suggestions = enchant_dict_suggest (m_dict, word, len, &m_num);
+    }
+  ~suggestions ()
+    {
+      enchant_dict_free_string_list (m_dict, m_suggestions);
+    }
+
+  size_t get_count () const { return m_num; }
+  const char *get_suggestion (size_t index) const
+  {
+    return m_suggestions[index];
+  }
+
+ private:
+  EnchantDict *m_dict;
+  size_t m_num;
+  char **m_suggestions;
+};
+
+/* Callback for describing metadata about the Enchant provider in use.  */
+
+static void
+describe_broker_cb (const char * const provider_name,
+                   const char * const provider_desc,
+                   const char * const provider_dll_file,
+                   void *)
+{
+  inform (UNKNOWN_LOCATION,
+         "spellcheck provider: %qs  description: %qs  file: %qs",
+         provider_name, provider_desc, provider_dll_file);
+}
+
+/* A class for wrapping access to the Enchant API (both the broker
+   and the dict).  */
+
+class checker
+{
+ public:
+  checker (const char *language_tag, bool show_provider);
+  ~checker ();
+
+  bool misspelled_p (const char *const word, ssize_t len);
+  suggestions get_suggestions (const char *const word, ssize_t len);
+
+  static checker *singleton;
+
+ private:
+  EnchantBroker *m_broker;
+  EnchantDict *m_dict;
+};
+
+/* Callback for describing the available languages.  */
+
+static void
+describe_dict_cb (const char * const lang_tag,
+                 const char * const provider_name,
+                 const char * const provider_desc,
+                 const char * const provider_file,
+                 void *)
+{
+  inform (UNKNOWN_LOCATION,
+         "available language: %qs"
+         " (provider: %qs  description: %qs  file: %qs)",
+         lang_tag, provider_name, provider_desc, provider_file);
+}
+
+/* checker's ctor.  */
+
+checker::checker (const char *language_tag, bool show_provider)
+: m_broker (NULL), m_dict (NULL)
+{
+  m_broker = enchant_broker_init ();
+  if (!m_broker)
+    {
+      error_at (UNKNOWN_LOCATION, "enchant_broker_init failed");
+      return;
+    }
+  m_dict = enchant_broker_request_dict (m_broker, language_tag);
+  if (!m_dict)
+    {
+      error_at (UNKNOWN_LOCATION, "unknown language tag: %qs", language_tag);
+      enchant_broker_list_dicts (m_broker, describe_dict_cb, NULL);
+    }
+
+  if (show_provider)
+    enchant_broker_describe (m_broker, describe_broker_cb, NULL);
+
+  // TODO: will probably want to support local wordlists also?
+}
+
+/* checker's dtor.  */
+
+checker::~checker ()
+{
+  if (m_dict)
+    enchant_broker_free_dict (m_broker, m_dict);
+  if (m_broker)
+    enchant_broker_free (m_broker);
+}
+
+/* Return true if WORD of length LEN appears to be misspelled.  */
+
+bool
+checker::misspelled_p (const char *const word, ssize_t len)
+{
+  if (!m_dict)
+    return false;
+
+  int result = enchant_dict_check (m_dict, word, len);
+  return result != 0;
+}
+
+/* Get suggestions for misspelled WORD of length LEN.  */
+
+suggestions
+checker::get_suggestions (const char *const word, ssize_t len)
+{
+  gcc_assert (m_dict);
+  return suggestions (m_dict, word, len);
+}
+
+/* Singleton instance of checker.  */
+
+checker *checker::singleton = NULL;
+
+/* Generate a location for a word of length LEN within ORD_MAP at the
+   given LINE AND COLUMN.  */
+
+static location_t
+make_word_location (const line_map_ordinary *ord_map, int line, int column,
+                   size_t len)
+{
+  location_t start_loc
+    = linemap_position_for_line_and_column (line_table, ord_map, line, column);
+  location_t end_loc
+    = linemap_position_for_line_and_column (line_table, ord_map, line,
+                                           column + len - 1);
+  return make_location (start_loc, start_loc, end_loc);
+}
+
+/* Spellcheck the word at WORD of length LEN, with a location
+   within ORD_MAP at the given LINE AND COLUMN.
+   Issue warnings and suggestions for possibly misspelled words.  */
+
+static void
+spellcheck_word (const unsigned char *word, size_t len,
+                const line_map_ordinary *ord_map, int line, int column)
+{
+  /* Don't bother spellchecking words with underscores or any upper case,
+     as these may refer to code entities.
+     TODO: should we also merge in names from code?  */
+  for (size_t i = 0; i < len; i++)
+    {
+      if (word[i] == '_' || ISUPPER (word[i]))
+       return;
+    }
+
+  if (!checker::singleton->misspelled_p ((const char *)word, len))
+    return;
+
+  location_t word_loc = make_word_location (ord_map, line, column, len);
+
+  if (warning_at (word_loc, 0, "possible spelling mistake in comment: %q.*s", 
(int)len, word))
+    {
+      suggestions s
+       = checker::singleton->get_suggestions ((const char *)word, len);
+      for (size_t i = 0; i < s.get_count (); i++)
+       {
+         rich_location richloc (line_table, word_loc);
+         richloc.add_fixit_replace (s.get_suggestion (i));
+         inform_at_rich_loc (&richloc, "suggestion: %qs",
+                             s.get_suggestion (i));
+       }
+    }
+}
+
+/* Callback for cpp_callbacks::comments for spellchecking comments
+   using Enchant.  */
+
+static void
+spellcheck_enchant_check_comment (cpp_reader *, source_location loc,
+                                 const unsigned char *content, size_t len)
+{
+  /* Don't spellcheck within system headers.  */
+  if (in_system_header_at (loc))
+    return;
+
+  /* Split into words, tracking the location, and call spellcheck_word
+     on each word.  */
+
+  /* CONTENT contains the opening slash-star (or slash-slash),
+     and for C-style comments contains the closing star-slash.  */
+  gcc_assert (len >= 2);
+  gcc_assert (content[0] == '/');
+  gcc_assert (content[1] == '*' || content[1] == '/');
+  bool c_style = (content[1] == '*');
+  if (c_style)
+    {
+      gcc_assert (content[len - 2] == '*');
+      gcc_assert (content[len - 1] == '/');
+    }
+  const unsigned char *iter = content + 2;
+  const unsigned char *after_last_char = content + len;
+  if (c_style)
+    after_last_char -= 2;
+  const unsigned char *word_start = NULL;
+
+  const struct line_map *map = linemap_lookup (line_table, loc);
+  const line_map_ordinary *ord_map = linemap_check_ordinary (map);
+
+  int column = LOCATION_COLUMN (loc) + 2;
+  int line = LOCATION_LINE (loc);
+  int word_start_column = column;
+  int word_start_line = line;
+
+  bool within_url = false;
+  while (iter < after_last_char)
+    {
+      unsigned char ch = *iter;
+      if (ISALPHA (ch) || ch == '_' || ch == '\'')
+       {
+         if (!word_start)
+           {
+             /* We have the start of a word.  */
+             word_start = iter;
+             word_start_column = column;
+             word_start_line = line;
+           }
+       }
+      else
+       {
+         if (word_start)
+           {
+             /* We have a word ending.  */
+             size_t word_len = iter - word_start;
+             /* Don't check any comments that appear to contain DejaGnu
+                directives.  */
+             if (word_len >= 2)
+               if (0 == strncmp ((const char *)word_start, "dg", 2))
+                 return;
+
+             /* Detect things that look like URLs.  */
+             /* "www."  */
+             if (word_len == 3 && ch == '.')
+               if (0 == strncmp ((const char *)word_start, "www", 3))
+                 within_url = true;
+
+             /* "http:"  */
+             if (word_len == 4 && ch == ':')
+               if (0 == strncmp ((const char *)word_start, "http", 4))
+                 within_url = true;
+
+             if (!within_url)
+               spellcheck_word (word_start, word_len, ord_map,
+                                word_start_line, word_start_column);
+             word_start = NULL;
+           }
+       }
+      if (ISSPACE (ch))
+       within_url = false;
+      if (ch == '\n')
+       {
+         column = 1;
+         line++;
+       }
+      else
+       column++;
+      iter++;
+    }
+  if (word_start)
+    if (!within_url)
+      spellcheck_word (word_start, iter - word_start, ord_map,
+                      word_start_line, word_start_column);
+}
+
+int
+plugin_init (struct plugin_name_args *plugin_info,
+            struct plugin_gcc_version *version)
+{
+  const char *plugin_name = plugin_info->base_name;
+  int argc = plugin_info->argc;
+  struct plugin_argument *argv = plugin_info->argv;
+
+  bool show_provider = false;
+  const char *language = "en_US";
+
+  /* Process the plugin arguments.  */
+  for (int i = 0; i < argc; ++i)
+    {
+      if (0 == strcmp (argv[i].key, "show-provider"))
+       show_provider = true;
+      else if (0 == strcmp (argv[i].key, "language"))
+       language = xstrdup (argv[i].value);
+      else
+       warning_at (UNKNOWN_LOCATION, 0,
+                   "plugin %qs: unrecognized argument %qs ignored",
+                   plugin_name, argv[i].key);
+    }
+
+  checker::singleton = new checker (language, show_provider);
+
+  cpp_callbacks *cb = cpp_get_callbacks (parse_in);
+  cb->comment = spellcheck_enchant_check_comment;
+
+  // TODO: register a cleanup hook, to delete checker::singleton
+
+  return 0;
+}
diff --git a/gcc/testsuite/lib/plugin-support.exp 
b/gcc/testsuite/lib/plugin-support.exp
index 0754e84..14cc628 100644
--- a/gcc/testsuite/lib/plugin-support.exp
+++ b/gcc/testsuite/lib/plugin-support.exp
@@ -42,12 +42,30 @@ proc plugin-get-options { src } {
                unresolved "$src: $errmsg for \"$op\""
                return
            }
+       } elseif { ![string compare "dg-pkgconfig" $cmd] } {
+           # Attempt to run "pkg-config --cflags --libs $arg"
+           # If successful, add stdout to options.
+           # If a failure occurs (e.g. pkg-config is not installed,
+           # or $arg is not found), the test case is unsupported;
+           # generate an "unsupported" error for the caller to catch.
+           set arg [lindex $op 2]
+           verbose "dg-pkgconfig: $arg" 2
+           #set cmdline "pkg-config --cflags --libs $arg"
+           if {[catch {exec pkg-config --cflags --libs $arg} result] == 0} { 
+               verbose "pkg-config succeeded: $result" 2
+               append dg-extra-tool-flags $result
+           } else { 
+               verbose "pkg-config failed: $result" 2
+               error "unsupported"
+           } 
        } else {
            # Ignore unrecognized dg- commands, but warn about them.
            warning "plugin.exp does not support $cmd"
        }
    }
 
+    verbose "dg-extra-tool-flags: ${dg-extra-tool-flags}" 2
+
     # Return flags to use for compiling the plugin source file
     return ${dg-extra-tool-flags}
 }
@@ -74,7 +92,19 @@ proc plugin-test-execute { plugin_src plugin_tests } {
     verbose "Test the plugin $testcase" 1
 
     # Build the plugin itself
-    set extra_flags [plugin-get-options $plugin_src]
+
+    # Attempt to get extra_flags for building the plugin.
+    if [catch {set extra_flags [plugin-get-options $plugin_src]} reason] {
+       set saved_info $::errorInfo
+       if { $reason == "unsupported" } {
+           # If the plugin is unsupported in this configuration,
+           # then we're done.
+           unsupported "$testcase"
+           return;
+       }
+       # Otherwise re-emit the error
+       error $reason $saved_info
+    }
 
     # Note that the plugin test support currently only works when the GCC
     # build tree is available. (We make sure that is the case in plugin.exp.)
-- 
1.8.5.3

Reply via email to