On Sat, May 30, 2020 at 5:06 PM David Malcolm <dmalc...@redhat.com> wrote:
> On Sat, 2020-05-30 at 13:40 +0000, Pip Cet via Gcc-patches wrote:
> > I think we should just omit the triangle inequality test from the
> > self-test, as in the attached patch.
>
> I like the idea,

Thanks!

> but can you update the comment so that it explains why
> it's not a metric? (better to capture that in a comment in the source
> rather than just in the mailing list archives).

How's this?
From 2a759aba3639dd782c28c727746c0e6913390fa6 Mon Sep 17 00:00:00 2001
From: Pip Cet <pip...@gmail.com>
Date: Sat, 30 May 2020 13:39:09 +0000
Subject: [PATCH] Don't test the triangle inequality in the spellcheck.c
 self-test.

* spellcheck.c (test_data): Add problematic strings.
(test_metric_conditions): Don't test the triangle inequality
condition, which our distance function does not satisfy.
---
 gcc/ChangeLog    |  6 ++++++
 gcc/spellcheck.c | 22 ++++++++--------------
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 4fc37369d39..e565d8ecadf 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2020-05-30  Pip Cet  <pip...@gmail.com>
+
+	* spellcheck.c (test_data): Add problematic strings.
+	(test_metric_conditions): Don't test the triangle inequality
+	condition, which our distance function does not satisfy.
+
 2020-05-28  Nicolas Bértolo  <nicolasbert...@gmail.com>
 
 	* Makefile.in: don't look for libiberty in the "pic" subdirectory
diff --git a/gcc/spellcheck.c b/gcc/spellcheck.c
index 7891260a258..ab17657e8dc 100644
--- a/gcc/spellcheck.c
+++ b/gcc/spellcheck.c
@@ -438,13 +438,17 @@ static const char * const test_data[] = {
   "food",
   "boo",
   "1234567890123456789012345678901234567890123456789012345678901234567890"
+  "abc",
+  "ac",
+  "ca",
 };
 
 /* Verify that get_edit_distance appears to be a sane distance function,
-   i.e. the conditions for being a metric.  This is done directly for a
-   small set of examples, using test_data above.  This is O(N^3) in the size
-   of the array, due to the test for the triangle inequality, so we keep the
-   array small.  */
+   even though it doesn't satisfy the conditions for being a metric.  (This
+   is because the triangle inequality fails to hold: the distance between
+   "ca" and "ac" is 1, and so is the distance between "abc" and "ac", but
+   the distance between "abc" and "ca" is 3. Algorithms that calculate the
+   true Levenshtein-Damerau metric are much more expensive.)  */
 
 static void
 test_metric_conditions ()
@@ -468,16 +472,6 @@ test_metric_conditions ()
 	  edit_distance_t dist_ji
 	    = get_edit_distance (test_data[j], test_data[i]);
 	  ASSERT_EQ (dist_ij, dist_ji);
-
-	  /* Triangle inequality.  */
-	  for (int k = 0; k < num_test_cases; k++)
-	    {
-	      edit_distance_t dist_ik
-		= get_edit_distance (test_data[i], test_data[k]);
-	      edit_distance_t dist_jk
-		= get_edit_distance (test_data[j], test_data[k]);
-	      ASSERT_TRUE (dist_ik <= dist_ij + dist_jk);
-	    }
 	}
     }
 }
-- 
2.27.0.rc0

Reply via email to