On Sat, 2016-03-05 at 23:46 +0100, Bernhard Reutner-Fischer wrote:
[...]
> diff --git a/gcc/fortran/misc.c b/gcc/fortran/misc.c
> index 405bae0..72ed311 100644
> --- a/gcc/fortran/misc.c
> +++ b/gcc/fortran/misc.c
[...]
> @@ -274,3 +275,41 @@ get_c_kind(const char *c_kind_name,teropKind_tki
> nds_table[])
>
> return ISOCBINDING_INVALID;
> }
> +
> +
> +/* For a given name TYPO, determine the best candidate from
> CANDIDATES
> + perusing Levenshtein distance. Frees CANDIDATES before
> returning. */
> +
> +const char *
> +gfc_closest_fuzzy_match (const char *typo, char **candidates)
> +{
> + /* Determine closest match. */
> + const char *best = NULL;
> + char **cand = candidates;
> + edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> +
> + while (cand && *cand)
> + {
> + edit_distance_t dist = levenshtein_distance (typo, *cand);
> + if (dist < best_distance)
> + {
> + best_distance = dist;
> + best = *cand;
> + }
> + cand++;
> + }
> + /* If more than half of the letters were misspelled, the
> suggestion is
> + likely to be meaningless. */
> + if (best)
> + {
> + unsigned int cutoff = MAX (strlen (typo), strlen (best)) / 2;
> +
> + if (best_distance > cutoff)
> + {
> + XDELETEVEC (candidates);
> + return NULL;
> + }
> + XDELETEVEC (candidates);
> + }
> + return best;
> +}
FWIW, there are two overloaded variants of levenshtein_distance in
gcc/spellcheck.h, the first of which takes a pair of strlen values;
your patch uses the second one:
extern edit_distance_t
levenshtein_distance (const char *s, int len_s,
const char *t, int len_t);
extern edit_distance_t
levenshtein_distance (const char *s, const char *t);
So one minor tweak you may want to consider here is to calculate
strlen (typo)
once at the top of gfc_closest_fuzzy_match, and then pass it in to the
4-arg variant of levenshtein_distance, which would avoid recalculating
strlen (typo) for every candidate.
I can't comment on the rest of the patch (I'm not a Fortran expert),
though it seems sane to
Hope this is constructive
Dave