>>>>> "mark" == mark McWilliams <[EMAIL PROTECTED]> writes:

    mark> I need to write a spell checker.  . . . A fresh Idea from
    mark> outside would be very helpful

I can't quite tell what your approach is right now, but it looks too
complicated to me.  In particular, I suggest you consider using a hash
to represent the dictionary and checking whether each input word is a
key in that hash.  If it isn't, treat it as a spelling error.  An
overly simplified version of this program would look like this:

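        #!/usr/local/bin/perl
        use strict;
        use warnings;

        # Load the dictionary into a hash, one word per line.
        my %dict = ();
        open my $dict_fh, '<', "/path/to/dictionary"
                or die "Cannot open dictionary: $!";
        while (<$dict_fh>) {
                chomp;
                $dict{$_} = 1;
        }
        close $dict_fh;

        # Collect every input word that is not a dictionary key.
        my %errors = ();
        while (<>) {
                chomp;
                for my $w (split) {     # split $_ on whitespace
                        $errors{$w} = 1 unless exists $dict{$w};
                }
        }

        # Report each unknown word once, in sorted order.
        for my $w (sort keys %errors) {
                print $w, "\n";
        }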

(Note: this code is completely untested.  It hasn't been anywhere near
the perl interpreter.)

This program needs refinement in many ways:

        1. It doesn't show you where the word was misspelled.  This
           behaviour is similar to the classic Unix spell
           implementation.

        2. Only exact matches in the dictionary are considered to be
           spelled correctly.  A fancy spell checker might know about
           English suffix and prefix rules, etc.

        3. It doesn't support custom dictionaries and the like.

        4. The use of split to break up words on a line probably isn't
           flexible enough: for example, punctuation stays attached
           to the words and case is not normalized.

        5. It assumes the document to spell check is on standard
           input or in files named on the command line (which is
           what <> reads).
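
To give an idea of what refinements 1 and 4 might look like, here is a
rough sketch.  It pulls words out of each line with a regular
expression instead of split, lowercases them before the lookup, and
remembers the line numbers (via $.) where each unknown word appears.
The helper name find_unknown_words and the tiny in-memory dictionary
and document are made up for illustration; a real checker would need a
better idea of what counts as a word.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref mapping each unknown word to
# the list of line numbers where it appears in the filehandle $fh.
sub find_unknown_words {
    my ($fh, $dict) = @_;
    my %where;
    while (my $line = <$fh>) {
        # Match runs of letters and apostrophes rather than splitting on
        # whitespace, so punctuation is not glued to the word.
        for my $w ($line =~ /[A-Za-z']+/g) {
            next if exists $dict->{lc $w};   # case-insensitive lookup
            push @{ $where{$w} }, $.;        # $. is the current line number
        }
    }
    return \%where;
}

# Illustrative data only: a tiny dictionary and a two-line document.
my %dict = map { $_ => 1 } qw(the cat sat on mat);
my $text = "The cat sat.\nOn teh mat.\n";
open my $in, '<', \$text or die "Cannot open in-memory text: $!";
my $unknown = find_unknown_words($in, \%dict);
close $in;

for my $w (sort keys %$unknown) {
    print "$w: line(s) ", join(", ", @{ $unknown->{$w} }), "\n";
}
```

With the sample data above this should report "teh" on line 2 and
nothing else.  The dictionary keys are assumed to be lowercase already.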

However, even as it stands (modulo any syntax errors) it should be
functional as a simple spell checker.
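
As for refinement 2, a very crude way to approximate suffix rules is to
retry the lookup after stripping a few common English endings.  The
sketch below assumes a lowercase dictionary hash; the helper name
known_word and the suffix list are my own invention, not a real set of
spelling rules, and this approach will happily accept some misspellings
(for example "boxs").

```perl
use strict;
use warnings;

# Hypothetical helper: true if $word, or $word with one common suffix
# removed, is a key in the dictionary hash referenced by $dict.
sub known_word {
    my ($dict, $word) = @_;
    my $w = lc $word;
    return 1 if exists $dict->{$w};
    for my $suffix (qw(ing ed es s)) {
        # Strip the suffix and look up whatever stem is left.
        if ($w =~ /^(.+)\Q$suffix\E$/ and exists $dict->{$1}) {
            return 1;
        }
    }
    return 0;
}

# Illustrative data only.
my %dict = map { $_ => 1 } qw(walk box);
for my $w (qw(Walking boxes walkz)) {
    printf "%s: %s\n", $w, known_word(\%dict, $w) ? "ok" : "unknown";
}
```

Words that change their stem when a suffix is added (such as "making"
from "make") would need real rules rather than plain stripping.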

Dale Hagglund.

