But if you just wanna find candidates for spell errors with one char(miss,
add or misspell one char), you

may store several version of the word in the database.

Here is one possible implementation:

For "world", we first add a special char, say '$' to the beginning of the
world, and get "$world".
Then we get $world, d$worl, ld$wor, rld$wo, orld$w, five versions by
rotating the string.
Do the same thing for all words, and store them in sorted array or trie.

For any incorrect spelling, say "wrld", we do conversions as before and get
four versions, say, "$wrld", "d$wrl" "ld$wr" "rld$w".
Search all of them in the database for longest prefix matching.
For example, when searching for "rld$w", we get the longest matching
"rld$wo".
As you can find out, spelling errors within one char ends up with a matching
which only the last char differs.
The complexity is O(L^2 logM) for sorted array or O(L^2) for trie.




2009/9/25 mrkzea <[email protected]>

>
> Hey. I wonder what method would you use to find value in the large
> database assuming that if the word in the database differs from our
> pattern only in one character we treat that as a match.
> Lets say that someone wants to find word "independent" in the database
> but he put "independext" instead. We find "independent" in the
> database and return this word to the user. I wonder how does Google
> solve this problem in its algorithm. If you search for independent,
> google gives you a hint.
>
> Regards
> Mark.
>
> >
>


-- 
     此致
敬礼!

                                                林夏祥

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at http://groups.google.com/group/algogeeks
-~----------~----~----~----~------~----~------~--~---

Reply via email to