Thank you, Gabriel.

On Sun, Jan 25, 2009 at 7:12 AM, Gabriel Genellina <gagsl-...@yahoo.com.ar> wrote:
> On Sat, 24 Jan 2009 15:08:08 -0200, S.Selvam Siva <s.selvams...@gmail.com>
> wrote:
>
>> I am developing a spell checker for my local language (Tamil) using Python.
>> I need to generate a list of alternative words for a misspelled word from
>> the dictionary of words. The alternatives must be as close as possible to
>> the misspelled word. As we know, ordinary string comparison won't work here.
>> Any suggestion for this problem is welcome.
>
> I think it would be better to add Tamil support to some existing library like
> GNU aspell: http://aspell.net/

That was my plan earlier, but I am not sure how aspell integrates with other
editors. I had better ask about it on the aspell mailing list.

> You are looking for "fuzzy matching":
> http://en.wikipedia.org/wiki/Fuzzy_string_searching
> In particular, the Levenshtein distance is widely used; I think there is a
> Python extension providing those calculations.
>
> --
> Gabriel Genellina

The following code served my purpose (thanks to some unknown contributors):

    import sys

    def distance(a, b):
        # c[i, j] holds the edit distance between a[:i] and b[:j]
        c = {}
        n = len(a); m = len(b)

        # Distance to the empty string is the length of the prefix
        for i in range(0, n+1):
            c[i, 0] = i
        for j in range(0, m+1):
            c[0, j] = j

        for i in range(1, n+1):
            for j in range(1, m+1):
                x = c[i-1, j] + 1          # deletion
                y = c[i, j-1] + 1          # insertion
                if a[i-1] == b[j-1]:
                    z = c[i-1, j-1]        # characters match, no cost
                else:
                    z = c[i-1, j-1] + 1    # substitution
                c[i, j] = min(x, y, z)
        return c[n, m]

    a = sys.argv[1]
    b = sys.argv[2]
    d = distance(a, b)
    print("d=%d" % d)
    longer = float(max((len(a), len(b))))
    shorter = float(min((len(a), len(b))))
    r = ((longer - d) / longer) * (shorter / longer)
    # r ranges between 0 and 1

--
Yours,
S.Selvam
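
A minimal sketch of how a distance function like the one above could be used to build the list of alternatives asked about at the start of the thread. The names levenshtein, similarity, suggest and the sample word list are hypothetical and not part of the posted code. The distance is the same recurrence, kept in two rows instead of a dictionary, and the score is the r formula from the post with an added guard for two empty strings (an assumption, since the post does not cover that case).

```python
def levenshtein(a, b):
    """Edit distance between sequences a and b, keeping only two rows."""
    prev = list(range(len(b) + 1))      # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]                      # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # match or substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """The r score from the post; returns 1.0 when both strings are empty."""
    longer = float(max(len(a), len(b)))
    if longer == 0:
        return 1.0                      # assumption: two empty strings are identical
    shorter = float(min(len(a), len(b)))
    return ((longer - levenshtein(a, b)) / longer) * (shorter / longer)


def suggest(word, words, limit=5):
    """Return up to `limit` entries of `words`, most similar first.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(words, key=lambda w: (-similarity(word, w), w))
    return ranked[:limit]


# Hypothetical word list; a real checker would load its Tamil dictionary here.
words = ["spell", "shell", "spill", "smell", "python"]
print(suggest("spel", words, limit=3))
```

This compares the misspelled word against every dictionary entry, which is fine for a small word list but may be slow for a full dictionary. Note also that Python 3 strings are sequences of Unicode code points, so a Tamil consonant followed by a vowel sign is counted as two symbols; whether that is the right unit for a Tamil spell checker is left open here.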