On Fri, 13 Feb 2009 08:16:00 -0200, S.Selvam Siva <s.selvams...@gmail.com>
wrote:
I need some help.
I tried to find the top n (e.g. 5) most similar words for a given word, from a
dictionary of 50,000 words.
I used the python-levenshtein module, and sample code is as follows.
import Levenshtein

def foo(self, searchword):
    disdict = {}
    for word in self.dictionary_words:
        # ratio() gives a similarity score: higher means more similar
        distance = Levenshtein.ratio(searchword, word)
        disdict[word] = distance
    # sort the disdict dictionary by values in descending order
    similarwords = sorted(disdict, key=disdict.__getitem__, reverse=True)
    return similarwords[:5]
You could replace the last steps (sort + slice the top 5) with heapq.nlargest;
at least you won't waste time sorting 49995 irrelevant words...
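
A minimal sketch of the heapq.nlargest idea. The names top_similar and
similarity and the small word list are made up for illustration, and the
scorer is a standard-library stand-in (difflib) so the snippet runs without
extra packages; Levenshtein.ratio could be passed as score instead.

```python
import heapq
from difflib import SequenceMatcher

def similarity(a, b):
    # Stand-in scorer from the standard library (an assumption, not the
    # original poster's choice); Levenshtein.ratio(a, b) has the same shape.
    return SequenceMatcher(None, a, b).ratio()

def top_similar(searchword, words, n=5, score=similarity):
    # nlargest keeps only the n best candidates while scanning the words,
    # instead of sorting the whole list and slicing it afterwards.
    return heapq.nlargest(n, words, key=lambda w: score(searchword, w))

# Hypothetical word list, just to exercise the function.
words = ["apple", "apply", "ample", "maple", "banana", "grape", "angle"]
print(top_similar("apple", words, n=3))
```

The key function is called once per word, so the number of similarity
computations is the same as before; only the selection step gets cheaper.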
Anyway, you should measure the time taken by the first part (the Levenshtein
computation); it may well be the most demanding. I think there is a C extension
for this, which should be much faster than pure Python calculations.
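
One way to take that measurement, as a rough sketch: time the scoring step and
the selection step separately with time.perf_counter. The function name
timed_top_similar and the generated word list are hypothetical, and the scorer
is again a difflib stand-in, so the absolute numbers will differ from those
obtained with Levenshtein.ratio.

```python
import heapq
import time
from difflib import SequenceMatcher

def similarity(a, b):
    # Stand-in scorer (assumption); swap in Levenshtein.ratio if installed.
    return SequenceMatcher(None, a, b).ratio()

def timed_top_similar(searchword, words, n=5, score=similarity):
    t0 = time.perf_counter()
    # Step 1: compute one similarity score per dictionary word.
    scores = {w: score(searchword, w) for w in words}
    t1 = time.perf_counter()
    # Step 2: pick the n best words by score.
    best = heapq.nlargest(n, scores, key=scores.__getitem__)
    t2 = time.perf_counter()
    return best, t1 - t0, t2 - t1

# Hypothetical data: 5000 synthetic "words".
words = ["w%05d" % i for i in range(5000)]
best, scoring_s, selecting_s = timed_top_similar("w01234", words)
print("scoring: %.4fs  selecting: %.4fs" % (scoring_s, selecting_s))
```

If the scoring time dominates, speeding up the sort will make little
difference, and the effort is better spent on the similarity computation.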
--
Gabriel Genellina
--
http://mail.python.org/mailman/listinfo/python-list