On May 9, 11:19 pm, dave <[EMAIL PROTECTED]> wrote:
> On 2008-05-09 18:53:19 -0600, George Sakkis <[EMAIL PROTECTED]> said:
>
>
>
> > On May 9, 5:19 pm, [EMAIL PROTECTED] wrote:
> >>>> What would be the best method to print the top results, the ones that
>
> >>>> had the highest amount of anagrams??  Create a new histogram dict?
>
> >>> You can use the max() function to find the biggest list of anagrams:
>
> >>> top_results = max(anagrams.itervalues(), key=len)
>
> >>> --
> >>> Arnaud
>
> >> That is the biggest list of anagrams, what if I wanted the 3 biggest
> >> lists?  Is there a way to specify the top three w/ a max command??
>
> >>>> import heapq
> >>>> help(heapq.nlargest)
> > Help on function nlargest in module heapq:
>
> > nlargest(n, iterable, key=None)
> >     Find the n largest elements in a dataset.
>
> >     Equivalent to:  sorted(iterable, key=key, reverse=True)[:n]
>
> > HTH,
> > George
>
> I tried both the 'nlargest' and the 'sorted' methods.. I could only get the
> sorted to work.. however it would only return values like (3,  edam)
> not the list of words..
>
> Here is what I have now.. Am I over-analyzing this?  It doesn't seem
> like it should be this hard to get the program to print the largest set
> of anagrams first...
>
> def anafind():
>         fin = open('text.txt')
>         mapdic = {}
>         finalres = []                   # top answers go here
>         for line in fin:
>                 line = line.strip()
>                 alphaword = ''.join(sorted(line))       #sorted word as key
>                 if alphaword not in mapdic:
>                         mapdic[alphaword] = [line]
>                 else:
>                         mapdic[alphaword].append(line)
>         # top 3 answers
>         topans = sorted((len(mapdic[key]), key) for key in mapdic.keys())[-3:]
>         for key, value in topans:
>                 finalres.append(mapdic[value])
>         print finalres

Here is a working, cleaned-up version:

from heapq import nlargest
from collections import defaultdict

def anagrams(words, top=None):
    key2words = defaultdict(set)
    for word in words:
        word = word.strip()
        key = ''.join(sorted(word))
        key2words[key].add(word)
    if top is None:
        return sorted(key2words.itervalues(), key=len, reverse=True)
    else:
        return nlargest(top, key2words.itervalues(), key=len)

if __name__ == '__main__':
    wordlist = ['live', 'evil', 'one', 'nose', 'vile', 'neo']
    for words in anagrams(wordlist, 2):
        print words
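
The key=len argument is what makes this return the word sets themselves
rather than (count, key) pairs like (3, 'edam'): the length is only used
for ordering, and the original values are handed back. A small sketch of
the difference, using a made-up 'groups' dict (not your data), and
.values() so that it runs on both Python 2 and 3:

```python
from heapq import nlargest

# Hypothetical anagram dict: sorted letters -> words sharing those letters.
groups = {
    'eilv': ['live', 'evil', 'vile'],
    'eno': ['one', 'neo'],
    'enos': ['nose'],
}

# Sorting (length, key) tuples gives back the tuples, not the word lists.
pairs = sorted((len(words), key) for key, words in groups.items())[-2:]

# With key=len the lengths only decide the order; the lists come back as-is.
by_sorted = sorted(groups.values(), key=len, reverse=True)[:2]
by_heap = nlargest(2, groups.values(), key=len)
```

Here 'pairs' holds tuples that still have to be looked up in the dict,
while 'by_sorted' and 'by_heap' already hold the word lists, largest first.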


By the way, try to generalize your functions (and the rest of your code,
for that matter) so that they can be reused easily. For example, hardcoding
the input file name in the function's body is usually undesirable.
The same goes for other constants such as 'get top 3 answers'; it doesn't
cost you anything to replace 3 with 'top' and pass it as an argument
(or give it a default, top=3, if that's the common case).
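
As a rough sketch of that advice: the function can take any iterable of
words plus a 'top' argument, and leave opening the file to the caller.
The name 'top_anagrams' is made up for illustration, and StringIO only
stands in for a real file here:

```python
from collections import defaultdict
from heapq import nlargest
from io import StringIO


def top_anagrams(words, top=3):
    """Return the `top` largest anagram groups found in `words`.

    `words` can be any iterable of strings: a list, an open file, etc.
    """
    groups = defaultdict(set)
    for word in words:
        word = word.strip()
        if word:  # skip blank lines
            groups[''.join(sorted(word))].add(word)
    return nlargest(top, groups.values(), key=len)


# The same function works on an in-memory list ...
from_list = top_anagrams(['live', 'evil', 'one', 'nose', 'vile', 'neo'], top=2)

# ... and on any file-like object (StringIO stands in for an opened file).
from_file = top_anagrams(StringIO(u'live\nevil\n\none\nvile\nneo\n'), top=1)
```

Since nothing about the input source or the cutoff is baked in, the same
function can be called from a test, a script, or an interactive session.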

HTH,
George
--
http://mail.python.org/mailman/listinfo/python-list
