Am 04.10.2014 um 11:11 schrieb Shiva: > Hi All, > > I have written a function that > -reads a file > -splits the words and stores it in a dictionary as word(key) and the total > count of word in file (value). > > I want to print the words with top 20 occurrences in the file in reverse > order - but can't figure it out.
As python is a high-level language with a comprehensive standard library, there ought to be a built-in method to sort stuff... Wait, but there ist: sorted(iterable[, key][, reverse]) The tricky part is: a dictionary is not storing the order of its entries. But you could just go with the list of your keys, sorted by the count of your words and then go from there. > Here is my function: > > def print_top(filename): > > #Open a file > path = '/home/BCA/Documents/LearnPython/ic/' > fname = path + filename > print ('filename: ',fname) > filetext = open(fname) > > #Read the file > textstorage={} > > #print(type(textstorage)) > readall = filetext.read().lower() > eachword = set(readall.split()) > > #store split words as keys in dictionary > for w in eachword: > textstorage[w] = readall.count(w) > > #print top 20 items in dictionary by decending order of val > # This bit is what I can't figure out. orderedwords = sorted(textstorage.keys(), key=textstorage.get, reverse=True) toptwenty = orderedwords[:20] for dkey in toptwenty: print(dkey,textstorage[dkey]) The interesting part is the "key=textstorage.get"-portion. This defines according to what "key" two values of the iterable (the keys of the dict) should be compared. The method textstorage.get will accept a word and return the count of that word. This count is used to sort the words - in reverse order. Another remark: Your algorithm works, but imagine this: Your text could be more than a few lines long and contain more than a few different words. When you convert your text into the set of alle the words you have to walk through it once. You only type one line into python, but this is what python has to do at that point. Afterwards, for every single word, you use the count-method of the complete text. This also walks through the whole text - each time. So if your text has N different words you walk through it N+2 times (your readall line also walks completely through it!). It is possible to solve your problem while only walking through the text only a few (independent of N) times or even only a single time! And the larger your text gets, the more important this gets. This is not an issue in python but in all of computer science. So try and rethink your algorithm with that goal in mind: Only walk through the text once. Python makes strong use on iterators, also file objects can be used as iterators. The count in your dictionary can be updated while you walk through the text. The get-method has a keyword argument for creating default values, which might be useful here. Another thing worth of mentioning is, that python has exactly this kind of machinery already built-in (collections.Counter), but you should try and implement a simple version of it yourself as exercise. Alexander -- https://mail.python.org/mailman/listinfo/python-list