anish singh wrote:

>>> However, I am stuck. I have below code which is not working.
>
> I don't know how to achieve this programmatically: sorted by the
> number of occurrences in a descending order. If two or more words
> have the same count, they should be sorted
> alphabetically (in an ascending order).

>>> import collections
>>> document = "Practice makes perfect, you'll get perfecT by practice. just practice! just just just!!"
>>> words = ("".join(c for c in word if "a" <= c <= "z") for word in document.lower().split())
>>> freq = collections.Counter(words)
>>> freq
Counter({'just': 4, 'practice': 3, 'perfect': 2, 'by': 1, 'get': 1, 'makes': 1, 'youll': 1})

Given that Counter or a similar dict, you can first sort by word and then
by word frequency:

>>> pairs = sorted(freq.items())  # sort alphabetically
>>> pairs.sort(key=lambda pair: pair[1], reverse=True)  # sort by frequency
>>> pairs
[('just', 4), ('practice', 3), ('perfect', 2), ('by', 1), ('get', 1), ('makes', 1), ('youll', 1)]

This works because Python's sorting algorithm is "stable", i.e. values with
the same key stay in the same relative order as before the sorting. That
also holds with reverse=True, so the alphabetical order from the first pass
survives among words with equal counts.

While you can also achieve that with a single sorted() call

>>> sorted(freq.items(), key=lambda p: (-p[1], p[0]))
[('just', 4), ('practice', 3), ('perfect', 2), ('by', 1), ('get', 1), ('makes', 1), ('youll', 1)]

the first method is usually clearer.

PS: Both approaches also work with comparison functions, e.g.

>>> def cmp_freqs((w1, f1), (w2, f2)):
...     return -cmp(f1, f2) or cmp(w1, w2)
...
>>> sorted(freq.iteritems(), cmp_freqs)
[('just', 4), ('practice', 3), ('perfect', 2), ('by', 1), ('get', 1), ('makes', 1), ('youll', 1)]

but this is (1) usually less efficient and (2) limited to Python 2, so I
can't recommend the cmp-based solution.
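
PPS: For anyone who wants to run the approaches side by side on Python 3, here is a rough sketch as a plain script. The function names (word_counts, by_frequency_two_pass, by_frequency_single_key, compare_pairs, by_frequency_cmp) are made up for illustration. Dropping empty tokens in word_counts is my own addition, and the last variant assumes you really need a comparison function, in which case functools.cmp_to_key can adapt it, since cmp() and the cmp argument of sorted() no longer exist in Python 3.

```python
import collections
from functools import cmp_to_key


def word_counts(document):
    """Count lowercase words, dropping every character outside a-z."""
    words = (
        "".join(c for c in word if "a" <= c <= "z")
        for word in document.lower().split()
    )
    # Assumption: tokens that end up empty (e.g. a lone "!!") are skipped.
    return collections.Counter(w for w in words if w)


def by_frequency_two_pass(freq):
    """Sort alphabetically first, then rely on the stable sort for counts."""
    pairs = sorted(freq.items())                         # ascending by word
    pairs.sort(key=lambda pair: pair[1], reverse=True)   # descending by count
    return pairs


def by_frequency_single_key(freq):
    """Same ordering in one pass: negated count first, then the word."""
    return sorted(freq.items(), key=lambda p: (-p[1], p[0]))


def compare_pairs(a, b):
    """Comparison function: higher count first, then alphabetical."""
    (w1, f1), (w2, f2) = a, b
    # (x > y) - (x < y) stands in for Python 2's cmp(x, y).
    return (f2 > f1) - (f2 < f1) or (w1 > w2) - (w1 < w2)


def by_frequency_cmp(freq):
    """Adapt the comparison function to a key function for Python 3."""
    return sorted(freq.items(), key=cmp_to_key(compare_pairs))


if __name__ == "__main__":
    text = ("Practice makes perfect, you'll get perfecT by practice. "
            "just practice! just just just!!")
    for word, count in by_frequency_two_pass(word_counts(text)):
        print(word, count)
```

All three sorting functions should return the same list for the same Counter, so which one to use is mostly a matter of readability.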