<[EMAIL PROTECTED]> wrote:
>Oh sorry indentation was messed here...the
>wordlist = countDict.keys()
>wordlist.sort()
>should be outside the word loop.... now
>def create_words(lines):
>    cnt = 0
>    spl_set = '[",;<>{}_&?!():-[\.=+*\t\n\r]+'
>    for content in lines:
>        words=content.split()
>        countDict={}
>        wordlist = []
>        for w in words:
>            w=string.lower(w)
>            if w[-1] in spl_set: w = w[:-1]
>            if w != '':
>                if countDict.has_key(w):
>                    countDict[w]=countDict[w]+1
>                else:
>                    countDict[w]=1
>        wordlist = countDict.keys()
>        wordlist.sort()
>        cnt += 1
>        if countDict != {}:
>            for word in wordlist: print (word+' '+
>str(countDict[word])+'\n')
>
>ok now this is the correct question I am asking...

(a) You might be better off lowercasing the whole line once, before
splitting it:

    words = content.lower().split()
    for w in words:
        ...

instead of calling lower() on each separate word (and note that most
functions from string are deprecated in favour of string methods).

(b) spl_set isn't doing what you might think it is -- it looks like
you've written it as a regexp but you're using it as a character set.
What you might want is:

    spl_set = '",;<>{}_&?!():-[\.=+*\t\n\r'

and

    while w and w[-1] in spl_set:
        w = w[:-1]

That loop can be written:

    w = w.rstrip(spl_set)

(which by my timings is faster if you have multiple characters from
spl_set at the end of your word, but slower if you have 0 or 1).

-- 
\S -- [EMAIL PROTECTED] -- http://www.chaos.org.uk/~sion/
   ___  |  "Frankly I have no feelings towards penguins one way or the other"
  \X/  |    -- Arthur C. Clarke
   her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump
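
For what it's worth, here is a minimal sketch of how points (a) and (b) might fit together. It assumes Python 3 (so no has_key() and no string.lower()), and the names count_words and strip_trailing_loop are made up for illustration; they are not from the original post. The backslash in the character set is doubled so that it means the same thing without relying on an unrecognised escape.

```python
# Sketch only: assumes Python 3. Function names are hypothetical.
# Same characters as spl_set in the post; '\\.' is a literal backslash plus '.'.
SPL_SET = '",;<>{}_&?!():-[\\.=+*\t\n\r'


def strip_trailing_loop(w, chars=SPL_SET):
    """Explicit-loop version; the `w and` guard avoids IndexError on ''."""
    while w and w[-1] in chars:
        w = w[:-1]
    return w


def count_words(line, chars=SPL_SET):
    """Count the words in one line, lowercased, with trailing punctuation removed."""
    counts = {}
    # Lowercase the whole line once, then split on whitespace.
    for w in line.lower().split():
        w = w.rstrip(chars)  # same result as strip_trailing_loop(w)
        if w:                # skip words that were nothing but punctuation
            counts[w] = counts.get(w, 0) + 1
    return counts


if __name__ == "__main__":
    counts = count_words("Spam, spam; eggs!! Spam...")
    for word in sorted(counts):
        print(word, counts[word])
```

Note that rstrip() only touches the end of each word, so leading punctuation such as an opening bracket is left in place, just as in the original loop.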
-- http://mail.python.org/mailman/listinfo/python-list