On Fri, Jan 22, 2010 at 08:46:35PM EST, Terry Reedy wrote:
> On 1/22/2010 4:47 PM, Chris Jones wrote:
>> I was writing a script that counts occurrences of characters in source code
>> files:
>>
>> #!/usr/bin/python
>> import codecs
>> tcounters = {}
>> f = codecs.open('/home/gavron/git/screen/src/screen.c', 'r', "utf-8")
>> for uline in f:
>>     lline = []
>>     for char in uline[:-1]:
>>         lline += [char]
>
> Same but slower than lline.append(char), however, this loop just
> uselessly copies uline[:-1]

I'll change that. Do you mean I should just read the file one character
at a time? That was my original intention, but I couldn't find a way to
do it.

>>     counters = {}
>>     for i in set(lline):
>>         counters[i] = lline.count(i)
>
> slow way to do this
>
>>     for c in counters.keys():
>>         if c in tcounters:
>>             tcounters[c] += counters[c]
>>         else:
>>             tcounters.update({c: counters[c]})
>
> I do not see the reason for intermediate dict

I couldn't find a way to increment the counters in the 'grand total'
dictionary. I always ended up with the counter values for the last input
line :-( It's a moot point if I can do a for loop reading one character
at a time till end of file.

>>     counters = {}
>
> duplicate line

And totally useless, since I never reference it after that. Something I
moved elsewhere and forgot to delete. Sorry about that.

>> for c in tcounters.keys():
>>     print c, '\t', tcounters[c]

Literals, comments, €'s..?

> To only count ascii chars, as should be the case for C code,
>
> achars = [0]*63
> for c in open('xxx', 'c'):
>     try:
>         achars[ord(c)-32] += 1
>     except IndexError:
>         pass
>
> for i,n in enumerate(achars)
>     print chr(i), n
>
> or sum subsets as desired.

Thanks much for the snippet. Let me play with it and see if I can come
up with a Unicode/UTF-8 version, since while I'm at it I might as well
write something a bit more general than C code. Since UTF-8 is
backward-compatible with 7-bit ASCII, this shouldn't be a problem.

> Terry Jan Reedy

Thank you for your comments!

CJ
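
A possible starting point for the Unicode/utf-8 version mentioned above:
a minimal sketch, not a tested replacement for the script. It assumes
the file decodes cleanly as UTF-8 and is written with Python 3 in mind.
The names count_chars and totals are made up for illustration. It walks
each decoded line directly, so there is no per-line list copy, and it
increments the grand-total dictionary in place, so there is no
intermediate dict.

```python
import io


def count_chars(path, encoding="utf-8"):
    """Return a dict mapping each character in the file to its count.

    Hypothetical helper: the name and signature are illustrative only.
    """
    totals = {}
    with io.open(path, "r", encoding=encoding) as f:
        for line in f:
            # A decoded line is already iterable character by character,
            # so it does not need to be copied into a list first.
            for ch in line.rstrip("\n"):
                # dict.get supplies 0 the first time a character is seen,
                # which lets the grand total be incremented directly.
                totals[ch] = totals.get(ch, 0) + 1
    return totals
```

Calling count_chars() on a source file and looping over
sorted(result.items()) should give one line of output per distinct
character, non-ASCII ones such as € included. Newlines are dropped
before counting, as in the original script.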