On 1/22/2010 4:47 PM, Chris Jones wrote:
> I was writing a script that counts occurrences of characters in source code
> files:
> #!/usr/bin/python
> import codecs
> tcounters = {}
> f = codecs.open('/home/gavron/git/screen/src/screen.c', 'r', "utf-8")
> for uline in f:
>     lline = []
>     for char in uline[:-1]:
>         lline += [char]
Same as, but slower than, lline.append(char). However, this loop just
uselessly copies uline[:-1]; list(uline[:-1]) would do the same.
>     counters = {}
>     for i in set(lline):
>         counters[i] = lline.count(i)
A slow way to do this: each lline.count(i) rescans the whole line, once
per distinct character.
>     for c in counters.keys():
>         if c in tcounters:
>             tcounters[c] += counters[c]
>         else:
>             tcounters.update({c: counters[c]})
I do not see the reason for the intermediate dict; you could add to
tcounters directly.
>     counters = {}
Duplicate line; counters is already set to {} earlier in each iteration.
> for c in tcounters.keys():
>     print c, '\t', tcounters[c]
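Following the comments above, the per-line copy, the intermediate dict and the repeated count() calls can all be dropped by adding to the totals in a single pass. A minimal sketch, written for Python 3; count_chars, count_file and print_counts are made-up names for illustration. One deliberate change: it drops a line's last character only if that character is a newline, whereas uline[:-1] also cuts a real character from a final line that has no newline.

```python
def count_chars(lines):
    """Count each character in lines, ignoring each line's trailing newline."""
    totals = {}
    for line in lines:
        if line.endswith('\n'):
            line = line[:-1]        # drop only a real newline
        for ch in line:
            totals[ch] = totals.get(ch, 0) + 1   # add straight into the totals
    return totals


def count_file(path):
    """Count characters in a UTF-8 text file, as the original script does."""
    # Python 3's built-in open() accepts an encoding, like codecs.open.
    with open(path, encoding='utf-8') as f:
        return count_chars(f)


def print_counts(totals):
    """Print one tab-separated 'char<TAB>count' line per character."""
    for ch, n in sorted(totals.items()):
        print(repr(ch), n, sep='\t')
```

For example, print_counts(count_file('/home/gavron/git/screen/src/screen.c')) would print the same table as the original script. On Python 2.7 or 3.1 and later, collections.Counter does the same bookkeeping: Counter(f.read()) counts every character of an open file f.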
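C code should be pure ASCII, so to count only the printable ASCII chars
(codes 32 to 126):
achars = [0]*95  # one slot for each code 32..126
for c in open('xxx').read():  # iterate over chars, not lines
    i = ord(c) - 32
    if 0 <= i < 95:  # a negative index would wrap around, not raise IndexError
        achars[i] += 1
for i, n in enumerate(achars):
    print chr(i + 32), n
Or sum subsets of achars as desired.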
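One way to sum subsets, as a minimal Python 3 sketch. subset_total and SUBSETS are hypothetical names, and the grouping into letters, digits, punctuation and brackets is only an example. It assumes the achars layout of the loop above, where achars[i] holds the count for chr(i + 32).

```python
import string


def subset_total(achars, chars):
    """Sum achars over the distinct characters in chars.

    Assumes achars[i] is the count for chr(i + 32); characters outside
    the list's range contribute 0.
    """
    total = 0
    for ch in set(chars):           # set(): a repeated char is summed once
        i = ord(ch) - 32
        if 0 <= i < len(achars):
            total += achars[i]
    return total


# Example subsets, picked for this sketch only.
SUBSETS = {
    'letters': string.ascii_letters,
    'digits': string.digits,
    'punctuation': string.punctuation,
    'brackets': '()[]{}',
}
```

Then {name: subset_total(achars, chars) for name, chars in SUBSETS.items()} gives one total per subset.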
Terry Jan Reedy
--
http://mail.python.org/mailman/listinfo/python-list