The defaultdict option looks faster than the standard dict (20 secs approx.).
Now I have:
#
import fileinput
import sys
from collections import defaultdict
match_counter = defaultdict(int)
for line in fileinput.input(sys.argv[1:]):
    match_counter[line.split()[0]] += 1
2008/12/16
> Python 3.0 does not support has_key, it's time to get used to not using it
> :)
>
Good to know
line.split(None, 1)[0] really speeds up the process.
Thanks again.
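
For anyone reading the archive later, here is a minimal sketch of why the maxsplit argument helps: line.split() cuts the whole log line into fields and then throws all but the first away, while line.split(None, 1) stops after the first run of whitespace. The function name count_first_fields and the sample log lines are made up for illustration; they are not from the original script.

```python
from collections import defaultdict


def count_first_fields(lines):
    """Count how often each first whitespace-delimited field appears."""
    counts = defaultdict(int)
    for line in lines:
        # maxsplit=1: stop after the first separator instead of
        # splitting the rest of the line into fields nobody uses.
        fields = line.split(None, 1)
        if fields:  # a blank line yields an empty list, so skip it
            counts[fields[0]] += 1
    return counts


# Hypothetical sample data in Apache common log style.
sample = [
    '10.0.0.1 - - [16/Dec/2008:10:00:00 +0000] "GET / HTTP/1.1" 200 512\n',
    '10.0.0.2 - - [16/Dec/2008:10:00:01 +0000] "GET /a HTTP/1.1" 200 128\n',
    '10.0.0.1 - - [16/Dec/2008:10:00:02 +0000] "GET /b HTTP/1.1" 404 0\n',
    '\n',
]

counts = count_first_fields(sample)
```

Any iterable of lines works here, so an open file object or fileinput.input(...) can be passed in directly.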
--
http://mail.python.org/mailman/listinfo/python-list
Hi all,
I'm parsing a 4.1GB Apache log to get stats about how many times each IP
requests something from the server.
The first design of the algorithm was:
for line in fileinput.input(sys.argv[1:]):
    ip = line.split()[0]
    if match_counter.has_key(ip):
        match_counter[ip] += 1
    else:
        match_counter[ip] = 1
Great, 2min 34 secs with the open method =)
but why?
ip, sep, rest = line.partition(' ')
match_counter[ip] += 1
instead of
match_counter[line.strip()[0]] += 1
Does strip really take more time than partition?
I'm having the same results with both of them right now.
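
A rough way to check this on a given machine is to time the two calls with timeit. The sketch below is only an illustration: the sample line, the helper names and the repeat count are assumptions, and the numbers will vary with the Python version and the shape of the log lines. Note the two calls are not fully equivalent: partition(' ') returns an empty string for a line that starts with a space, and split(None, 1)[0] raises IndexError on a blank line.

```python
import timeit

# Hypothetical log line in Apache common log style.
line = '10.0.0.1 - - [16/Dec/2008:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512\n'


def first_by_split(s):
    # Split on any whitespace, but stop after the first separator.
    return s.split(None, 1)[0]


def first_by_partition(s):
    # Cut once at the first single space; no list of fields is built.
    return s.partition(' ')[0]


def compare(number=200000):
    """Return the seconds taken by each approach for `number` calls."""
    return {
        fn.__name__: timeit.timeit(lambda: fn(line), number=number)
        for fn in (first_by_split, first_by_partition)
    }


if __name__ == "__main__":
    for name, secs in compare().items():
        print("%s: %.3f s" % (name, secs))
```

If the two timings come out close, as reported above, the per-line field extraction is probably not the bottleneck any more and the time is going into reading the file.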
Yep, I meant split, sorry.
Thanks for the answer!
Wow, thanks again =)
You can also try web2py (http://mdp.cti.depaul.edu/), but I think you may be
interested in http://www.modpython.org/