On 26 Feb 2011, at 06.45, Rita wrote:
> I have a large text file (4GB) which I am parsing.
>
> I am reading the file to collect stats on certain items.
>
> My approach has been simple:
>
> for row in open(file):
>     if "INFO" in row:
>         line = row.split()
>         user = line[0]
>         host = line[1]
>         __time = line[2]
>         ...
>
> I was wondering if there is a framework or a better algorithm to read such a
> large file and collect its stats according to content. Also, are there any
> libraries, data structures or functions which can be helpful? I was told
> about the 'collections' container. Here are some stats I am trying to get:
>
> * Number of unique users
> * Break down each user's visits according to time, t0 to t1
> * What user came from what host
> * What time had the most users?
>
> (There are about 15 different things I want to query.)
>
> I understand most of these are redundant, but it would be nice to have a
> framework or even an object-oriented way of doing this instead of loading it
> into a database.
>
> Any thoughts or ideas?
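A single streaming pass with the stdlib `collections` module can answer most of those questions without holding the whole 4GB in memory. A minimal sketch, assuming (from your snippet) a whitespace-separated log whose first three fields are user, host and an HH:MM:SS timestamp — that layout and the sample lines are my guesses:

```python
from collections import Counter, defaultdict

def collect_stats(lines):
    # One pass over the log; only "INFO" lines are counted.
    users = Counter()                 # visits per user -> unique users, busiest user
    hosts_by_user = defaultdict(set)  # which hosts each user came from
    users_by_hour = defaultdict(set)  # which users were seen in each hour
    for row in lines:
        if "INFO" not in row:
            continue
        fields = row.split()
        user, host, time = fields[0], fields[1], fields[2]  # assumed field order
        users[user] += 1
        hosts_by_user[user].add(host)
        users_by_hour[time[:2]].add(user)  # bucket by hour of HH:MM:SS
    return users, hosts_by_user, users_by_hour

# Hypothetical sample; in real use pass open(path) directly.
sample = [
    "alice host1 09:15:02 INFO login",
    "bob host2 09:47:10 INFO login",
    "alice host3 10:05:55 INFO click",
    "carol host1 10:20:00 DEBUG noise",
]
users, hosts_by_user, users_by_hour = collect_stats(sample)
print(len(users))                      # number of unique users -> 2
print(sorted(hosts_by_user["alice"]))  # -> ['host1', 'host3']
busiest = max(users_by_hour, key=lambda h: len(users_by_hour[h]))
print(busiest)                         # hour with the most users -> '09'
```

Because the file object is iterated line by line and the sets/counters only grow with the number of distinct users and hours, memory stays proportional to the cardinality of the answers, not the file size.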
Not an expert, but it might be worthwhile to push the data into a database: you can then tune the DBMS and write smart queries to get all the statistics you want from it. Loading may take a while (splitting with a regexp might be faster), but it's done only once, and from then on you work with DB tools.
--
http://mail.python.org/mailman/listinfo/python-list
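To make the database suggestion concrete: the stdlib `sqlite3` module is enough for one-off analysis, with no server to set up. A minimal sketch, assuming the same user/host/time field order as the original snippet (table and column names are mine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a real 4GB log
conn.execute("CREATE TABLE visits (user TEXT, host TEXT, time TEXT)")

# Hypothetical sample; in real use iterate over the open log file instead.
sample = [
    "alice host1 09:15:02 INFO login",
    "bob host2 09:47:10 INFO login",
    "alice host3 10:05:55 INFO click",
]
rows = (line.split()[:3] for line in sample if "INFO" in line)
conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)

# Number of unique users:
(unique_users,) = conn.execute(
    "SELECT COUNT(DISTINCT user) FROM visits"
).fetchone()
print(unique_users)  # -> 2

# Hour with the most distinct users:
hour, n = conn.execute(
    "SELECT substr(time, 1, 2) AS hour, COUNT(DISTINCT user) AS n "
    "FROM visits GROUP BY hour ORDER BY n DESC LIMIT 1"
).fetchone()
print(hour, n)  # -> 09 2
```

Once the rows are loaded, each of the ~15 questions becomes one SQL query, and an index on the relevant column (e.g. `CREATE INDEX ix_user ON visits(user)`) keeps them fast.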