On 26 Feb 2011, at 06.45, Rita wrote:
> I have a large text file (4GB) which I am parsing.
>
> I am reading the file to collect stats on certain items.
>
> My approach has been simple:
>
> for row in open(file):
>     if "INFO" in row:
>         line = row.split()
>         user = line[0]
>         host = line[1]
>         __time = line[2]
>         ...
>
> I was wondering if there is a framework or a better algorithm to read such a
> large file and collect its stats according to content. Also, are there any
> libraries, data structures or functions which can be helpful? I was told
> about the 'collections' container. Here are some stats I am trying to get:
>
> * Number of unique users
> * Break down each user's visits according to time, t0 to t1
> * What user came from what host
> * What time had the most users?
>
> (There are about 15 different things I want to query.)
>
> I understand most of these are redundant, but it would be nice to have a
> framework or even an object-oriented way of doing this instead of loading it
> into a database.
>
> Any thoughts or ideas?
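A single streaming pass with the stdlib `collections` module can answer most of those questions without holding the whole 4GB in memory. A minimal sketch, assuming (from your snippet) a whitespace-separated log whose first three fields are user, host and an HH:MM:SS timestamp — that layout and the sample lines are my guesses:

```python
from collections import Counter, defaultdict

def collect_stats(lines):
    # One pass over the log; only "INFO" lines are counted.
    users = Counter()                 # visits per user -> unique users, busiest user
    hosts_by_user = defaultdict(set)  # which hosts each user came from
    users_by_hour = defaultdict(set)  # which users were seen in each hour
    for row in lines:
        if "INFO" not in row:
            continue
        fields = row.split()
        user, host, time = fields[0], fields[1], fields[2]  # assumed field order
        users[user] += 1
        hosts_by_user[user].add(host)
        users_by_hour[time[:2]].add(user)  # bucket by hour of HH:MM:SS
    return users, hosts_by_user, users_by_hour

# Hypothetical sample; in real use pass open(path) directly.
sample = [
    "alice host1 09:15:02 INFO login",
    "bob host2 09:47:10 INFO login",
    "alice host3 10:05:55 INFO click",
    "carol host1 10:20:00 DEBUG noise",
]
users, hosts_by_user, users_by_hour = collect_stats(sample)
print(len(users))                      # number of unique users -> 2
print(sorted(hosts_by_user["alice"]))  # -> ['host1', 'host3']
busiest = max(users_by_hour, key=lambda h: len(users_by_hour[h]))
print(busiest)                         # hour with the most users -> '09'
```

Because the file object is iterated line by line and the sets/counters only grow with the number of distinct users and hours, memory stays proportional to the cardinality of the answers, not the file size.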
Not an expert, but it might be worthwhile to push the data into a database: you can then tune the DBMS and write smart queries to get all the statistics you want from it. Loading may take a while (splitting with a regexp might be faster), but it's done only once, and from then on you work with DB tools.
--
http://mail.python.org/mailman/listinfo/python-list
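To make the database suggestion concrete: the stdlib `sqlite3` module is enough for one-off analysis, with no server to set up. A minimal sketch, assuming the same user/host/time field order as the original snippet (table and column names are mine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a real 4GB log
conn.execute("CREATE TABLE visits (user TEXT, host TEXT, time TEXT)")

# Hypothetical sample; in real use iterate over the open log file instead.
sample = [
    "alice host1 09:15:02 INFO login",
    "bob host2 09:47:10 INFO login",
    "alice host3 10:05:55 INFO click",
]
rows = (line.split()[:3] for line in sample if "INFO" in line)
conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)

# Number of unique users:
(unique_users,) = conn.execute(
    "SELECT COUNT(DISTINCT user) FROM visits"
).fetchone()
print(unique_users)  # -> 2

# Hour with the most distinct users:
hour, n = conn.execute(
    "SELECT substr(time, 1, 2) AS hour, COUNT(DISTINCT user) AS n "
    "FROM visits GROUP BY hour ORDER BY n DESC LIMIT 1"
).fetchone()
print(hour, n)  # -> 09 2
```

Once the rows are loaded, each of the ~15 questions becomes one SQL query, and an index on the relevant column (e.g. `CREATE INDEX ix_user ON visits(user)`) keeps them fast.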