Thanks for the suggestions, Peter. I will give them a try.
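
For anyone reading this in the archive, here is a minimal Python 3 sketch of how the two snippets quoted below might be combined. The three regular expressions are taken from the quoted reply. Everything else is an assumption made for illustration: the names parse_line, count_lines and SAMPLE are hypothetical, collections.Counter stands in for the plain dict, and lines that do not match all three patterns are skipped instead of raising AttributeError. SAMPLE is the log line from this thread with the mail wrapping undone, which is only a guess at the original layout.

```python
import re
from collections import Counter

# The three patterns from the quoted reply, compiled once up front.
COMMAND_RE = re.compile(r"<G_(\w+)")
SERVICE_RE = re.compile(r"[(](\w+)")
STATUS_RE = re.compile(r"Status:\s+(\d+)")

# Log line from this thread, re-joined by hand (assumed layout).
SAMPLE = (
    "2011-05-16 09:46:22,361 [Thread-4847133] PDU D "
    "<G_CC_SMS_SERVICE_51408_656.O_CC_SMS_SERVICE_51408_656-ServerThread-"
    "VASPSessionThread-7ee35fb0-7e87-11e0-a2da-00238bce423b-TRX "
    "- 2011-05-16 09:46:22 - OUT - (submit_resp: (pdu: L: 53 ID: 80000004 "
    "Status: 0 SN: 25866) 98053090-7f90-11e0-a2da-00238bce423b (opt: ) )"
)


def parse_line(line):
    """Return (command, service, status), or None if any pattern is missing."""
    matches = [rx.search(line) for rx in (COMMAND_RE, SERVICE_RE, STATUS_RE)]
    if not all(matches):
        return None
    return tuple(m.group(1) for m in matches)


def count_lines(lines):
    """Count how often each (command, service, status) key occurs."""
    freq = Counter()
    for line in lines:
        key = parse_line(line)
        if key is not None:  # skip lines that break the assumed structure
            freq[key] += 1
    return freq


if __name__ == "__main__":
    freq = count_lines([SAMPLE, SAMPLE, "a line with no recognizable fields"])
    for key, n in sorted(freq.items()):
        print(key, n)
```

count_lines accepts any iterable of strings, so an open file object can be passed in directly, e.g. `with open(filename) as infile: freq = count_lines(infile)`.
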

Peter Otten wrote:
> J wrote:
>
> > Hello Peter, Angelico,
> >
> > OK, let's see. My aim is to filter out several fields from a log file and
> > write them to a new log file. The current log file, as I mentioned
> > previously, has thousands of lines like this:
> >
> > 2011-05-16 09:46:22,361
> > [Thread-4847133] PDU D <G_CC_SMS_SERVICE_51408_656.O_
> > CC_SMS_SERVICE_51408_656-ServerThread-
> > VASPSessionThread-7ee35fb0-7e87-11e0-a2da-00238bce423b-TRX
> > - 2011-05-16 09:46:22 - OUT - (submit_resp: (pdu: L: 53 ID: 80000004
> > Status: 0 SN: 25866) 98053090-7f90-11e0-a2da-00238bce423b (opt: ) )
> >
> > All the lines in the log file are similar and they all have the same
> > length (same number of fields). Most of the fields are separated by
> > spaces, except for a couple of them which I am processing with AWK
> > (removing "<G_" from the string, for example). So in essence what I want
> > to do is evaluate each line in the log file and break it down into fields
> > which I can call individually and write to a new log file (for example,
> > selecting only fields 1, 2 and 3).
> >
> > I hope this is clearer now
>
> Not much :(
>
> It doesn't really matter whether there are 100, 1000, or a million lines in
> the file; the important information is the structure of the file. You may be
> able to get away with a quick and dirty script consisting of just a few
> regular expressions, e.g.
>
> import re
>
> filename = ...
>
> def get_service(line):
>     return re.compile(r"[(](\w+)").search(line).group(1)
>
> def get_command(line):
>     return re.compile(r"<G_(\w+)").search(line).group(1)
>
> def get_status(line):
>     return re.compile(r"Status:\s+(\d+)").search(line).group(1)
>
> with open(filename) as infile:
>     for line in infile:
>         print get_service(line), get_command(line), get_status(line)
>
> but there is no guarantee that there isn't data in your file that breaks the
> implied assumptions.
> Also, from the shell hackery it looks like your ultimate goal is a kind
> of frequency table, which could be built along these lines:
>
> freq = {}
> with open(filename) as infile:
>     for line in infile:
>         service = get_service(line)
>         command = get_command(line)
>         status = get_status(line)
>         key = command, service, status
>         freq[key] = freq.get(key, 0) + 1
>
> for key, occurrences in sorted(freq.iteritems()):
>     print "Service: {}, Command: {}, Status: {}, Occurrences: {}".format(
>         *key + (occurrences,))

--
http://mail.python.org/mailman/listinfo/python-list