Good morning all,
I was wondering if you could please help me with the following query:-
I started learning Python last weekend, after a colleague of mine 
showed me how to dramatically cut the time a Bash script takes to execute by 
rewriting it in Python.  I was amazed at how fast it ran.  I would now like to 
do the same thing with another script I have.

This other script reads a log file, uses AWK to filter certain fields from 
it, and writes them to a new file together with a count of how often each 
combination occurs.  The pipeline the script runs is shown below.  I would 
like to rewrite it in Python, as the script currently takes about 1 hour to 
process a log file of about 100,000 lines, and I would like to cut this time 
down as much as possible.

cat logs/pdu_log_fe.log | awk -F\- '{print $1,$NF}' | awk -F\. '{print $1,$NF}' |
  awk '{print $1,$4,$5}' | sort | uniq |
  while read service command status; do
    echo "Service: $service, Command: $command, Status: $status, Occurrences: `grep $service logs/pdu_log_fe.log | grep $command | grep $status | wc -l | awk '{ print $1 }'`" >> logs/pdu_log_fe_clean.log
  done

This AWK command gets lines which look like this:-

2011-05-16 09:46:22,361 [Thread-4847133] PDU D <G_CC_SMS_SERVICE_51408_656.O_CC_SMS_SERVICE_51408_656-ServerThread-VASPSessionThread-7ee35fb0-7e87-11e0-a2da-00238bce423b-TRX - 2011-05-16 09:46:22 - OUT - (submit_resp: (pdu: L: 53 ID: 80000004 Status: 0 SN: 25866) 98053090-7f90-11e0-a2da-00238bce423b (opt: ) ) >

And outputs lines like this:-

CC_SMS_SERVICE_51408 submit_resp: 0
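
For reference, one way this extraction might be expressed in Python is a single compiled regular expression with named groups. This is only a sketch based on the one sample line above: the pattern, the group names, and the helper `extract_fields` are illustrative assumptions, and other line shapes in the real log may need a different pattern. The sample is assumed to be one physical log line that was wrapped by the mail client.

```python
import re

# Assumed pattern, derived only from the single sample line in this post:
#   <G_<service>_<digits>.O_ ... (<command>: ... Status: <status>
FIELDS = re.compile(
    r"<G_(?P<service>\w+)_\d+\."      # service name without the trailing _NNN
    r".*?\((?P<command>\w+:)"         # first "(word:" after it, e.g. "(submit_resp:"
    r".*?Status:\s*(?P<status>\d+)"   # numeric status code
)


def extract_fields(line):
    """Return (service, command, status), or None if the line does not match."""
    match = FIELDS.search(line)
    if match is None:
        return None
    return match.group("service", "command", "status")


# The sample line from this post, joined back into one line.
sample = (
    "2011-05-16 09:46:22,361 [Thread-4847133] PDU D "
    "<G_CC_SMS_SERVICE_51408_656.O_CC_SMS_SERVICE_51408_656-ServerThread-"
    "VASPSessionThread-7ee35fb0-7e87-11e0-a2da-00238bce423b-TRX"
    " - 2011-05-16 09:46:22 - OUT - (submit_resp: (pdu: L: 53 ID: 80000004 "
    "Status: 0 SN: 25866) 98053090-7f90-11e0-a2da-00238bce423b (opt: ) ) >"
)

print(" ".join(extract_fields(sample)))
# prints: CC_SMS_SERVICE_51408 submit_resp: 0
```

Compiling the pattern once and calling `search` per line keeps the per-line work small, and lines that do not match simply yield `None` instead of raising an error.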

I have tried writing the Python script myself but I am getting stuck writing 
the regex.  So far I have the following:-

#!/usr/bin/env python3

# Import RegEx module
import re as regex

# Log file to work on
filetoread = open('/tmp/pdu_log.log', "r")
# File to write output to
filetowrite = open('/tmp/pdu_log_clean.log', "w")
# Perform filtering in the log file
for line in filetoread:
    filter0 = regex.sub(r"<G_", "", line)
    filter1 = regex.sub(r"\.", " ", filter0)
    # Write new log file
    filetowrite.write(filter1)
filetoread.close()
filetowrite.close()

# Read new log and get required fields from it
filtered_log = open('/tmp/pdu_log_clean.log', "r")
for line in filtered_log:
    token = line.split(" ")
    # Skip lines that are too short to hold all the fields
    if len(token) > 20:
        print(token[0], token[1], token[5], token[13], token[20])
filtered_log.close()
print("Done")

Ugly, I know, but please bear in mind that I only started learning Python 
two days ago.
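
The hour-long run time most likely comes from the shell loop, which greps the whole log again for every unique service/command/status combination. Counting in a single pass avoids that. The following is only a minimal sketch under assumptions: it reuses the token positions 5, 13 and 20 from the script above, assumes every relevant line has the same layout as the sample line, and the names `count_triples` and `format_report` are made up for illustration.

```python
from collections import Counter


def count_triples(lines):
    """Count (service, command, status) triples in one pass over the lines."""
    counts = Counter()
    for line in lines:
        # Same clean-up as the script above: drop "<G_" and turn dots into spaces.
        tokens = line.replace("<G_", "").replace(".", " ").split()
        if len(tokens) <= 20:
            continue  # too short to hold all three fields (assumed layout)
        service = tokens[5].rsplit("_", 1)[0]  # drop the trailing _NNN suffix
        command = tokens[13].lstrip("(")       # "(submit_resp:" -> "submit_resp:"
        status = tokens[20]
        counts[(service, command, status)] += 1
    return counts


def format_report(counts):
    """Build report lines in the same shape as the shell script's echo."""
    return [
        "Service: %s, Command: %s, Status: %s, Occurrences: %d"
        % (service, command, status, n)
        for (service, command, status), n in sorted(counts.items())
    ]


# The sample line from this post, joined back into one line and used twice.
sample = (
    "2011-05-16 09:46:22,361 [Thread-4847133] PDU D "
    "<G_CC_SMS_SERVICE_51408_656.O_CC_SMS_SERVICE_51408_656-ServerThread-"
    "VASPSessionThread-7ee35fb0-7e87-11e0-a2da-00238bce423b-TRX"
    " - 2011-05-16 09:46:22 - OUT - (submit_resp: (pdu: L: 53 ID: 80000004 "
    "Status: 0 SN: 25866) 98053090-7f90-11e0-a2da-00238bce423b (opt: ) ) >"
)

for report_line in format_report(count_triples([sample, sample])):
    print(report_line)
# prints: Service: CC_SMS_SERVICE_51408, Command: submit_resp:, Status: 0, Occurrences: 2
```

On the real log, the list could be replaced by the opened log file itself, since iterating over a file object yields its lines one at a time, and the report lines could be written to the output file instead of printed.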

I have been looking on this group and on the Internet for snippets of code that 
I could use, but so far what I have found either does not fit my needs or is 
too complicated (at least for me).

Any suggestions or advice you can give me on how to accomplish this task would 
be greatly appreciated.

On another note, can you also recommend a good no-nonsense book to learn 
Python?  I have read the book “A Byte of Python” by Swaroop C H (great 
introductory book!) and I am now reading “Dive into Python” by Mark Pilgrim.  I 
am looking for a book that explains things in simple terms and goes straight to 
the point (similar to how “A Byte of Python” was written).

Thanks in advance

Kind regards,

Junior
-- 
http://mail.python.org/mailman/listinfo/python-list
