from each line separate out url and request parts. split the request into key-value pairs, use urllib to unquote key-value pairs......as show below...
import urllib line = "GET /stat.gif?stat=v&c=F-Secure&v=1.1%20Build%2014231&s=av%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7Dsw%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7Dfw%7BNorton%20360%20%28Symantec%20Corporation%29+5%3B%7Dv%7BMicrosoft%20Windows%20XP+insecure%3BMicrosoft%20Windows%20XP%20Professional+f%3B26027%3B26447%3B26003%3B22452%3B%7D&r=0.9496 HTTP/1.1" words = line.split() for word in words: if word.find('?') >= 0: req = word[word.find('?') + 1:] kwds = req.split('&') for kv in kwds: print urllib.unquote(kv) stat=v c=F-Secure v=1.1 Build 14231 s=av{Norton 360 (Symantec Corporation)+69;}sw{Norton 360 (Symantec Corporation)+69;}fw{Norton 360 (Symantec Corporation)+5;}v{Microsoft Windows XP+insecure;Microsoft Windows XP Professional+f;26027;26447;26003;22452;} r=0.9496 good luck Edwin -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Saturday, August 09, 2008 10:48 AM To: python-list@python.org Subject: Extract string from log file 203.114.10.66 - - [01/Aug/2008:05:41:21 +0300] "GET /stat.gif? stat=v&c=F-Secure&v=1.1%20Build%2014231&s=av%7BNorton %20360%20%28Symantec%20Corporation%29+69%3B%7Dsw%7BNorton %20360%20%28Symantec%20Corporation%29+69%3B%7Dfw%7BNorton %20360%20%28Symantec%20Corporation%29+5%3B%7Dv%7BMicrosoft%20Windows %20XP+insecure%3BMicrosoft%20Windows%20XP%20Professional+f %3B26027%3B26447%3B26003%3B22452%3B%7D&r=0.9496 HTTP/1.1" 200 43 "http://dfstage1.f-secure.com/fshc/1.1/release/devbw/1.1.14231/ card.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)" does anyone know how can i extract certain string from this log file using regular expression in python or using XML. can teach me. -- http://mail.python.org/mailman/listinfo/python-list The information contained in this message and any attachment may be proprietary, confidential, and privileged or subject to the work product doctrine and thus protected from disclosure. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify me immediately by replying to this message and deleting it and all copies and backups thereof. Thank you. -- http://mail.python.org/mailman/listinfo/python-list