On Aug 9, 11:22 pm, [EMAIL PROTECTED] wrote:
> From each line, separate out the URL and request parts. Split the request
> into key-value pairs and use urllib to unquote each pair, as shown below:
>
> import urllib
> line = "GET /stat.gif?stat=v&c=F-Secure&v=1.1%20Build%2014231&s=av%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7Dsw%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7Dfw%7BNorton%20360%20%28Symantec%20Corporation%29+5%3B%7Dv%7BMicrosoft%20Windows%20XP+insecure%3BMicrosoft%20Windows%20XP%20Professional+f%3B26027%3B26447%3B26003%3B22452%3B%7D&r=0.9496 HTTP/1.1"
> words = line.split()
> for word in words:
>     if word.find('?') >= 0:
>         req = word[word.find('?') + 1:]
>         kwds = req.split('&')
>         for kv in kwds:
>             print urllib.unquote(kv)
>
> This prints:
>
> stat=v
> c=F-Secure
> v=1.1 Build 14231
> s=av{Norton 360 (Symantec Corporation)+69;}sw{Norton 360 (Symantec Corporation)+69;}fw{Norton 360 (Symantec Corporation)+5;}v{Microsoft Windows XP+insecure;Microsoft Windows XP Professional+f;26027;26447;26003;22452;}
> r=0.9496
>
> Good luck,
> Edwin
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
> Sent: Saturday, August 09, 2008 10:48 AM
> To: [EMAIL PROTECTED]
> Subject: Extract string from log file
>
> 203.114.10.66 - - [01/Aug/2008:05:41:21 +0300] "GET /stat.gif?stat=v&c=F-Secure&v=1.1%20Build%2014231&s=av%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7Dsw%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7Dfw%7BNorton%20360%20%28Symantec%20Corporation%29+5%3B%7Dv%7BMicrosoft%20Windows%20XP+insecure%3BMicrosoft%20Windows%20XP%20Professional+f%3B26027%3B26447%3B26003%3B22452%3B%7D&r=0.9496 HTTP/1.1" 200 43 "http://dfstage1.f-secure.com/fshc/1.1/release/devbw/1.1.14231/card.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)"
>
> Does anyone know how I can extract certain strings from this log file
> using regular expressions in Python, or using XML? Can anyone teach me?
> --
> http://mail.python.org/mailman/listinfo/python-list
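The original question asks about regular expressions. Below is a minimal sketch, assuming each line follows the Apache "combined" log format seen in the quoted message (client address, two identity fields, a bracketed timestamp, the quoted request, status, size, quoted referer, quoted user agent). It targets Python 3, where `urllib.unquote` lives at `urllib.parse.unquote`; the names `LOG_PATTERN`, `parse_log_line` and `SAMPLE` are hypothetical, not from the thread.

```python
import re
from urllib.parse import unquote, urlsplit

# Hypothetical pattern for one Apache "combined" log record (an assumption
# based on the sample line, not a format confirmed in the thread).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of log fields plus decoded query parameters, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # the line is not in the assumed format
    record = match.groupdict()
    query = urlsplit(record["path"]).query  # text after the '?'
    params = {}
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        # unquote() decodes %XX escapes (%20 -> space, %7B -> '{') but,
        # like urllib.unquote in the thread, leaves '+' untouched
        params[unquote(key)] = unquote(value)
    record["params"] = params
    return record


# The log line from the quoted message, split only to fit the page.
SAMPLE = (
    '203.114.10.66 - - [01/Aug/2008:05:41:21 +0300] '
    '"GET /stat.gif?stat=v&c=F-Secure&v=1.1%20Build%2014231'
    '&s=av%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7D'
    'sw%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7D'
    'fw%7BNorton%20360%20%28Symantec%20Corporation%29+5%3B%7D'
    'v%7BMicrosoft%20Windows%20XP+insecure%3B'
    'Microsoft%20Windows%20XP%20Professional+f'
    '%3B26027%3B26447%3B26003%3B22452%3B%7D&r=0.9496 HTTP/1.1" 200 43 '
    '"http://dfstage1.f-secure.com/fshc/1.1/release/devbw/1.1.14231/card.html" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)"'
)

record = parse_log_line(SAMPLE)
print(record["ip"], record["status"])  # 203.114.10.66 200
print(record["params"]["v"])           # 1.1 Build 14231
```

Like Edwin's loop, this uses `unquote` rather than `unquote_plus`, so literal `+` characters in values such as `+69` are kept instead of being turned into spaces. If a key appears twice in one query string, the dict keeps only the last value.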
Do you mind explaining further? Based on the source code you gave me, what will it output? Sorry, I am new to string extraction. I do understand your Python code; the only part I don't understand is this:

    for word in words:
        if word.find('?') >= 0:
            req = word[word.find('?') + 1:]
            kwds = req.split('&')
            for kv in kwds:
                print urllib.unquote(kv)

What does this code do? Also, is this code automatic? What I mean is: can it extract the string every time the server outputs a new log file?
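To address the "what does this code do" question, here is a minimal sketch of the same loop wrapped in a function, with one comment per step. It assumes Python 3 (`print()` is a function and `unquote` lives in `urllib.parse`); the names `extract_params`, `process_log` and `LINE`, and the file name `access.log`, are hypothetical. `LINE` is a shortened copy of the request from the thread with the long `s=` field left out. On the "automatic" question: the loop only processes the lines it is handed, so it does not watch the server by itself; one would run the script again for each new log file, for example from a scheduler such as cron.

```python
from urllib.parse import unquote


def extract_params(line):
    """Return the decoded key=value strings found in one log line."""
    results = []
    # 1. line.split() breaks the line on whitespace into "words"
    for word in line.split():
        # 2. Only the URL word contains '?'; find() returns -1 for the others
        if word.find('?') >= 0:
            # 3. Keep everything after the '?': the query string
            req = word[word.find('?') + 1:]
            # 4. The query string is a series of key=value pairs joined by '&'
            kwds = req.split('&')
            # 5. unquote() turns %20 into a space, %7B into '{', and so on
            for kv in kwds:
                results.append(unquote(kv))
    return results


def process_log(lines):
    """Print the decoded pairs for every line of an open log file (or any iterable)."""
    for line in lines:
        for kv in extract_params(line):
            print(kv)


# Shortened request from the thread (the long s= field is omitted).
LINE = "GET /stat.gif?stat=v&c=F-Secure&v=1.1%20Build%2014231&r=0.9496 HTTP/1.1"

process_log([LINE])
# stat=v
# c=F-Secure
# v=1.1 Build 14231
# r=0.9496

# For a real file (hypothetical name):
# with open("access.log") as log:
#     process_log(log)
```

Because `process_log` accepts any iterable of lines, the same function works on a list in a quick test or on an open file object, which Python iterates line by line.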