On Dec 6, 9:21 am, Sumit <[EMAIL PROTECTED]> wrote:
> Hi,
> I am trying to split a line which is of the format below:
>
> AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] "162.44.245.32 CN=dddd
> cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk
> Secure,DC=customer,DC=rxcorp,DC=com" "plysmhc03zp GET /mci/performance/
> SelectProducts.aspx?
> p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]
>

As John Machin mentioned, pyparsing may be helpful to you. Here is a
simple version:

data = """AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] "162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com" "plysmhc03zp GET /mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]"""

# Version 1 - simple
from pyparsing import *

LBRACK,RBRACK,COMMA = map(Suppress,"[],")
num = Word(nums)
date = Combine(num+"/"+Word(alphas)+"/"+num+":"+num+":"+num+":"+num) + \
        oneOf("+ -") + num
date.setParseAction(keepOriginalText)
uuid = delimitedList(Word(hexnums),"-",combine=True)

logString = Word(alphas,alphanums) + Word(alphas,alphanums) + \
        LBRACK + date + RBRACK + quotedString + quotedString + \
        LBRACK + uuid + RBRACK + LBRACK + Word(nums) + RBRACK

print logString.parseString(data)

Prints out:

['AzAccept', 'PLYSSTM01', '23/Sep/2005:16:14:28 -0500',
 '"162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com"',
 '"plysmhc03zp GET /mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc"',
 'd4b62ca2-09a0-4334622b-0e1c-03c42ba5', '0']

And here is a slightly fancier version, which parses the quoted
strings (uses the pprint pretty-printing module to show structure of
the parsed results):

# Version 2 - fancy
from pyparsing import *

LBRACK,RBRACK,COMMA = map(Suppress,"[],")
num = Word(nums)
date = Combine(num+"/"+Word(alphas)+"/"+num+":"+num+":"+num+":"+num) + \
        oneOf("+ -") + num
date.setParseAction(keepOriginalText)
uuid = delimitedList(Word(hexnums),"-",combine=True)
ipAddr = delimitedList(Word(nums),".",combine=True)

keyExpr=Word(alphas.upper())
valExpr=CharsNotIn(',')
qs1Expr = ipAddr + Group(delimitedList(Combine(keyExpr + '=' + valExpr)))

def parseQS1(t):
    return qs1Expr.parseString(t[0])

def parseQS2(t):
    return t[0].split()

qs1 = quotedString.copy().setParseAction(removeQuotes, parseQS1)
qs2 = quotedString.copy().setParseAction(removeQuotes, parseQS2)

logString = Word(alphas,alphanums) + Word(alphas,alphanums) + \
        LBRACK + date + RBRACK + qs1 + qs2 + \
        LBRACK + uuid + RBRACK + LBRACK + Word(nums) + RBRACK

from pprint import pprint
pprint(logString.parseString(data).asList())

Prints:

['AzAccept',
 'PLYSSTM01',
 '23/Sep/2005:16:14:28 -0500',
 '162.44.245.32',
 ['CN=dddd cojack (890)',
  'OU=1',
  'OU=Customers',
  'OU=ISM-Users',
  'OU=kkk Secure',
  'DC=customer',
  'DC=rxcorp',
  'DC=com'],
 'plysmhc03zp',
 'GET',
 '/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc',
 'd4b62ca2-09a0-4334622b-0e1c-03c42ba5',
 '0']

Find more about pyparsing at http://pyparsing.wikispaces.com.

-- Paul
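
P.S. In case installing pyparsing is not an option, here is a rough stdlib-only sketch of the same split using the re module. This is a swapped-in technique, not the pyparsing approach above. The names LOG_RE and split_log_line are made up for illustration, and the pattern assumes every line has exactly the layout of the sample (two leading words, a bracketed date, two double-quoted fields with no embedded double quotes, then a bracketed id and a bracketed number).

```python
import re

# Sample line from the original question, joined back onto one line.
data = ('AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] '
        '"162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,'
        'OU=kkk Secure,DC=customer,DC=rxcorp,DC=com" '
        '"plysmhc03zp GET /mci/performance/SelectProducts.aspx?'
        'p=0&V=C&a=29&menu=adhoc" '
        '[d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]')

# Hypothetical pattern (an assumption, not from the post): two words,
# a bracketed date, two quoted strings, a bracketed id, a bracketed number.
LOG_RE = re.compile(
    r'(\w+)\s+(\w+)\s+'            # e.g. AzAccept PLYSSTM01
    r'\[([^\]]+)\]\s+'             # [date and UTC offset]
    r'"([^"]*)"\s+"([^"]*)"\s+'    # the two quoted fields
    r'\[([0-9a-fA-F-]+)\]\s+'      # [hex id with dashes]
    r'\[(\d+)\]'                   # [number]
)

def split_log_line(line):
    """Split one log line into a flat list with the DN parts nested."""
    m = LOG_RE.match(line)
    if m is None:
        raise ValueError("unrecognized log line: %r" % line)
    action, host, date, qs1, qs2, ident, count = m.groups()
    # First quoted field: IP address, then comma-separated key=value pairs.
    ip, dn = qs1.split(None, 1)
    # Second quoted field: whitespace-separated tokens.
    return [action, host, date, ip, dn.split(',')] + qs2.split() + [ident, count]

print(split_log_line(data))
```

For the sample line this should give the same nested list as Version 2. The trade-off is that the regex is much less forgiving than the pyparsing grammar: any change in field order or quoting means rewriting the pattern rather than adjusting one expression.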