On Dec 7, 2:21 am, Sumit <[EMAIL PROTECTED]> wrote:
> Hi,
> I am trying to split a line which is of the format below:
>
> AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] "162.44.245.32 CN=dddd
> cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk
> Secure,DC=customer,DC=rxcorp,DC=com" "plysmhc03zp GET /mci/performance/
> SelectProducts.aspx?
> p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]

Because lines are mangled in transmission, it is rather difficult to
guess exactly what you have in your input and what your expected
results are. Also, you don't show exactly what you have tried.

At the end is a small script that contains my guess as to your input
and expected results, shows an example of what the re.VERBOSE flag is
intended for, and shows how you might debug your results. So that you
don't get your homework done 100% for free, I haven't corrected the
last mistake I made.

As usual, re may not be the best way of doing this exercise. Your
*single* piece of evidence may not be enough. It appears to be a
horrid conglomeration of instances of different things, each with its
own grammar. You may find that something like PyParsing would be more
legible and more robust.

>
> Here all the strings which I want to split are
> ---------------------------------
> AzAccept
> PLYSSTM01
> [23/Sep/2005:16:14:28 -0500]
> 162.44.245.32
> CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk
> Secure,DC=customer,DC=rxcorp,DC=com"
> GET
> /mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc
> d4b62ca2-09a0-4334622b-0e1c-03c42ba5
> 0
> --------------------------------
>
> I am trying to use the re.split() method to split them, but am unable to get
> the exact result.
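
To see why re.split() on its own struggles with this kind of line, here is a minimal sketch. The sample line is a shortened, made-up stand-in with the same layout as yours (the host names and field values are my assumptions, not your data). Splitting on whitespace cuts straight through the bracketed timestamp and the quoted fields, so you get more pieces than fields.

```python
import re

# Hypothetical, shortened stand-in for the log line: same layout, invented values.
line = ('AzAccept HOST01 [23/Sep/2005:16:14:28 -0500] '
        '"10.0.0.1 CN=a b (1),OU=x y,DC=com" '
        '"web01 GET /p.aspx?a=1" [abc-123] [0]')

# Naive approach: split on runs of whitespace.
pieces = re.split(r'\s+', line)

print(len(pieces))
for piece in pieces:
    # The timestamp and the CN=... field each end up in several pieces,
    # and the brackets and quotes stay glued to their neighbours.
    print(repr(piece))
```

A single separator pattern cannot tell a space between fields from a space inside a field, which is why the script further down matches the whole line with one pattern and captures each field as a group.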

C:\junk>type sumit.py
import re

textin = \
    """AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] "162.44.245.32 CN=dddd """ \
    """cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk """ \
    """Secure,DC=customer,DC=rxcorp,DC=com" "plysmhc03zp GET /mci/performance/""" \
    """SelectProducts.aspx?""" \
    """p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]"""

expected = [
    "AzAccept",
    "PLYSSTM01",
    "23/Sep/2005:16:14:28 -0500",
    "162.44.245.32",
    "CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com",
    "plysmhc03zp",
    "GET",
    "/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc",
    "d4b62ca2-09a0-4334622b-0e1c-03c42ba5",
    "0",
    ]

pattern = r"""
    (\S+)       # AzAccept
    \s+
    (\S+)       # PLYSSTM01
    \s+\[
    ([^]]+)     # 23/Sep/2005:16:14:28 -0500
    ]\s+"
    (\S+)       # 162.44.245.32
    \s+
    ([^"]+)     # CN=dddd cojack (890),OU=1, etc etc,DC=rxcorp,DC=com
    "\s+"
    (\S+)       # plysmhc03zp
    \s+
    (\S+)       # GET
    \s+
    (\S+)       # /mci/performance/ ... menu=adhoc
    \s+\[
    ([^]]+)     # d4b62ca2-09a0-4334622b-0e1c-03c42ba5
    ]\s+\[
    ([^]]+)     # 0
    ]$
    """

mobj = re.match(pattern, textin, re.VERBOSE)
if not mobj:
    print "Bzzzt!"
else:
    result = mobj.groups()
    print "len check", len(result) == len(expected), len(result), len(expected)
    for a, b in zip(result, expected):
        print a == b, repr(a), repr(b)

C:\junk>python sumit.py
len check True 10 10
True 'AzAccept' 'AzAccept'
True 'PLYSSTM01' 'PLYSSTM01'
True '23/Sep/2005:16:14:28 -0500' '23/Sep/2005:16:14:28 -0500'
True '162.44.245.32' '162.44.245.32'
True 'CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com' 'CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com'
True 'plysmhc03zp' 'plysmhc03zp'
True 'GET' 'GET'
False '/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc"' '/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc'
True 'd4b62ca2-09a0-4334622b-0e1c-03c42ba5' 'd4b62ca2-09a0-4334622b-0e1c-03c42ba5'
True '0' '0'

C:\junk>
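
Comparing groups against expected values only helps once the pattern matches at all. If you end up in the "Bzzzt!" branch instead, one way to locate the problem is to build the pattern from pieces and try ever-longer prefixes until one stops matching. The sketch below is only an illustration: first_failing_piece is a hypothetical helper of mine, the sample text is invented, and the fourth piece is deliberately wrong. It assumes every piece is a complete sub-pattern, so that each prefix compiles on its own.

```python
import re

def first_failing_piece(pieces, text):
    """Return the index of the first piece whose addition makes the
    joined prefix stop matching at the start of text, or None if
    every prefix matches."""
    for n in range(1, len(pieces) + 1):
        prefix = "".join(pieces[:n])
        if re.match(prefix, text, re.VERBOSE) is None:
            return n - 1
    return None

# Invented sample with the same layout as the start of the log line.
text = 'AzAccept HOST01 [23/Sep/2005:16:14:28 -0500] "10.0.0.1 CN=a b,DC=com"'

pieces = [
    r'(\S+)',             # first word
    r'\s+(\S+)',          # second word
    r'\s+\[([^]]+)\]',    # bracketed timestamp
    r'\s+\[',             # deliberately wrong: the text has a quote here
    r'(\S+)',             # IP address
]

bad = first_failing_piece(pieces, text)
print("first failing piece: %r" % (bad,))

# With the fourth piece corrected, every prefix matches.
fixed = pieces[:3] + [r'\s+"'] + pieces[4:]
print("after correction: %r" % (first_failing_piece(fixed, text),))
```

This tells you which piece to stare at, not what is wrong with it, and it will not catch a pattern that matches but captures too much, which is what the group-by-group comparison in the script is for.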