Sam Giraffe <s...@giraffetech.biz> writes:

> Hi,
>
> I am trying to split up the re pattern for Apache log file format and seem to
> be having some trouble in getting Python to understand multi-line pattern:
>
> #!/usr/bin/python
>
> import re
>
> #this is a single line
> string = '192.168.122.3 - - [29/Sep/2013:03:52:33 -0700] "GET / HTTP/1.0" 302 276 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"'
>
> #trying to break up the pattern match for easy to read code
> pattern = re.compile(r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+'
>                      r'(?P<ident>\-)\s+'
>                      r'(?P<username>\-)\s+'
>                      r'(?P<TZ>\[(.*?)\])\s+'
>                      r'(?P<url>\"(.*?)\")\s+'
>                      r'(?P<httpcode>\d{3})\s+'
>                      r'(?P<size>\d+)\s+'
>                      r'(?P<referrer>\"\")\s+'
>                      r'(?P<agent>\((.*?)\))')
>
> match = re.search(pattern, string)
>
> if match:
>     print match.group('ip')
> else:
>     print 'not found'
>
> The python interpreter is skipping to the 'match = re.search' and then the
> 'if' statement right after it looks at the <ip>, instead of moving onto
> <ident> and so on.

Although you have written the regexp as a sequence of lines, in reality it is
a single string: Python joins adjacent string literals into one string when it
compiles the source. The whole re.compile(...) call is therefore one
statement, so pdb does only a single step over it and does not go into its
"parts", which really are not parts.

> mybox:~ user$ python -m pdb /Users/user/Documents/Python/apache.py
> > /Users/user/Documents/Python/apache.py(3)<module>()
> -> import re
> (Pdb) n
> > /Users/user/Documents/Python/apache.py(5)<module>()
> -> string = '192.168.122.3 - - [29/Sep/2013:03:52:33 -0700] "GET / HTTP/1.0" 302 276 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"'
> (Pdb) n
> > /Users/user/Documents/Python/apache.py(7)<module>()
> -> pattern = re.compile(r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+'
> (Pdb) n
> > /Users/user/Documents/Python/apache.py(17)<module>()
> -> match = re.search(pattern, string)
> (Pdb)

Also, as Andreas has noted, the r'(?P<referrer>\"\")\s+' part is wrong: it
only matches an empty pair of quotes, while the log line contains "-". It
should probably be

    r'(?P<referrer>\".*?\")\s+'

And the r'(?P<agent>\((.*?)\))') part will also not match, as there is text
outside the (). It should probably also be

    r'(?P<agent>\".*?\")')

or something like it.

-- 
Piet van Oostrum <p...@vanoostrum.org>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]
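
For reference, here is a minimal sketch of the script with the two changes suggested above applied. It is only one way to write it: the names `line` and `pattern_text` are mine and do not come from the original script, and print() is used with a single argument so the sketch runs on both Python 2 and Python 3. The other pattern fragments are kept exactly as posted.

```python
import re

# Sample log line from the original post (one line in the real log file).
line = ('192.168.122.3 - - [29/Sep/2013:03:52:33 -0700] "GET / HTTP/1.0" '
        '302 276 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"')

# Adjacent string literals are concatenated when the source is compiled,
# so pattern_text is ONE string, and building it is ONE statement for pdb.
pattern_text = (
    r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+'
    r'(?P<ident>\-)\s+'
    r'(?P<username>\-)\s+'
    r'(?P<TZ>\[(.*?)\])\s+'
    r'(?P<url>\"(.*?)\")\s+'
    r'(?P<httpcode>\d{3})\s+'
    r'(?P<size>\d+)\s+'
    r'(?P<referrer>\".*?\")\s+'  # was \"\" : only matched an empty ""
    r'(?P<agent>\".*?\")'        # was \((.*?)\) : missed text outside ()
)
pattern = re.compile(pattern_text)

match = pattern.search(line)

if match:
    print(match.group('ip'))
    print(match.group('referrer'))
    print(match.group('agent'))
else:
    print('not found')
```

If you want to check each piece on its own, it is easier to test the fragments separately with re.search() at the interactive prompt than to try to step through them in pdb, since pdb steps over statements and not over parts of a string.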