Re: Regex to extract multiple fields in the same line

MRAB Wed, 13 Jun 2018 12:04:02 -0700

On 2018-06-13 18:32, Ganesh Pal wrote:

On Wed, Jun 13, 2018 at 5:59 PM, Rhodri James <[email protected]> wrote:

On 13/06/18 09:08, Ganesh Pal wrote:

  Hi Team,

I want to parse a text file and extract a few fields that appear after "="
on each line.


For example, from the line below I need to extract the values that follow
--struct=, --loc=, --size= and --log_file=

Sample input

line = ('06/12/2018 11:13:23 AM python toolname.py  --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt size=8')


Did you mean "--size=8" at the end?  That's what your explanation implied.


Yes James, you got it right, I meant "--size=8".

Hi Team,

I played further with Python's re.findall() and I am able to extract all
the required fields. I have two further questions, please suggest.

Question 1:

Please let me know the mistakes in the code below, and suggest whether it
can be optimized further with a better regex.


# This code has to extract various fields from a single line (assuming
# the line is matched here) of a log file that contains various values
# (and then store the extracted values in a dictionary).

import re

line = ('06/12/2018 11:13:23 AM python toolname.py  --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

# loc is a number
r_loc = r"--loc=([0-9]+)"
r_size = r'--size=([0-9]+)'
r_struct = r'--struct=([A-Za-z_]+)'
r_log_file = r'--log_file=([A-Za-z0-9_/.]+)'

Here you're searching for each match _twice_:

if re.findall(r_loc, line):
    print re.findall(r_loc, line)

if re.findall(r_size, line):
    print re.findall(r_size, line)

if re.findall(r_struct, line):
    print re.findall(r_struct, line)

if re.findall(r_log_file, line):
    print re.findall(r_log_file, line)


o/p:
root@X1:/Play_ground/SPECIAL_TYPES/REGEX# python regex_002.py
['0']
['8']
['data_block']
['/var/1000111/test18.log']
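
The repeated search noted above can be avoided by calling re.findall() once per pattern and keeping the result. A minimal sketch, assuming the same four patterns; the `patterns` and `results` names are introduced here only for illustration:

```python
import re

line = ('06/12/2018 11:13:23 AM python toolname.py  --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

# Hypothetical mapping from field name to the pattern used in the question.
patterns = {
    'loc': r'--loc=([0-9]+)',
    'size': r'--size=([0-9]+)',
    'struct': r'--struct=([A-Za-z_]+)',
    'log_file': r'--log_file=([A-Za-z0-9_/.]+)',
}

results = {}
for name, pattern in patterns.items():
    found = re.findall(pattern, line)  # search once and keep the result
    if found:
        results[name] = found
        print(found)
```

Each pattern is now applied to the line a single time, and the matches stay available in `results` instead of being recomputed for printing.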


Question 2:

I tried to see if I can use re.search with a lookbehind assertion. It
seems to work; any comments or suggestions?

Example:

import re

line = ('06/12/2018 11:13:23 AM python toolname.py  --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

match = re.search(r'(?P<loc>(?<=--loc=)([0-9]+))', line)
if match:
    print match.group('loc')


o/p: root@X1:/Play_ground/SPECIAL_TYPES/REGEX# python regex_002.py

0


I want to build the sub-patterns and use match.group() to get the values,
something like what is shown below, but it doesn't seem to work.


match = re.search(r'(?P<loc>(?<=--loc=)([0-9]+))'
                   r'(?P<size>(?<=--size=)([0-9]+))', line)
if match:
    print match.group('loc')
    print match.group('size')
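
A likely reason the combined pattern never matches: the two sub-patterns are concatenated, so the digits for size would have to start exactly where the digits for loc end, and at that position the preceding text is "--loc=0", not "--size=". If named groups from a single re.search() call are wanted, one option is to swap the lookbehinds for lookaheads. This is only a sketch of that alternative, not something proposed in this thread:

```python
import re

line = ('06/12/2018 11:13:23 AM python toolname.py  --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

# Each lookahead scans forward from the same starting position on its own,
# so the order of the options within the line does not matter.
pattern = (r'(?=.*?--loc=(?P<loc>[0-9]+))'
           r'(?=.*?--size=(?P<size>[0-9]+))')

match = re.search(pattern, line)
if match:
    print(match.group('loc'))
    print(match.group('size'))
```

The overall match is empty because lookaheads consume no text, but the named groups inside them are still filled in. Note that the search fails as a whole if either option is missing from the line.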

You can combine them into a single findall:

captures = re.findall(
    r'--(loc=[0-9]+)|--(size=[0-9]+)|--(struct=[A-Za-z_]+)|--(log_file=[A-Za-z0-9_/.]+)',
    line)
captures

[('', '', 'struct=data_block', ''), ('', '', '', 'log_file=/var/1000111/test18.log'), ('loc=0', '', '', ''), ('', 'size=8', '', '')]

In each tuple of the list, there's only one match; the others are empty, so get rid of the empty ones:

[c for cap in captures for c in cap if c]


Split each of the matches on the first '=':

[c.split('=', 1) for cap in captures for c in cap if c]
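
The two steps above show no output, so here they are run end to end as a small script. The `non_empty` and `pairs` names are introduced only for this illustration:

```python
import re

line = ('06/12/2018 11:13:23 AM python toolname.py  --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

captures = re.findall(
    r'--(loc=[0-9]+)|--(size=[0-9]+)|--(struct=[A-Za-z_]+)'
    r'|--(log_file=[A-Za-z0-9_/.]+)',
    line)

# Step 1: flatten the tuples and drop the empty strings.
non_empty = [c for cap in captures for c in cap if c]
print(non_empty)
# ['struct=data_block', 'log_file=/var/1000111/test18.log', 'loc=0', 'size=8']

# Step 2: split each 'key=value' string on the first '=' only.
pairs = [c.split('=', 1) for c in non_empty]
print(pairs)
# [['struct', 'data_block'], ['log_file', '/var/1000111/test18.log'], ['loc', '0'], ['size', '8']]
```

Splitting with a maximum of one split keeps any later '=' characters inside the value.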


For ease of use, pass the key/value pairs into dict:

info = dict(c.split('=', 1) for cap in captures for c in cap if c)
info

{'struct': 'data_block', 'log_file': '/var/1000111/test18.log', 'loc': '0', 'size': '8'}
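
As a possible variation, the key and the value can be captured in two separate groups, so that findall returns pairs that dict accepts directly. This sketch assumes values never contain whitespace and uses `\S+` for every value, which is looser than the per-field character classes above; the `wanted` name is hypothetical:

```python
import re

line = ('06/12/2018 11:13:23 AM python toolname.py  --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

# Hypothetical list of the option names to extract.
wanted = ('loc', 'size', 'struct', 'log_file')

# Group 1 is the option name, group 2 is everything up to the next whitespace
# (assumption: values contain no spaces).
pattern = r'--(%s)=(\S+)' % '|'.join(wanted)

# With two groups, findall returns a list of (key, value) tuples.
info = dict(re.findall(pattern, line))
print(info)
```

The values are still strings, so numeric fields such as loc and size would need int() applied afterwards.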

--
https://mail.python.org/mailman/listinfo/python-list
