On 06/15/2018 12:37 PM, Ganesh Pal wrote:
Hey Friedrich,
The proposed solution worked nicely. Thank you for the reply, I really
appreciate it.
The only thing I think needs a review is whether the values of one
dictionary are assigned to the other dictionary correctly (lines
17 to 25 in the code below).
Here is my code:
root@X1:/Play_ground/SPECIAL_TYPES/REGEX# vim Friedrich.py
1 import re
2 from collections import OrderedDict
3
4 keys = ["struct", "loc", "size", "mirror",
5 "filename","final_results"]
6
7 stats = OrderedDict.fromkeys(keys)
8
9
10 line = '06/12/2018 11:13:23 AM python toolname.py --struct=data_block --log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 --path=/tmp/data_block.txt --size=8'
11
12 regex = re.compile(r"--(struct|loc|size|mirror|log_file)\s*=\s*([^\s]+)")
13 result = dict(re.findall(regex, line))
14 print(result)
15
16 if result['log_file']:
17     stats['filename'] = result['log_file']
18 if result['struct']:
19     stats['struct'] = result['struct']
20 if result['size']:
21     stats['size'] = result['size']
22 if result['loc']:
23     stats['loc'] = result['loc']
24 if result['mirror']:
25     stats['mirror'] = result['mirror']
26
27 print(stats)
28
Looks okay to me. If you'd read 'result' using 'get' you wouldn't need
to test for the key. 'stats' would then have all keys and value None for
keys missing in 'result':
stats['filename'] = result.get('log_file')
stats['struct'] = result.get('struct')
This may or may not suit your purpose.
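A minimal sketch of that idea, in case it helps. The mapping table
`stat_key_for` is a hypothetical name that is not in the posted code, and
the `result` literal only stands in for the dictionary built from the
findall() pairs:

```python
from collections import OrderedDict

keys = ["struct", "loc", "size", "mirror", "filename", "final_results"]
stats = OrderedDict.fromkeys(keys)

# Stand-in for dict(regex.findall(line)); the values are example data.
result = {"struct": "data_block", "log_file": "/var/1000111/test18.log",
          "loc": "0", "mirror": "10", "size": "8"}

# Hypothetical helper table: option name in the log line -> key in stats.
stat_key_for = {"log_file": "filename", "struct": "struct",
                "size": "size", "loc": "loc", "mirror": "mirror"}

for option, stat_key in stat_key_for.items():
    # get() returns None for a missing option instead of raising KeyError.
    stats[stat_key] = result.get(option)

print(stats)
```

Keys that never appear in 'result' (here 'final_results') simply keep
the value None.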
Also, I think the regex can just be
(r"--(struct|loc|size|mirror|log_file)=([^\s]+)")
There is no need to match whitespace (\s*) before and after the =
symbol, because that can never occur (each option in this line is
actually a key=value pair of a dictionary being logged).
You are right. I thought your sample line had a space in one of the
groups and didn't reread to verify, letting the false impression take
hold. Sorry about that.
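For what it's worth, a quick check along these lines suggests the two
patterns behave the same on the sample line and differ only when spaces
surround the = sign. The names `with_ws` and `without_ws` are just
illustrative, and the line is the corrected sample ending in --size=8:

```python
import re

line = ('06/12/2018 11:13:23 AM python toolname.py --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

# Pattern that tolerates whitespace around '=' and the stricter variant.
with_ws = re.compile(r"--(struct|loc|size|mirror|log_file)\s*=\s*([^\s]+)")
without_ws = re.compile(r"--(struct|loc|size|mirror|log_file)=([^\s]+)")

result_ws = dict(with_ws.findall(line))
result_plain = dict(without_ws.findall(line))

# Both patterns extract the same pairs from the sample line.
print(result_ws == result_plain)

# They only disagree when spaces surround '='.
spaced_ws = with_ws.findall("--loc = 7")
spaced_plain = without_ws.findall("--loc = 7")
print(spaced_ws)
print(spaced_plain)
```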
Frederic
Regards,
Ganesh
On Fri, Jun 15, 2018 at 12:53 PM, Friedrich Rentsch <
anthra.nor...@bluewin.ch> wrote:
Hi Ganesh. Having proposed a solution to your problem, it would be kind
of you to let me know whether it has helped. In case you missed my
response, I repeat it:
regex = re.compile (r"--(struct|loc|size|mirror|log_file)\s*=\s*([^\s]+)")
regex.findall (line)
[('struct', 'data_block'), ('log_file', '/var/1000111/test18.log'),
('loc', '0'), ('mirror', '10')]
Frederic
On 06/13/2018 07:32 PM, Ganesh Pal wrote:
On Wed, Jun 13, 2018 at 5:59 PM, Rhodri James <rho...@kynesim.co.uk>
wrote:
On 13/06/18 09:08, Ganesh Pal wrote:
Hi Team,
I want to parse a text file and extract a few fields that appear
after "=".
For example, from the line below I need to extract the values that
follow --struct=, --loc=, --size= and --log_file=
Sample input
line = '06/12/2018 11:13:23 AM python toolname.py --struct=data_block
--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10
--path=/tmp/data_block.txt size=8'
Did you mean "--size=8" at the end? That's what your explanation
implied.
Yes James, you got it right. I meant "--size=8".
Hi Team,
I played further with Python's re.findall() and I am able to extract all
the required fields. I have two further questions; please advise.
Question 1:
Please point out any mistakes in the code below and suggest whether it
can be optimized further with a better regex.
# This code has to extract various fields from a single line of a log file
# (assuming the line has already been matched) that contains various values,
# and then store the extracted values in a dictionary.
import re
line = ('06/12/2018 11:13:23 AM python toolname.py --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')
# loc is a number
r_loc = r"--loc=([0-9]+)"
r_size = r'--size=([0-9]+)'
r_struct = r'--struct=([A-Za-z_]+)'
r_log_file = r'--log_file=([A-Za-z0-9_/.]+)'
if re.findall(r_loc, line):
    print(re.findall(r_loc, line))
if re.findall(r_size, line):
    print(re.findall(r_size, line))
if re.findall(r_struct, line):
    print(re.findall(r_struct, line))
if re.findall(r_log_file, line):
    print(re.findall(r_log_file, line))
o/p:
root@X1:/Play_ground/SPECIAL_TYPES/REGEX# python regex_002.py
['0']
['8']
['data_block']
['/var/1000111/test18.log']
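As a sketch of how the four checks above could be folded into one loop
that runs each pattern only once and also fills a dictionary (the names
`patterns` and `values` are illustrative and not part of the posted code):

```python
import re

line = ('06/12/2018 11:13:23 AM python toolname.py --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

# Illustrative table: field name -> pattern with one capturing group.
patterns = {
    "loc": r"--loc=([0-9]+)",
    "size": r"--size=([0-9]+)",
    "struct": r"--struct=([A-Za-z_]+)",
    "log_file": r"--log_file=([A-Za-z0-9_/.]+)",
}

values = {}
for name, pattern in patterns.items():
    match = re.search(pattern, line)  # each pattern is applied once
    if match:
        values[name] = match.group(1)  # keep only the captured value

for name in sorted(values):
    print("%s = %s" % (name, values[name]))
```

Fields whose pattern does not match are simply left out of `values`.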
Question 2:
I tried to see if I can use re.search with a lookbehind assertion. It
seems to work; any comments or suggestions?
Example:
import re
line = ('06/12/2018 11:13:23 AM python toolname.py --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')
match = re.search(r'(?P<loc>(?<=--loc=)([0-9]+))', line)
if match:
    print(match.group('loc'))
o/p: root@X1:/Play_ground/SPECIAL_TYPES/REGEX# python regex_002.py
0
I want to build the sub-patterns and use match.group() to get the values,
something like what is shown below, but it doesn't seem to work:
match = re.search(r'(?P<loc>(?<=--loc=)([0-9]+))'
                  r'(?P<size>(?<=--size=)([0-9]+))', line)
if match:
    print(match.group('loc'))
    print(match.group('size'))
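The combined pattern cannot match because the two named groups are
concatenated: the size digits would have to start exactly where the loc
digits end, and at that position the text before them is never "--size=".
One possible alternative, offered only as a sketch and not something
proposed in this thread, wraps each sub-pattern in a lookahead so that both
are searched for independently from the start of the line:

```python
import re

line = ('06/12/2018 11:13:23 AM python toolname.py --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

# Each lookahead scans the whole line on its own, so the order of the
# options does not matter; the named groups inside keep their captures.
match = re.search(r'(?=.*--loc=(?P<loc>[0-9]+))'
                  r'(?=.*--size=(?P<size>[0-9]+))', line)
if match:
    print(match.group('loc'))
    print(match.group('size'))
```

Note that this variant only matches when both options are present in the
line; if either may be missing, separate searches are probably simpler.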
Regards,
Ganesh
--
https://mail.python.org/mailman/listinfo/python-list