Re: Ask for help on using re

jak Sat, 07 Aug 2021 11:02:05 -0700

On 07/08/2021 11:18, jak wrote:

On 07/08/2021 04:23, Jach Feng wrote:

On Friday, 6 August 2021 at 4:10:05 PM [UTC+8], jak wrote:

On 05/08/2021 11:40, Jach Feng wrote:

I want to distinguish between numbers with/without a dot attached:

text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
re.compile(r'ch \d{1,}[.]').findall(text)

['ch 1.', 'ch 23.']

re.compile(r'ch \d{1,}[^.]').findall(text)

['ch 23', 'ch 4 ', 'ch 56 ']

I can guess why 'ch 23' appears in the second list. But how do I get rid of it?


--Jach
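The stray 'ch 23' comes from backtracking: `\d{1,}` first takes "23", `[^.]` then fails on ".", so the engine gives back one digit and `[^.]` matches the "3". A minimal sketch that shows this, plus one possible fix that was not proposed in this thread: a negative lookahead `(?![\d.])`, which refuses a match when the number is followed by another digit or a dot.

```python
import re

text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'

# Original attempt: after \d{1,} grabs "23" and [^.] fails on ".",
# the engine backtracks one digit and [^.] matches the "3".
naive = re.compile(r'ch \d{1,}[^.]')
print(naive.findall(text))  # ['ch 23', 'ch 4 ', 'ch 56 ']

# Negative lookahead: the number must not be followed by another digit or a
# dot, so backtracking into the number can no longer produce a match.
no_dot = re.compile(r'ch \d+(?![\d.])')
with_dot = re.compile(r'ch \d+\.')

print(no_dot.findall(text))    # ['ch 4', 'ch 56']
print(with_dot.findall(text))  # ['ch 1.', 'ch 23.']
```

The lookahead version also drops the trailing space that `[^.]` was consuming, since a lookahead checks the next character without including it in the match.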

import re
t = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
r = re.compile(r'(ch +\d+\.)|(ch +\d+)', re.M)

res = r.findall(t)

# x[0] is the first group (number followed by a dot), x[1] the second (no dot)
dot = [x[0] for x in res if x[0] != '']
udot = [x[1] for x in res if x[1] != '']

print(f"dot: {dot}")
print(f"undot: {udot}")

out:

dot: ['ch 1.', 'ch 23.']
undot: ['ch 4', 'ch 56']
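A variant of the same idea, sketched with named groups and `Match.lastgroup` (my own choice, not jak's code), so each match is filed under the name of the alternative that produced it instead of relying on tuple indexes:

```python
import re

t = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'

# Same alternation, longer alternative first, but each branch is a named
# group. re.M is dropped because the pattern has no ^ or $ anchors.
r = re.compile(r'(?P<dot>ch +\d+\.)|(?P<undot>ch +\d+)')

groups = {'dot': [], 'undot': []}
for m in r.finditer(t):
    # Only one branch participates in each match, so lastgroup names it.
    groups[m.lastgroup].append(m.group())

print(f"dot: {groups['dot']}")      # dot: ['ch 1.', 'ch 23.']
print(f"undot: {groups['undot']}")  # undot: ['ch 4', 'ch 56']
```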

So the result can be influenced by the order of the alternatives in the pattern?

import re
t = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
re.compile(r'(ch +\d+\.)|(ch +\d+)', re.M).findall(t)

[('ch 1.', ''), ('ch 23.', ''), ('', 'ch 4'), ('', 'ch 56')]

re.compile(r'(ch +\d+)|(ch +\d+\.)', re.M).findall(t)

[('ch 1', ''), ('ch 23', ''), ('ch 4', ''), ('ch 56', '')]

--Jach

Yes, when the alternatives overlap, as in your case: the two patterns
differ only by the extra ".". The logical or (the "|" in the pattern)
stops at the first alternative that matches and does not go on to check
the others, so in these cases it is a good idea to put the most complete
pattern before the others.
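A short sketch of that ordering rule, using Python's `re`. The toy `'a|ab'` patterns and the optional-dot pattern `ch +\d+\.?` are illustrations added here, not taken from the thread:

```python
import re

t = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'

# Alternatives are tried left to right: the first one that matches at a
# position wins, even if a later one would match more text.
short_first = re.findall(r'a|ab', 'ab')
long_first = re.findall(r'ab|a', 'ab')
print(short_first, long_first)  # ['a'] ['ab']

# Two ways to keep the dot when present: longer alternative first,
# or a single pattern with an optional (greedy) dot.
ordered = re.findall(r'ch +\d+\.|ch +\d+', t)
optional = re.findall(r'ch +\d+\.?', t)
print(ordered)   # ['ch 1.', 'ch 23.', 'ch 4', 'ch 56']
print(optional)  # ['ch 1.', 'ch 23.', 'ch 4', 'ch 56']
```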

PS
... the short-circuit behavior of the logical or that I have described is
not peculiar to regular expressions; it is common to most programming
languages.

--
https://mail.python.org/mailman/listinfo/python-list
