Re: RE Question

Gabriel Genellina Mon, 03 Aug 2009 10:12:04 -0700

En Sun, 02 Aug 2009 18:22:20 -0300, Victor Subervi<[email protected]> escribió:

How do I search and replace something like this:
aLine = re.sub('[<]?[p]?[>]?<font size="h' + str(x) + '"[
a-zA-Z0-9"\'=:]*>[<]?[b]?[>]?', '<h' + str(x) + '>', aLine)
where RE *only* looks for the possibility of "<p>" at the beginning ofthestring; that is, not the individual components as I have it coded above,but
the entire 3-character block?

An example would make it more clear; I think you want to match either"<p><font size=...." or "<font size=....". In other words, "<p>" isoptional. Use a normal group or a non-capturing group:

r'(<p>)?<font size="...'
r'(?:<p>)?<font size="...'

That said, using regular expressions to parse HTML or XML is terriblyfragile; I'd use a specific tool (like BeautifulSoup, ElementTree, or lxml)


--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list

Re: RE Question

Reply via email to