On Sun, 29 May 2011 08:41:16 -0500, Andrew Berg wrote:

> On 2011.05.29 08:09 AM, Steven D'Aprano wrote:
[...]
> Kodos is written in Python and uses Python's regex engine. In fact, it
> is specifically intended to debug Python regexes.

Fair enough.

>> Secondly, you probably should use a proper HTML parser, rather than a
>> regex. Resist the temptation to use regexes to rip out bits of text
>> from HTML, it almost always goes wrong eventually.
>
> I find this a much simpler approach, especially since I'm dealing with
> broken HTML. I guess I don't see how the effort put into learning a
> parser and adding the extra code to use it pays off in this particular
> endeavor.

The temptation to take short-cuts leads to the Dark Side :)

Perhaps you're right, in this instance. But if you need to deal with
broken HTML, try BeautifulSoup.

>> What makes you think it shouldn't match?
>
> AFAIK, dots aren't supposed to match carriage returns or any other
> whitespace characters.

They won't match *newlines* (\n) unless you pass the DOTALL flag, but
they do match other whitespace, including carriage returns:

>>> import re
>>> re.search('abc.efg', '----abc efg----').group()
'abc efg'
>>> re.search('abc.efg', '----abc\refg----').group()
'abc\refg'
>>> re.search('abc.efg', '----abc\nefg----') is None
True

--
Steven
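
To see the DOTALL difference mentioned above in one place, here is a
minimal sketch. The helper name dot_matches and the PATTERN constant are
made up for illustration; only re.search and re.DOTALL come from the
standard library.

```python
import re

# Same pattern as in the interactive session: 'abc', any one character, 'efg'.
PATTERN = 'abc.efg'

def dot_matches(separator, flags=0):
    """Return True if the '.' in PATTERN matches the given separator."""
    text = '----abc' + separator + 'efg----'
    return re.search(PATTERN, text, flags) is not None

print(dot_matches(' '))              # space: True
print(dot_matches('\r'))             # carriage return: True
print(dot_matches('\n'))             # newline, default flags: False
print(dot_matches('\n', re.DOTALL))  # newline with DOTALL: True
```

If the input may contain line breaks between the pieces you want, either
pass re.DOTALL or write the pattern so that it does not depend on the dot
matching across them.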