Re: What's the best way to write this regular expression?

Chris Rebert Tue, 06 Mar 2012 14:55:37 -0800

On Tue, Mar 6, 2012 at 2:43 PM, John Salerno <johnj...@gmail.com> wrote:
> I sort of have to work with what the website gives me (as you'll see below), 
> but today I encountered an exception to my RE. Let me just give all the 
> specific information first. The point of my script is to go to the specified 
> URL and extract song information from it.
>
> This is my RE:
>
> song_pattern = re.compile(r'([0-9]{1,2}:[0-9]{2} 
> [a|p].m.).*?<a.*?>(.*?)</a>.*?<a.*?>(.*?)</a>', re.DOTALL)


I would advise against using regular expressions to "parse" HTML:
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

lxml is a popular choice for parsing HTML in Python: http://lxml.de

Cheers,
Chris
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: What's the best way to write this regular expression?

Reply via email to