Re: What's the best way to write this regular expression?

John Salerno Tue, 06 Mar 2012 15:06:10 -0800

On Tuesday, March 6, 2012 4:52:10 PM UTC-6, Chris Rebert wrote:
> On Tue, Mar 6, 2012 at 2:43 PM, John Salerno <[email protected]> wrote:
> > I sort of have to work with what the website gives me (as you'll see 
> > below), but today I encountered an exception to my RE. Let me just give all 
> > the specific information first. The point of my script is to go to the 
> > specified URL and extract song information from it.
> >
> > This is my RE:
> >
> > song_pattern = re.compile(r'([0-9]{1,2}:[0-9]{2} [a|p].m.).*?<a.*?>(.*?)</a>.*?<a.*?>(.*?)</a>', re.DOTALL)
> 
> I would advise against using regular expressions to "parse" HTML:
> http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
> 
> lxml is a popular choice for parsing HTML in Python: http://lxml.de
> 
> Cheers,
> Chris


Thanks, that was an interesting read :)

Anything that allows me NOT to use REs is welcome news, so I look forward to 
learning about something new! :)
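
For anyone following along, here is a rough sketch of what the parser-based
approach might look like. It uses the standard library's html.parser rather
than lxml, only so that it runs without installing anything. The sample markup,
the SongParser class and the extract_songs helper are all made up for
illustration, since the real page source isn't shown in this thread. It assumes
each song sits in its own table row, with a time followed by two links. A small
regex is still used, but only to recognize the time inside already-extracted
text, not to match tags.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample markup; the real page's structure is an assumption.
SAMPLE_HTML = """
<table>
  <tr><td>4:52 p.m.</td><td><a href="/song/1">Song One</a></td><td><a href="/artist/1">Artist One</a></td></tr>
  <tr><td>11:07 a.m.</td><td><a href="/song/2">Song Two</a></td><td><a href="/artist/2">Artist Two</a></td></tr>
</table>
"""

# Matches a time such as "4:52 p.m." in plain text (not in markup).
TIME_RE = re.compile(r'[0-9]{1,2}:[0-9]{2} [ap]\.m\.')


class SongParser(HTMLParser):
    """Collect (time, first link text, second link text) for each <tr>."""

    def __init__(self):
        super().__init__()
        self.songs = []
        self._time = None
        self._links = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            # A new row starts: forget whatever the previous row held.
            self._time, self._links = None, []
        elif tag == 'a':
            self._in_link = True
            self._links.append('')

    def handle_endtag(self, tag):
        if tag == 'a':
            self._in_link = False
        elif tag == 'tr' and self._time and len(self._links) >= 2:
            # Keep only rows that had a time and at least two links.
            self.songs.append(
                (self._time, self._links[0].strip(), self._links[1].strip()))

    def handle_data(self, data):
        if self._in_link:
            # Text inside <a>...</a> belongs to the most recent link.
            self._links[-1] += data
        else:
            match = TIME_RE.search(data)
            if match:
                self._time = match.group(0)


def extract_songs(html):
    """Return a list of (time, first link text, second link text) tuples."""
    parser = SongParser()
    parser.feed(html)
    parser.close()
    return parser.songs


if __name__ == '__main__':
    for song in extract_songs(SAMPLE_HTML):
        print(song)
```

Because the parser tracks tags itself, extra attributes or whitespace inside
the <a> tags shouldn't affect the result, which is the kind of thing that
tends to trip up a single big RE.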
