On Tuesday, March 6, 2012 4:52:10 PM UTC-6, Chris Rebert wrote: > On Tue, Mar 6, 2012 at 2:43 PM, John Salerno <johnj...@gmail.com> wrote: > > I sort of have to work with what the website gives me (as you'll see > > below), but today I encountered an exception to my RE. Let me just give all > > the specific information first. The point of my script is to go to the > > specified URL and extract song information from it. > > > > This is my RE: > > > > song_pattern = re.compile(r'([0-9]{1,2}:[0-9]{2} > > [a|p].m.).*?<a.*?>(.*?)</a>.*?<a.*?>(.*?)</a>', re.DOTALL) > > I would advise against using regular expressions to "parse" HTML: > http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags > > lxml is a popular choice for parsing HTML in Python: http://lxml.de > > Cheers, > Chris
Thanks, that was an interesting read :) Anything that allows me NOT to use REs is welcome news, so I look forward to learning about something new! :) -- http://mail.python.org/mailman/listinfo/python-list