I'm using Beautiful Soup to extract some song information from a radio station's website that lists the songs it plays as it plays them. Getting the time that the song is played is easy, because the time is wrapped in a <div> tag all by itself with a class attribute that has a specific value I can search for. But the actual song title and artist information is harder, because the HTML isn't quite as precise. Here's a sample:

    <div class="cmPlaylistContent">
      <strong>
        <a href="/lsp/t2995/"> Love Without End, Amen </a>
      </strong>
      <br/>
      <a href="/lsp/a436/"> George Strait </a>
      <br/>
      <span class="sprite iconDownload"> </span>
      Download Song:
      <a href="http://itunes.apple.com/us/album/love-without-end-amen/id71416?i=71404&uo=4"> iTunes </a>
      |
      <a href="http://www.amazon.com/Love-Without-End-Amen/dp/B000V638BQ?SubscriptionId=1NXYFBZST44V8CCDK182&tag=coxradiointer-20&linkCode=xm2&camp=2025&creative=165953&creativeASIN=B000V638BQ"> Amazon MP3 </a>
      <br/>
      <span class="sprite iconComments"> Comments (1) </span>
      <span class="sprite iconVoteUp"> Votes (1) </span>
    </div>

This is about as far as I can drill down without getting TOO specific. I simply find the <div> tags with the "cmPlaylistContent" class. Each of these tags contains both the song title and the artist name, and sometimes miscellaneous other information as well, like a way to vote for the song or links to purchase it from iTunes or Amazon.

So my question is, given the above HTML, how can I best extract the song title and artist name? It SEEMS like they are always the first two pieces of information in the tag, such that:

    for item in div.stripped_strings:
        print(item)

prints:

    Love Without End, Amen
    George Strait
    Download Song:
    iTunes
    |
    Amazon MP3
    Comments (1)
    Votes (1)

and I could simply take the first two items returned by that generator. It's not quite as clean as I'd like, because I have no idea whether anything could ever be inserted before either of these items, thus messing it all up. I also don't want to rely on the <strong> tag, which makes me shudder, or on the <a> tag, because I don't know if they will always have an href. Ideally, the <a> tags would also have had an attribute that labeled the title as the title and the artist as the artist, but alas...

Therefore, I appeal to your greater wisdom in these matters. Given this HTML, is there a "best practice" for how to refer to the song title and artist?

Thanks!
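
P.S. To make the options concrete, here is a minimal sketch of the two approaches I'm weighing. The first takes the first two stripped strings, as described above. The second matches on the "/lsp/t..." and "/lsp/a..." href prefixes, which is purely an assumption drawn from the single sample above and exactly the kind of dependence I'm wary of. The function names are hypothetical, and the purchase URLs in the sample are shortened for readability.

```python
import re
from itertools import islice

from bs4 import BeautifulSoup

# Abbreviated copy of the sample markup (purchase URLs shortened).
SAMPLE = """
<div class="cmPlaylistContent">
  <strong><a href="/lsp/t2995/"> Love Without End, Amen </a></strong><br/>
  <a href="/lsp/a436/"> George Strait </a><br/>
  <span class="sprite iconDownload"> </span> Download Song:
  <a href="http://itunes.apple.com/"> iTunes </a> |
  <a href="http://www.amazon.com/"> Amazon MP3 </a><br/>
  <span class="sprite iconComments"> Comments (1) </span>
  <span class="sprite iconVoteUp"> Votes (1) </span>
</div>
"""


def first_two_strings(div):
    """Positional approach: assume title and artist come first."""
    items = list(islice(div.stripped_strings, 2))
    if len(items) < 2:
        return None  # not enough text to be a title/artist pair
    return items[0], items[1]


def by_href_pattern(div):
    """Pattern approach: ASSUMES /lsp/t<digits>/ is a track link and
    /lsp/a<digits>/ is an artist link, based on one sample only."""
    title = div.find("a", href=re.compile(r"^/lsp/t\d+/"))
    artist = div.find("a", href=re.compile(r"^/lsp/a\d+/"))
    if title is None or artist is None:
        return None
    return title.get_text(strip=True), artist.get_text(strip=True)


soup = BeautifulSoup(SAMPLE, "html.parser")
for div in soup.find_all("div", class_="cmPlaylistContent"):
    print(first_two_strings(div))
    print(by_href_pattern(div))
```

Running both and comparing the results would at least flag entries where the page layout deviates from what I expect, although neither check is guaranteed to survive a redesign of the site.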