Re: How can I exclude a word by using re?

Paul McGuire Mon, 15 Aug 2005 21:21:47 -0700

Given the example re that you've been trying to get working, here is a
pyparsing approach that might be more, um, approachable.
Unfortunately, since I don't have the URL of the page you are working
with, I'm unable to test this before posting.


Good luck,
-- Paul

# getMP3s.py
# get pyparsing at http://pyparsing.sourceforge.net
#

from pyparsing import *
import urllib

#~ r=re.compile(ur'valign=top>(?P<number>\d{1,2})</td><td[^>]*>\s{0,2}'
#~ ur'<a href="(?P<url>[^<>]+\.mp3)"( )target=_blank>'
#~ ur'(?P<name>.+)</td>',re.UNICODE|re.IGNORECASE)

tdStart,tdEnd = makeHTMLTags("td")
aStart,aEnd = makeHTMLTags("a")

number = Word(nums)
valign = CaselessLiteral("valign=top>")

mp3Entry = valign + number.setResultsName("number") + tdEnd + \
            tdStart + SkipTo(aStart) + aStart + \
            SkipTo(tdEnd) + tdEnd

# get list of mp3's
targetURL = "http://whatever"
targetPage = urllib.urlopen( targetURL )
targetHTML = targetPage.read()
targetPage.close()

for toks,s,e in mp3Entry.scanString(targetHTML):
    # makeHTMLTags names the opening-tag results "start" + the title-cased tag name
    print toks.number, toks.startA.href
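
Since the script above was posted untested, it may help to have something
that can be run without the real page. The following is only a sketch, in
Python 3 and using just the standard re module, that exercises the regular
expression quoted in the script's comments against two made-up table rows.
The sample HTML, the example.com URLs, and the find_mp3s helper are all
assumptions for illustration. The name group is also changed from the quoted
pattern so that it stops at the closing </a> rather than running on to
the last </td> on the line.

```python
import re

# Hypothetical sample rows; the real page's markup was not available.
SAMPLE_HTML = (
    '<tr><td valign=top>1</td><td width=200> '
    '<a href="http://example.com/songs/first.mp3" target=_blank>First Song</a></td></tr>\n'
    '<tr><td valign=top>2</td><td> '
    '<a href="http://example.com/songs/second.mp3" target=_blank>Second Song</a></td></tr>\n'
)

# Adapted from the commented-out pattern in the script above.
# Assumption: the link text contains no nested tags, so [^<]+ is enough.
ROW = re.compile(
    r'valign=top>(?P<number>\d{1,2})</td><td[^>]*>\s{0,2}'
    r'<a href="(?P<url>[^<>]+\.mp3)" target=_blank>'
    r'(?P<name>[^<]+)</a>\s*</td>',
    re.IGNORECASE,
)


def find_mp3s(html):
    """Return (number, url, name) tuples for each matching table row."""
    return [
        (int(m.group("number")), m.group("url"), m.group("name"))
        for m in ROW.finditer(html)
    ]


entries = find_mp3s(SAMPLE_HTML)
for number, url, name in entries:
    print(number, url, name)
```

The same sample string could be passed to mp3Entry.scanString in place of
the downloaded page to check the pyparsing grammar before pointing it at a
live URL.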

-- 
http://mail.python.org/mailman/listinfo/python-list
