On Jan 22, 7:44 am, Alnilam <[EMAIL PROTECTED]> wrote:
> ...I move from computer to computer regularly, and while all have a
> recent copy of Python, each has different (or no) extra modules, and I
> don't always have the luxury of downloading extras. That being said,
> if there's a simple way of doing it with BeautifulSoup, please show me
> an example. Maybe I can figure out a way to carry the extra modules I
> need around with me.
Pyparsing's footprint is intentionally small - just one pyparsing.py file
that you can drop into a directory next to your own script. And the code
to extract paragraph 5 of the "Dive Into Python" home page? See annotated
code below.

-- Paul

from pyparsing import makeHTMLTags, SkipTo, anyOpenTag, anyCloseTag
import urllib
import textwrap

page = urllib.urlopen("http://diveintopython.org/")
source = page.read()
page.close()

# define a simple paragraph matcher
pStart, pEnd = makeHTMLTags("P")
paragraph = pStart.suppress() + SkipTo(pEnd) + pEnd.suppress()

# get all paragraphs from the input string (or use the
# scanString generator function to stop at the correct
# paragraph instead of reading them all)
paragraphs = paragraph.searchString(source)

# create a transformer that will strip HTML tags
tagStripper = anyOpenTag.suppress() | anyCloseTag.suppress()

# get paragraph[5] and strip the HTML tags
p5TextOnly = tagStripper.transformString(paragraphs[5][0])

# remove extra whitespace
p5TextOnly = " ".join(p5TextOnly.split())

# print out a nicely wrapped string - so few people know
# that textwrap is part of the standard Python distribution,
# but it is very handy
print textwrap.fill(p5TextOnly, 60)
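
If you'd rather stop reading at the right paragraph instead of collecting
them all, the scanString alternative mentioned in the comments above looks
roughly like this (a minimal sketch, assuming the same Python 2 / pyparsing
setup as the listing; the "wanted" variable is just illustrative):

from pyparsing import makeHTMLTags, SkipTo, anyOpenTag, anyCloseTag
import urllib

page = urllib.urlopen("http://diveintopython.org/")
source = page.read()
page.close()

pStart, pEnd = makeHTMLTags("P")
paragraph = pStart.suppress() + SkipTo(pEnd) + pEnd.suppress()
tagStripper = anyOpenTag.suppress() | anyCloseTag.suppress()

# scanString is a generator, so we can bail out as soon as the
# paragraph we want (index 5, matching paragraphs[5] above) turns up
wanted = 5
for i, (tokens, start, end) in enumerate(paragraph.scanString(source)):
    if i == wanted:
        p5TextOnly = tagStripper.transformString(tokens[0])
        print " ".join(p5TextOnly.split())
        break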