Re: getfirst and re

Tim Chase Wed, 06 Jan 2010 10:02:35 -0800

Victor Subervi wrote:

On Wed, Jan 6, 2010 at 1:27 PM, Tim Chase <python.l...@tim.thechases.com> wrote:

But if you're using it on HTML form text, regexps are usually the wrong
tool, and you should be using an HTML parser (such as BeautifulSoup) that
knows how to handle odd text and escapings better and more robustly than
regexps will.


I have an automatically generated HTML form from which I need to extract
data to the script which this form calls (to which the information is sent).
I believe BeautifulSoup is geared to scraping pages that exist permanently
on the web. By the time BeautifulSoup was called, this page would be gone.

BeautifulSoup takes string data fed to it, and builds a structure
that can be neatly navigated. That string data can come from a
web page, from a disk, or even a serial port, a
random-character-generator, or just from HTML that's built up in
memory and never sees a network or a disk. It's worth reading
its documentation[1] and trying its examples to get familiar with it.
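
To make the "it's just a string" point concrete, here is a rough sketch. It uses the standard library's html.parser as a stand-in rather than BeautifulSoup itself, so it runs without installing anything; BeautifulSoup would give you a much nicer tree to navigate. The InputCollector class, the form markup, and the field names are all made up for illustration and are not taken from your form.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Hypothetical helper: gather name/value pairs from <input> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples, with entity
        # references in the values already unescaped by the parser.
        if tag == "input":
            attr = dict(attrs)
            if "name" in attr:
                self.fields[attr["name"]] = attr.get("value") or ""


# HTML built up in memory; it never touches a network or a disk.
form_html = (
    '<form action="/cgi-bin/process.py" method="post">'
    '<input type="text" name="title" value="Tom &amp; Jerry">'
    '<input type="hidden" name="id" value="42">'
    "</form>"
)

collector = InputCollector()
collector.feed(form_html)
collector.close()
print(collector.fields)
```

The parser neither knows nor cares where form_html came from, and the escaped "&amp;" in the value comes back as a plain "&" without any regexp gymnastics.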


-tkc


[1]
http://www.crummy.com/software/BeautifulSoup/documentation.html



