I'm sure I'll soon figure out how to find these things out for myself,
but I'd like to get the community's advice on something.

I'm going to throw together a quick project over the weekend: a
spider. I want to scan a website for certain elements.

I come from a PHP background, so normally I'd:
 - throw together a quick REST script to handle http request/responses
 - load the pages into a simplexml object and
 - run an xpath over the dom to find the nodes I need to test

One of the benefits of PHP's dom implementation is that you can easily
load both XML and HTML4 documents - the HTML gets normalised to XML
during the import.

So, my questions are:

Is there a python module to easily handle http request/responses?

Is there a python dom module that works similar to php's when working
with older html?

What python module would I use to apply an XPath expression over a dom
and return the results?

--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to