Re: Parsing/Crawler Questions..

MRAB Wed, 04 Mar 2009 14:19:58 -0800

bruce wrote:

Hi...


Sorry that this is a bit off track. Ok, maybe way off track!

But I don't have anyone to bounce this off.

I'm working on a project that crawls a college website to extract
course/class information. I've built a quick test app in Python to crawl the
site. I start at the top level and work my way down to the required
course/class schedule. The app works: I can run it consistently and extract
the information. The required information is pulled out using XPath queries
against the DOM of each page I parse.
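A minimal sketch of the XPath extraction step, assuming a well-formed page and made-up markup (the `course` rows and the `code`/`title` cells are hypothetical, not the college's actual layout). It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath; for real-world HTML, a tolerant parser such as `lxml.html` would be the usual choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed (XHTML-like) schedule page. Real college pages
# are usually messy HTML that ElementTree cannot parse directly.
PAGE = """<html><body>
  <table id="schedule">
    <tr class="course"><td class="code">CS101</td><td class="title">Intro to Programming</td></tr>
    <tr class="course"><td class="code">CS201</td><td class="title">Data Structures</td></tr>
  </table>
</body></html>"""

def extract_courses(page_source):
    """Return (code, title) pairs using ElementTree's limited XPath subset."""
    root = ET.fromstring(page_source)
    courses = []
    # ".//tr[@class='course']" matches every course row at any depth.
    for row in root.findall(".//tr[@class='course']"):
        code = row.findtext("td[@class='code']", default="").strip()
        title = row.findtext("td[@class='title']", default="").strip()
        courses.append((code, title))
    return courses

print(extract_courses(PAGE))
```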

My issue now is that, with a "basic" app that works, I need to figure out
how to guarantee that I'm crawling the site correctly. How do I know when
I've hit an error at a given node/branch, so that the app knows not to
fetch the nodes/branches beneath it?
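One common way to get this behaviour, sketched here under assumptions: treat each page as a node, check it right after fetching, and on failure log the URL with a reason and skip enqueuing its children, so the whole subtree is pruned and the log shows where the crawl stopped. `fetch`, `FAKE_SITE`, and the "status must be 200" rule are hypothetical stand-ins for real HTTP requests and real checks.

```python
from collections import deque

# Hypothetical in-memory "site": URL -> (status, child URLs), so the
# crawl logic can be exercised without touching the network.
FAKE_SITE = {
    "/": (200, ["/dept/cs", "/dept/math"]),
    "/dept/cs": (200, ["/dept/cs/cs101", "/dept/cs/cs201"]),
    "/dept/math": (500, ["/dept/math/ma101"]),  # a broken branch
    "/dept/cs/cs101": (200, []),
    "/dept/cs/cs201": (200, []),
    "/dept/math/ma101": (200, []),
}

def fetch(url):
    """Hypothetical fetcher: returns (status, children), 404 if unknown."""
    return FAKE_SITE.get(url, (404, []))

def crawl(root, fetch):
    """Breadth-first crawl that prunes any branch whose node fails.

    Returns (visited, errors), where errors maps URL -> reason. Children of
    a failed node are never enqueued, so its whole subtree is skipped.
    """
    visited, errors = [], {}
    queue = deque([root])
    seen = {root}
    while queue:
        url = queue.popleft()
        try:
            status, children = fetch(url)
        except Exception as exc:  # e.g. timeouts or connection errors
            errors[url] = f"exception: {exc!r}"
            continue
        if status != 200:
            errors[url] = f"HTTP {status}"
            continue  # prune: do not descend into this branch
        visited.append(url)
        for child in children:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return visited, errors

visited, errors = crawl("/", fetch)
print(visited)  # the math subtree is absent
print(errors)   # {'/dept/math': 'HTTP 500'}
```

Keeping the error log separate from the results makes it easy to re-run only the failed branches later.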

[snip]
If you were crawling the site yourself, how would _you_ know when you
had an error at a given node/branch?
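Taking that question literally, the checks a person would make by eye can be written down as explicit rules and run on every page. A minimal sketch, assuming the page is well-formed and that a valid schedule page has at least one `course` row whose code looks like `CS101`; the XPath, the regex, and `check_schedule_page` are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical expectations for a schedule page: what a person would
# check by eye. Both the XPath and the code pattern are assumptions.
ROW_XPATH = ".//tr[@class='course']"
COURSE_CODE = re.compile(r"^[A-Z]{2,4}\d{3}$")

def check_schedule_page(page_source):
    """Return a list of problems; an empty list means the page looks sane."""
    try:
        root = ET.fromstring(page_source)
    except ET.ParseError as exc:
        return [f"unparseable page: {exc}"]
    rows = root.findall(ROW_XPATH)
    if not rows:
        # Layout change, login wall, or error page: the rows simply vanish.
        return ["no course rows found"]
    problems = []
    for i, row in enumerate(rows):
        code = (row.findtext("td[@class='code']") or "").strip()
        if not COURSE_CODE.match(code):
            problems.append(f"row {i}: unexpected course code {code!r}")
    return problems

good = "<html><body><table><tr class='course'><td class='code'>CS101</td></tr></table></body></html>"
print(check_schedule_page(good))  # []
print(check_schedule_page("<html><body><p>Service unavailable</p></body></html>"))
```

A non-empty result could be treated as the "error at a given node" from the original question: record it and do not descend further.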

--
http://mail.python.org/mailman/listinfo/python-list
