Hi... Sorry that this is a bit off track. Ok, maybe way off track!
But I don't have anyone to bounce this off of.. I'm working on a crawling project, crawling a college website, to extract course/class information. I've built a quick test app in python to crawl the site. I crawl at the top level, and work my way down to getting the required course/class schedule. The app works. I can consistently run it and extract the information. My issue is now that I have a "basic" app that works, i need to figure out how I guarantee that I'm correctly crawling the site. How do I know when I've got an error at a given node/branch, so that the app knows that it's not going to fetch the underlying branch/nodes of the tree.. How do I know when I have a complete "tree"! I'm looking for someone, or some group/prof that I can talk to about these issues. My goal is to eventually look at using nutch/lucene if at all applicable. Any pointers, or people, or papers, etc... would be helpful. Thanks --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org