Hi.. the URL I'm focusing on is irrelevant to the issue I'm trying to solve at this time.
I think one approach is to fire up a number of parsing attempts and track the returned depts/classes/etc. In theory (hopefully) I should be able to create a process that builds a kind of statistical representation of what the site looks like (names of depts, names/number of classes for given depts, etc.). If I'm correct, this would provide a complete "list/understanding" of what the course list looks like.

I could then run the parsing process a number of times, examining the actual values/results for the query, and taking the highest/oldest values for the given query. The idea is that the app will return correct results for most of the queries, most of the time, so on a statistical basis I can take the results that are returned with the highest frequency.

So this approach might work. But again, I haven't seen anything in the literature/'net that talks about this.

Thoughts?

Thanks

-----Original Message-----
From: python-list-bounces+bedouglas=earthlink....@python.org
[mailto:python-list-bounces+bedouglas=earthlink....@python.org]On Behalf Of John Nagle
Sent: Thursday, March 05, 2009 8:38 AM
To: python-list@python.org
Subject: Re: Parsing/Crawler Questions..

bruce wrote:
> hi john..
>
> You're missing the issue, so a little clarification...
>
> I've got a number of test parsers that point to a given classlist site.. the
> scripts work.
>
> the issue that one faces is that you never "know" if you've gotten all of
> the items/links that you're looking for based on the XPath functions. This
> could be due to an error in the parsing, or it could be due to an admin
> changing the site (removing/adding courses etc...)

What URLs are you looking at?

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list
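
A rough sketch of the frequency idea from the top of this message, in Python. Everything here is an assumption for illustration: the `consensus` function, the `min_fraction` cutoff, and the dict-of-lists shape for one parsing run are made up, and the sample data is invented rather than taken from any real site.

```python
from collections import Counter


def consensus(runs, min_fraction=0.5):
    """Keep each (dept, course) pair seen in more than min_fraction of the runs.

    `runs` is a list of results, one per parsing attempt. Each result is
    assumed to map a department name to the list of courses found in it.
    """
    counts = Counter()
    for result in runs:
        for dept, courses in result.items():
            # set() so a course duplicated within one run is counted once
            for course in set(courses):
                counts[(dept, course)] += 1

    needed = min_fraction * len(runs)
    merged = {}
    for (dept, course), seen in counts.items():
        if seen > needed:
            merged.setdefault(dept, set()).add(course)
    return merged, counts


# Invented results from three hypothetical parsing runs of the same site.
runs = [
    {"MATH": ["101", "102"], "CS": ["201"]},
    {"MATH": ["101"], "CS": ["201", "999"]},   # missed MATH 102, stray CS 999
    {"MATH": ["101", "102"], "CS": ["201"]},
]

merged, counts = consensus(runs)
for dept in sorted(merged):
    print(dept, sorted(merged[dept]))
```

The raw `counts` are returned as well, so items that show up in only a few runs can be looked at by hand. This only smooths over flaky parsing runs; if an admin really adds or removes a course, the counts will lag until enough new runs have been collected.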