"Bell, Kevin" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> I would like some feedback about my actual intention though, which is to
> scrape local newspaper websites for the names of people that I work
> with. Twice this month, colleagues have unknowingly been in the
> newspaper, and only became aware of it because someone stumbled across
> the line in the article. To write a script that would crawl around
> testing for my own name, or that of my colleagues, wouldn't seem uncouth
> to me, but I'm new at this stuff. It seems impolite for newspapers to
> use someone's name without informing them of it, for sure, but you can't
> count on journalists to call you up. Would this application of a spider
> be impolite?
If the site has an index, I would use that.

If the site's pages are at fixed URLs indexed by public search engines (Google, Yahoo, etc.), I would use one of those. (Google, at least, will search a specific site.)

If the site has a robots.txt file asking robots and spiders to restrict themselves, I would honor the request.

Failing the above, I might write something that, once a day during off hours, downloads and examines articles in the appropriate category.

tjr

-- 
http://mail.python.org/mailman/listinfo/python-list
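For readers following the thread, here is a minimal sketch of the polite-spider approach described above, assuming robots.txt and the article text have already been downloaded (e.g. with urllib.request; RobotFileParser can also fetch robots.txt itself via set_url() and read()). It honors robots.txt with the standard library's urllib.robotparser, matches colleagues' names as whole words, and works out how long to wait until an off-hours run. The user-agent string, the 3 a.m. run time, and all function names are hypothetical choices for illustration, not anything prescribed in the thread.

```python
import re
from datetime import datetime, timedelta
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent string for the crawler; choose your own.
USER_AGENT = "name-watch-bot"


def robots_checker(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text already downloaded from the site."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: RobotFileParser, urls):
    """Keep only the article URLs that robots.txt lets this agent fetch."""
    return [url for url in urls if parser.can_fetch(USER_AGENT, url)]


def names_mentioned(text: str, names):
    """Return the names that appear in text as whole words, ignoring case."""
    found = []
    for name in names:
        # re.escape treats punctuation and spaces in a name literally;
        # \b stops "Kevin Bell" from matching inside "Kevin Bellamy".
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, re.IGNORECASE):
            found.append(name)
    return found


def seconds_until_off_hours(now: datetime, hour: int = 3) -> float:
    """Seconds from now until the next hour:00 (an assumed quiet time)."""
    run_at = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if run_at <= now:
        # Today's slot has passed, so schedule for tomorrow.
        run_at += timedelta(days=1)
    return (run_at - now).total_seconds()


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /archive/\n"
    parser = robots_checker(robots)
    urls = [
        "https://news.example.com/local/story1.html",
        "https://news.example.com/archive/2004/story2.html",
    ]
    print(allowed_urls(parser, urls))
    print(names_mentioned("Kevin Bell spoke at the meeting.", ["Kevin Bell", "Jane Doe"]))
    print(seconds_until_off_hours(datetime(2024, 1, 1, 22, 0)))
```

A daily job could sleep for seconds_until_off_hours(datetime.now()), fetch the permitted articles for the relevant section, and report any matches. Checking robots.txt before every fetch keeps the crawler within whatever limits the site has asked for.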