Re: search an entire website given the homepage URL

Terry Reedy Tue, 25 Apr 2006 16:56:10 -0700

"Bell, Kevin" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> I would like some feedback about my actual intention though, which is to
> scrape local newspaper websites for the names of people that I work
> with.  Twice this month, colleagues have unknowingly been in the
> newspaper, and only became aware of it because someone stumbled across
> the line in the article.  To write a script that would crawl around
> testing for my own name, or that of my colleagues, wouldn't seem uncouth
> to me, but I'm new at this stuff.  It seems impolite for newspapers to
> use someone's name without informing them of it, for sure, but you can't
> count on journalists to call you up.  Would this application of a spider
> be impolite?


If the site has an index, I would use that.
If the site has pages at fixed URLs accessible to public indexes (Google, 
Yahoo, etc.), I would use one of those.  (Google, at least, will search a 
specific site.)
If the site has a robots.txt file requesting robots and spiders to restrict 
themselves, I would honor the request.
Failing the above, I might write something that, once a day during off 
hours, downloads and examines articles in the appropriate category.
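
A minimal sketch of the last two points, assuming Python 3's standard 
urllib.robotparser module (the robotparser module in Python 2).  The 
function names, the user-agent string "name-watch-bot", and the sample 
names are all made up for illustration; downloading the pages and 
scheduling the daily run are left out.

```python
import re
from urllib import robotparser


def allowed_by_robots(robots_txt, url, user_agent="name-watch-bot"):
    """Return True if the given robots.txt text permits fetching url.

    robots_txt is the already-downloaded body of the site's robots.txt;
    user_agent is a hypothetical name for this script.
    """
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def find_names(text, names):
    """Return the names that occur in text as whole words, ignoring case."""
    found = []
    for name in names:
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(name)
    return found


if __name__ == "__main__":
    # Hypothetical robots.txt and article text, for demonstration only.
    rules = "User-agent: *\nDisallow: /private/\n"
    print(allowed_by_robots(rules, "http://example.com/news/a.html"))
    print(find_names("Council thanks Jane Doe for her work.",
                     ["Jane Doe", "John Roe"]))
```

Checking robots.txt before each fetch, and fetching only the one 
category of interest once a day, keeps the load on the newspaper's 
server close to that of a single human reader.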

tjr



-- 
http://mail.python.org/mailman/listinfo/python-list
