Fredrik wrote:

to grab entire sites? try doing that on a commercial data provider's site, and chances are that you'll end up being banned (or sued) within hours ...

-------------
Me: Nope, I never said that to start with...

Well, I certainly am learning a lot. I never said I intended to download anyone's entire website, as was assumed, but it's been fun to see how folks feel about it anyway! I would like some feedback about my actual intention though, which is to scrape local newspaper websites for the names of people that I work with. Twice this month, colleagues have unknowingly been in the newspaper, and only became aware of it because someone stumbled across the line in the article.

To write a script that would crawl around testing for my own name, or that of my colleagues, wouldn't seem uncouth to me, but I'm new at this stuff. It seems impolite for newspapers to use someone's name without informing them of it, for sure, but you can't count on journalists to call you up. Would this application of a spider be impolite?

Bell, Kevin wrote:
>> use a search engine (try the search box in the upper right corner).
>
>> using a spider to download the entire site just so you can "search
>> through it" is bloody impolite.
>
> Really? I'd argue that's impolite only if you're an impolite person
> with a rude agenda, which is not what I had in mind, but thanks for
> the ethics lecture as well as the pointer ; ) I assure you that I
> harbor no nefarious scheme. Isn't it common for folks to watch the
> stock market, or real estate listings, for example?
>
> I'll look into the tools you mentioned, and thanks again!

I think Fredrik's right: the intarweb is supposed to be distributed, not live on your desktop. Folk who watch the stock market don't download twenty years' worth of data in one afternoon; they generally subscribe to real-time feeds that are relatively low volume.

regards
 Steve
--
http://mail.python.org/mailman/listinfo/python-list
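
For what it's worth, the low-volume version of the name check discussed above might look something like the sketch below. It is only an illustration under some assumptions, not anyone's actual script: the user-agent string, the function names, and the ten-second delay are all made up for the example, and the page URLs would have to be a short, hand-picked list of pages (say, a paper's local-news index) rather than a crawl of the whole site. It asks robots.txt for permission before each fetch and pauses between requests.

```python
import re
import time
import urllib.request
import urllib.robotparser
from urllib.parse import urlsplit

# Hypothetical user-agent string; pick one that identifies you honestly.
USER_AGENT = "name-watch-sketch/0.1"


def find_names(text, names):
    """Return the names that occur in text as whole words, ignoring case."""
    found = []
    for name in names:
        # Lookarounds keep "Bell" from matching inside "Bellamy".
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, text, re.IGNORECASE):
            found.append(name)
    return found


def allowed_by_robots(url):
    """Ask the site's robots.txt whether our user agent may fetch url."""
    parts = urlsplit(url)
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    parser.read()
    return parser.can_fetch(USER_AGENT, url)


def check_pages(urls, names, delay=10.0):
    """Fetch a short list of pages slowly; map each URL to the names found."""
    hits = {}
    for url in urls:
        if not allowed_by_robots(url):
            continue  # the site asked not to be fetched here, so skip it
        request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(request, timeout=30) as response:
            text = response.read().decode("utf-8", errors="replace")
        found = find_names(text, names)
        if found:
            hits[url] = found
        time.sleep(delay)  # assumed pause between requests; tune to taste
    return hits
```

Calling check_pages() once a day on a handful of URLs is a very different load from mirroring a site. The matching is done on the raw page source, so a name split across HTML tags would be missed.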