merlyn@stonehenge.com (Randal L. Schwartz) writes:

>>>>>> "Robin" == Robin Norwood <[EMAIL PROTECTED]> writes:
>
> Robin> But Google does use the data in indexes for personal gain...it derives
> Robin> significant revenue from the advertising done on it's site.
>
> s/it's/its/, but I don't want to get distracted here. :)

Argh! :-)

> The *primary* purpose of Google's visit is to let *me* find the site. That
> google has a business model that pays for the collection of that information
> doesn't make a difference to the argument.

Well, we disagree on the primary purpose of Google's visit. I think Google's visit is primarily aimed at making money for its (ha!) stockholders, since it is a publicly traded company. It makes this money by providing content to you, along with advertising. If it weren't for the advertising revenue, I doubt Google would be willing to provide you the content for free.

> Yes, I'm *betting* that the *primary* purpose of the OP's scraping is not a
> "public good", but rather a private increase in knowledge, power, or possibly
> commerce, but I'd bet that's a safe bet. :)

And my belief is that while the OP may do something unethical with the data, the mere act of acquiring the data isn't unethical. He should, of course, be aware of the ethical and legal ramifications of such use. But he should consult a lawyer for that, not us. :-)

I did forget one important point: the robots.txt file. As I'm sure you know, the standard way of signaling to a spider whether it is allowed to index a site is with a 'robots.txt' file in the root of the HTML directory.

  wget http://www.nukeforums.com/robots.txt

Shows such a file, and at least according to my reading of it, most spiders are allowed to index the site, except for specific directories. Some spiders are specifically excluded, and some spiders are asked to include a delay between requests.

Other than that, I think we've made our cases, and will have to agree to disagree.

Thanks for the debate,

-RN

-- 
Robin Norwood
Red Hat, Inc.

"The Sage does nothing, yet nothing remains undone."
-Lao Tzu, Te Tao Ching
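
For anyone who wants to check robots.txt rules from a script rather than by eye, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt text, the bot names (BadBot, SlowBot, FriendlyBot) and the example.com URLs are all made up for illustration; they are not the contents of the nukeforums.com file, only a guess at the general shape described above (most spiders allowed except for some directories, one spider excluded, one asked to delay).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this example.
# This is NOT the file served by www.nukeforums.com.
SAMPLE_ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: SlowBot
Crawl-delay: 10
Disallow: /admin/

User-agent: *
Disallow: /admin/
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(SAMPLE_ROBOTS_TXT)
BASE = "http://www.example.com"  # placeholder host, an assumption

# A spider with no group of its own falls under the catch-all "*" group.
print(rules.can_fetch("FriendlyBot", BASE + "/index.html"))
print(rules.can_fetch("FriendlyBot", BASE + "/admin/users"))

# A spider that is named and disallowed from "/" may fetch nothing.
print(rules.can_fetch("BadBot", BASE + "/index.html"))

# Requested delay in seconds between requests, or None if none is given.
print(rules.crawl_delay("SlowBot"))
print(rules.crawl_delay("FriendlyBot"))
```

To check a live site instead, RobotFileParser also has set_url() and read(), which fetch the file over the network. Either way, the file only states the site owner's wishes; honoring them is up to the spider.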