merlyn@stonehenge.com (Randal L. Schwartz) writes:

>>>>>> "Robin" == Robin Norwood <[EMAIL PROTECTED]> writes:
>
> Robin> But Google does use the data in indexes for personal gain...it derives
> Robin> significant revenue from the advertising done on it's site.
>
> s/it's/its/, but I don't want to get distracted here. :)

Argh!  :-)

> The *primary* purpose of Google's visit is to let *me* find the site.  That
> google has a business model that pays for the collection of that information
> doesn't make a difference to the argument.

Well, we disagree on the primary purpose of Google's visit.  I think
Google's visit is primarily aimed at making money for its (ha!)
stockholders, since it is a publicly traded company.  It makes this
money by providing content to you, along with advertising.  If it
weren't for the advertising revenue, I doubt Google would be willing to
provide you the content for free.

> Yes, I'm *betting* that the *primary* purpose of the OP's scraping is not a
> "public good", but rather a private increase in knowledge, power, or possibly
> commerce, but I'd bet that's a safe bet. :)

And my belief is that while the OP may do something unethical with the
data, the mere act of acquiring the data isn't unethical.  He should, of
course, be aware of the ethical and legal ramifications of such use.  But
he should consult a lawyer for that, not us. :-)


I did forget one important point - the robots.txt file.  As I'm sure
you know, the standard way of signaling to a spider whether it is
allowed to index a site is with a 'robots.txt' file in the root of the
HTML directory.

wget http://www.nukeforums.com/robots.txt

Shows such a file, and at least according to my reading of it, most
spiders are allowed to index the site, except for specific directories.
Some spiders are specifically excluded, and some spiders are asked to
include a delay between requests.
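
For anyone who would rather check such rules programmatically than read
them by eye, here is a minimal sketch using Python's standard
urllib.robotparser module.  The sample rules, the bot names (BadBot,
SlowBot, AnyBot) and the example.com URLs are made up for illustration;
they are not the contents of the nukeforums.com file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, NOT the real nukeforums.com file.
SAMPLE_ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: SlowBot
Crawl-delay: 10
Disallow: /admin/

User-agent: *
Disallow: /admin/
"""


def build_parser(robots_txt):
    """Return a RobotFileParser loaded from robots.txt text (no network)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rp = build_parser(SAMPLE_ROBOTS_TXT)

# A spider that is excluded outright.
print(rp.can_fetch("BadBot", "http://example.com/index.html"))

# Any other spider: allowed in general, but not under /admin/.
print(rp.can_fetch("AnyBot", "http://example.com/index.html"))
print(rp.can_fetch("AnyBot", "http://example.com/admin/users"))

# A spider that is asked to wait between requests (value in seconds).
print(rp.crawl_delay("SlowBot"))
```

To check a live site instead, the same class can be pointed at the file
with set_url() followed by read(), which fetches it over the network.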


Other than that, I think we've made our cases, and will have to agree to
disagree.

Thanks for the debate,

-RN

-- 
Robin Norwood
Red Hat, Inc.

"The Sage does nothing, yet nothing remains undone."
-Lao Tzu, Te Tao Ching


