Hi.

I’m pretty sure this discussion suggests that we (the LD community) should 
try to come to some consensus of policy on exactly what it means if an agent 
finds a robots.txt on a Linked Data site.

So I have changed the subject line - sorry Chris, it should have been changed 
earlier.

Not an easy thing to come to, I suspect, but it seems to have become 
significant.
Is there a more official forum for this sort of thing?

On 26 Jul 2014, at 00:55, Luca Matteis <[email protected]> wrote:

> On Sat, Jul 26, 2014 at 1:34 AM, Hugh Glaser <[email protected]> wrote:
>> That sort of sums up what I want.
> 
> Indeed. So I agree that robots.txt should probably not establish
> whether something is a linked dataset or not. To me your data is still
> linked data even though robots.txt is blocking access of specific
> types of agents, such as crawlers.
> 
> Aidan,
> 
>> *) a Linked Dataset behind a robots.txt blacklist is not a Linked Dataset.
> 
> Isn't that a bit harsh? That would be the case if the only type of
> agent is a crawler. But as Hugh mentioned, linked datasets can be
> useful simply by treating URIs as dereferenceable identifiers without
> following links.
In Aidan’s view (I hope I am right here), it is perfectly sensible.
If you start from the premise that robots.txt is intended to prohibit access by 
anything other than a browser with a human at it, then only humans could fetch 
the RDF documents.
Which means that the RDF document is completely useless as a 
machine-interpretable semantics for the resource, since it would need a human 
to do some cut and paste or something to get it into a processor.

It isn’t really a question of harsh - it is perfectly logical from that view of 
robots.txt (which isn’t our view, because we think that robots.txt is about 
"specific types of agents”, as you say).

Cheers
Hugh

-- 
Hugh Glaser
   20 Portchester Rise
   Eastleigh
   SO50 4QS
Mobile: +44 75 9533 4155, Home: +44 23 8061 5652


