Hi.

I’m pretty sure this discussion suggests that we (the LD community) should 
try to come to some consensus of policy on exactly what it means if an agent 
finds a robots.txt on a Linked Data site.

So I have changed the subject line - sorry Chris, it should have been changed 
earlier.

Not an easy thing to come to, I suspect, but it seems to have become 
significant.
Is there a more official forum for this sort of thing?

On 26 Jul 2014, at 00:55, Luca Matteis <[email protected]> wrote:

> On Sat, Jul 26, 2014 at 1:34 AM, Hugh Glaser <[email protected]> wrote:
>> That sort of sums up what I want.
> 
> Indeed. So I agree that robots.txt should probably not establish
> whether something is a linked dataset or not. To me your data is still
> linked data even though robots.txt is blocking access of specific
> types of agents, such as crawlers.
> 
> Aidan,
> 
>> *) a Linked Dataset behind a robots.txt blacklist is not a Linked Dataset.
> 
> Isn't that a bit harsh? That would be the case if the only type of
> agent is a crawler. But as Hugh mentioned, linked datasets can be
> useful simply by treating URIs as dereferenceable identifiers without
> following links.
In Aidan’s view (I hope I am right here), it is perfectly sensible.
If you start from the premise that robots.txt is intended to prohibit access by 
anything other than a browser with a human at it, then only humans could fetch 
the RDF documents.
Which means that the RDF document is completely useless as a 
machine-interpretable semantics for the resource, since it would need a human 
to do some cut and paste or something to get it into a processor.

It isn’t really a question of harsh - it is perfectly logical from that view of 
robots.txt (which isn’t our view, because we think that robots.txt is about 
"specific types of agents”, as you say).

Cheers
Hugh

-- 
Hugh Glaser
   20 Portchester Rise
   Eastleigh
   SO50 4QS
Mobile: +44 75 9533 4155, Home: +44 23 8061 5652


