On 6 Mar 2014, at 21:10, Michel Dumontier <[email protected]> wrote:



On Thu, Mar 6, 2014 at 3:59 AM, Gray, Alasdair J G <[email protected]> wrote:
Hi All,

In trying to validate the new ChEMBL 17 dataset description I came across the 
following issues.

  1.  The validator I stood up does not accept standard Turtle. This is going 
to require some effort to fix.

which toolkit are you using to parse it in the first place? maybe you could 
call a web service like http://www.w3.org/RDF/Validator/

I’m using Eric Prud’hommeaux’s Shape Expressions (ShEx) scripts
http://www.w3.org/2013/ShEx/Primer


  2.  Should Summary Level Descriptions have a created date indicating when the 
dataset was originally created?

i believe that the consensus was that it would not, and that create date would 
be associated with version level descriptions (e.g. version 1 = first create 
date).

This of course assumes that folk are going to create descriptions for all 
historic versions. For instance, I would be surprised if ChEMBL created 
descriptions for all 17 (soon to be 18) versions. More likely they will only 
provide them for the versions that have RDF, i.e. version 14 onwards, and the 
earlier ones probably won’t be updated to the new standard.


  3.  We do not have any statement of requirements for dcat:theme and 
dcat:keyword for version and distribution level descriptions.

chembl actually has a void file where they identify their vocabs. i haven't 
checked whether they associate keywords.

Vocabularies are down at the distribution level, which I did not validate. I 
suspect that they have also done themes at the subset level so that they can 
say that the molecules subset has a theme of chemical molecules.

The bigger question here is how should subsets interact with version and 
distribution level descriptions?

  4.  We say that a summary level distribution may have a dcat:accessURL. I’m 
not sure this is correct.

 i think they should too.

Slightly confused: do you think the summary level should have a dcat:accessURL 
or that it shouldn’t?

  5.  Should distribution level descriptions also be typed as 
dcat:Distribution?

yes. It would also be good to think about tagging our version level description 
as :VersionedDatasetDescription - can we find a vocabulary home for this?

Tough question. Not sure that any of VoID or DCAT would adopt this as it 
doesn’t fit with their data models. However, we should ask.

  6.  We’ve not got any details of the SPARQL endpoint in the table of 
properties.

right.  the problem is that the representation is complex (not just a 
predicate-object pair) and really doesn't fit well in that table - unless we 
enable the specification of void:sparqlEndpoint, which we ruled out because of 
its time-dependent nature (that version of the data may not be in the sparql 
endpoint in the future).

Regardless of the fact that the representation of the object is complex, we 
should have a row in the table that ensures that folk don’t miss it. When 
generating a description, most developers will use the table as a checklist. 
The same goes for anyone creating the validator.
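
As a rough illustration of the checklist idea, here is a minimal Python 
sketch. The table in it is a hypothetical stand-in, not the agreed table of 
properties: the row names (in particular "sparql-endpoint") and the 
MUST / MUST NOT / MAY levels are assumptions made only for this example.

```python
# Hypothetical checklist: one row per property, with a requirement for each
# description level. Rows and levels are illustrative assumptions, not the
# group's agreed table.
CHECKLIST = {
    "dct:title": {
        "summary": "MUST", "version": "MUST", "distribution": "MUST",
    },
    "dct:created": {
        "summary": "MUST NOT", "version": "MUST", "distribution": "MUST",
    },
    # A row for the endpoint, even though its value is more than a single
    # predicate-object pair, so that nobody overlooks it.
    "sparql-endpoint": {
        "summary": "MAY", "version": "MAY", "distribution": "MAY",
    },
}


def check(description, level):
    """Return a list of problems for a description at the given level.

    `description` maps a property name to a list of values.
    """
    problems = []
    for prop, requirements in CHECKLIST.items():
        requirement = requirements[level]
        present = bool(description.get(prop))
        if requirement == "MUST" and not present:
            problems.append(f"missing required {prop}")
        elif requirement == "MUST NOT" and present:
            problems.append(f"{prop} not allowed at {level} level")
    return problems
```

Calling check(description, "summary") walks every row, so a property that has 
a row cannot be silently skipped, whether by someone writing a description or 
by the validator.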

I have attached an updated version of the ChEMBL description. My question 
is, how should the distribution level description be validated since it is 
split into several subsets?

i think that <http://rdf.ebi.ac.uk/distribution/rdf/chembl/17.0> is a subset of 
the versioned dataset
<http://rdf.ebi.ac.uk/dataset/chembl/17.0>, and this then points to the various 
file-based distributions.

Are ChEMBL Molecules, ChEMBL Targets, etc. subsets of a distribution or a 
version? We could of course argue that they are subsets of the dataset, but 
then we would have to repeat the versioning information across each subset. I 
think it makes most sense to split at the version level first, but then I’m 
wondering if we should do the subsets ahead of the distributions. However, this 
approach might not make sense for other datasets. Does anyone have a 
counterexample?
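
To make the "version first, then subsets, then distributions" option 
concrete, here is a small Python sketch. Only the versioned dataset URI is 
taken from this thread. The subset URI, the file URI, and the choice of 
pav:version to carry the version number are hypothetical placeholders for 
the example, not anything ChEMBL publishes.

```python
# Triples as (subject, predicate, object). Only VERSION is a URI quoted in
# this thread; the other two URIs are hypothetical placeholders.
VERSION = "http://rdf.ebi.ac.uk/dataset/chembl/17.0"
MOLECULES = "http://example.org/chembl/17.0/molecules"           # hypothetical
MOLECULES_FILE = "http://example.org/chembl/17.0/molecules.ttl"  # hypothetical

TRIPLES = [
    (VERSION, "pav:version", "17.0"),                  # stated once only
    (VERSION, "void:subset", MOLECULES),               # version -> subset
    (MOLECULES, "dcat:distribution", MOLECULES_FILE),  # subset -> distribution
]


def version_of(node, triples):
    """Walk up void:subset / dcat:distribution links to find pav:version."""
    seen = set()
    while node is not None and node not in seen:
        seen.add(node)
        for s, p, o in triples:
            if s == node and p == "pav:version":
                return o
        # Step to the parent that points at this node, if there is one.
        node = next(
            (s for s, p, o in triples
             if o == node and p in ("void:subset", "dcat:distribution")),
            None,
        )
    return None
```

With this layout the version number is stated once on the versioned dataset, 
and version_of(MOLECULES_FILE, TRIPLES) recovers it for a file two levels 
down, so the subsets do not need to repeat the versioning information.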

Alasdair

m.

Cheers,

Alasdair


Alasdair J G Gray
Lecturer in Computer Science, Heriot-Watt University, UK.
Email: [email protected]
Web: http://www.macs.hw.ac.uk/~ajg33
ORCID: http://orcid.org/0000-0002-5711-4872
Telephone: +44 131 451 3429
Twitter: @gray_alasdair
Arrange a Meeting: http://doodle.com/agray

--

PLEASE NOTE: There may be a delay in me dealing with your email as I am 
participating in UCU industrial action by ‘working to contract’ in support of 
the union’s campaign for fair pay in higher education.
For more info go here http://www.ucu.org.uk/hepay13





________________________________

Sunday Times Scottish University of the Year 2011-2013
Top in the UK for student experience
Fourth university in the UK and top in Scotland (National Student Survey 2012)

We invite research leaders and ambitious early career researchers to join us in 
leading and driving research in key inter-disciplinary themes. Please see 
http://www.hw.ac.uk/researchleaders for further information and how to apply.

Heriot-Watt University is a Scottish charity registered under charity number 
SC000278.

