Hi Fabian, On 19/09/14 14:26, Fabian Cretton wrote:
I did 'import' data from a file's URL, and it seems that LDCache tries to update this file regularly. Is that right? If so, is it described somewhere? To me it differs from what I have read about LDCache.
Right, that's the expected behavior of LDCache. The cache transparently updates the resources according to the configured endpoints (the raw Linked Data endpoint is usually the fallback).
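To illustrate the idea of a transparent, expiry-driven refresh, here is a minimal sketch in Python. It is a toy model only, not Marmotta's implementation: the `ToyCache` class, the `fake_fetch` function, the one-hour lifetime and the example URI are all assumptions made up for this illustration.

```python
from datetime import datetime, timedelta


class ToyCache:
    """Toy model: re-fetch a resource once its cached entry has expired."""

    def __init__(self, fetch, ttl_seconds):
        self.fetch = fetch                      # callable: uri -> list of triples
        self.ttl = timedelta(seconds=ttl_seconds)
        self.entries = {}                       # uri -> (triples, expiry time)

    def get(self, uri, now):
        entry = self.entries.get(uri)
        if entry is None or now >= entry[1]:
            # Missing or expired: fetch again and store a new expiry time.
            triples = self.fetch(uri)
            self.entries[uri] = (triples, now + self.ttl)
            return triples
        # Still fresh: serve the cached triples without touching the source.
        return entry[0]


calls = []


def fake_fetch(uri):
    # Hypothetical stand-in for an HTTP retrieval; records each call.
    calls.append(uri)
    return [(uri, "rdfs:label", "example")]


cache = ToyCache(fake_fetch, ttl_seconds=3600)
start = datetime(2014, 9, 18, 9, 0, 0)
cache.get("http://example.org/r", start)                          # first access: fetch
cache.get("http://example.org/r", start + timedelta(minutes=30))  # fresh: no fetch
cache.get("http://example.org/r", start + timedelta(hours=2))     # expired: fetch again
```

In this sketch the caller never asks for an update explicitly; the refresh happens as a side effect of reading an expired entry, which is the kind of behavior the log lines quoted below suggest.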
For instance, I did import: http://sws.geonames.org/2658434/about.rdf
I did import the file in its own context, which I also called "http://sws.geonames.org/2658434/about.rdf"
And the next days, I find in the logs:
09:09:27.587 INFO o.a.m.l.s.p.AbstractHttpProvider - retrieving resource data for http://sws.geonames.org/2658434/about.rdf from 'Linked Data' endpoint, request URI is <http://sws.geonames.org/2658434/about.rdf>
09:09:27.939 INFO o.a.m.l.s.p.AbstractHttpProvider - retrieved 7 triples for resource http://sws.geonames.org/2658434/about.rdf; expiry date: Fri Sep 19 09:09:27 CEST 2014
(Did I do something other than just an 'import' that triggered this functionality? I don't think so, but I might have.) I am very surprised by this functionality (and pleased :-)). But I am not sure it works correctly, so here are a few questions:
- When importing from a file into a context, does Marmotta automatically keep information about that import and its provenance (URL), and then regularly try to update the content even if not asked to do that?
Not yet, but it is planned: https://issues.apache.org/jira/browse/MARMOTTA-146
- If so, is it standard functionality for 'import'-'URL'? Or even for 'import'-'file' (i.e. when a local file is updated on disk, is it uploaded to the store)?
The cache just works at resource level; it does not matter how the data initially came in. So it wouldn't try to fetch the file itself again, but rather the resources described by the file, regardless of whether the file came from a URL or was uploaded locally.
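To make "resource level" more concrete, here is a small Python sketch. It only illustrates the idea that an imported file breaks down into per-resource groups of triples, each of which could be refreshed on its own; the `group_by_subject` helper and the example triples are invented for this illustration and say nothing about how Marmotta stores or refreshes data internally.

```python
from collections import defaultdict


def group_by_subject(triples):
    """Group (subject, predicate, object) triples by their subject URI."""
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((s, p, o))
    return dict(by_subject)


# Hypothetical content of one imported file: it describes the document
# itself and one other resource.
imported = [
    ("http://example.org/doc.rdf", "dcterms:created", "2014-09-18"),
    ("http://example.org/place/1", "rdfs:label", "Somewhere"),
    ("http://example.org/place/1", "ex:population", "1000"),
]

resources = group_by_subject(imported)

# A resource-level cache would deal with each key separately,
# independently of the file the triples originally arrived in.
for uri, triples in resources.items():
    print(uri, len(triples))
```

In such a model, the number of triples retrieved for one resource can be smaller than the number of triples in the file that first mentioned it.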
- If the file is not loaded in its own context, but mixed with other triples in an existing context, how can Marmotta handle this update (knowing which existing triples to remove if they were removed from the source, etc.)?
- Is it possible to activate/deactivate this functionality?
Yes, by defining a blacklist endpoint for LDCache. Further details at http://marmotta.apache.org/platform/ldcache-module.html
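As a rough picture of what a blacklist endpoint does, here is a Python sketch of endpoint selection by URI pattern with a Linked Data fallback. Everything in it is hypothetical (the `ENDPOINTS` table, the `"blacklist"` kind, the `select_endpoint` and `should_refresh` helpers, and the example hosts); the real configuration options are described on the LDCache page linked above.

```python
import re

# Hypothetical endpoint table: (name, URI pattern, kind). First match wins.
ENDPOINTS = [
    ("Blocked host", re.compile(r"^http://blocked\.example\.org/"), "blacklist"),
    ("Special host", re.compile(r"^http://special\.example\.org/"), "custom"),
]

# Used when no configured endpoint matches the resource URI.
FALLBACK = ("Linked Data", "linked-data")


def select_endpoint(uri):
    """Return (name, kind) of the first endpoint whose pattern matches the URI."""
    for name, pattern, kind in ENDPOINTS:
        if pattern.match(uri):
            return name, kind
    return FALLBACK


def should_refresh(uri):
    """A blacklisted resource is never retrieved or refreshed."""
    return select_endpoint(uri)[1] != "blacklist"


print(should_refresh("http://blocked.example.org/resource/1"))
print(should_refresh("http://sws.geonames.org/2658434/about.rdf"))
```

Under these assumptions, adding a blacklist entry for a host or URI prefix is enough to switch the automatic updates off for the matching resources while leaving the rest untouched.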
- Why 'retrieved 7 triples' when the context that contains that file has 141 triples? Is this a bug? Or does the algorithm try to retrieve only the 'modified' triples within the file?
That's strange, yes. Internally LDCache would be using something like: https://gist.github.com/wikier/728e234bb998158bf9ec
I've just included it as a test: https://github.com/apache/marmotta/blob/b24553cdc877e5f39361c4dd7f0994b46b3ad707/libraries/ldclient/ldclient-provider-rdf/src/test/java/org/apache/marmotta/ldclient/test/rdf/TestLinkedDataProvider.java#L72
And it actually retrieves 7 triples. I'd need to debug why.
- Is it already possible to achieve the same functionality with RDFa (for instance, or any data that can be retrieved by an LDClient)? That is, pointing to a web page that contains RDFa, retrieving its RDF content, and updating it on a regular basis when the original page's content changes?
Yes, you just need to register an LDCache endpoint using the RDFa data provider.
Hope that helps.

Cheers,

--
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 660 2747 925
e: sergio.fernan...@redlink.co
w: http://redlink.co