Dear Marmotta community,
I just started evaluating Marmotta. I installed it on Linux (kernel 3.3.0)
with KiWi as the storage backend.
As a first realistic test, I tried to load the entire ChEMBL 20.0 dataset
(ftp.ebi.ac.uk/pub/databases/chembl/ChEMBL-RDF/latest), which is
in Turtle format.
But loading it through the HTTP interface (one file at a time):
curl -sfS -X POST -H "Content-Type: text/turtle; charset=utf-8" -d
@file.ttl http://localhost:8080/marmotta/import/upload
fails for the big files included in the dataset (for example,
chembl_20.0_activity.ttl is 10.8 GB).
The error in marmotta.trace.db is:
03-03 09:50:45 jdbc[11]: exception
org.h2.jdbc.JdbcSQLException: Timeout trying to lock table ; SQL
statement: INSERT INTO nodes (id,ntype,svalue,createdAt) VALUES
(?,'uri',?,?) [50200-178]
The very labor-intensive workaround I found is:
1) convert file.ttl to file.nt (N-Triples) using riot
2) split the resulting file into chunks:
split -a 4 -l 20000 file.nt
3) convert each chunk to RDF/XML using rdf2rdf
(http://www.l3s.de/~minack/rdf2rdf/)
4) POST each RDF/XML file to Marmotta:
curl -sfS -X POST -H "Content-Type: application/rdf+xml; charset=utf-8"
-d @$i http://localhost:8080/marmotta/import/upload
5) wait a huge amount of time...
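For what it's worth, the steps above can be sketched as a single script. This is only a sketch of my workaround, not an official Marmotta tool: it assumes the Turtle file has already been converted to N-Triples (step 1), reuses the same import URL and chunk size as the commands above, and posts each chunk as text/turtle (every N-Triples document is also valid Turtle, which may let you skip the rdf2rdf step; if the Turtle import still misbehaves, convert each chunk to RDF/XML first as in step 3).

```python
# Sketch: chunked upload of a large N-Triples file to Marmotta.
# Assumptions (not from any Marmotta docs): the import endpoint below,
# a 20000-line chunk size matching `split -l 20000`, and that posting
# chunks as text/turtle is accepted.
import itertools
import urllib.request

MARMOTTA_IMPORT = "http://localhost:8080/marmotta/import/upload"
CHUNK_LINES = 20000  # same granularity as `split -a 4 -l 20000 file.nt`


def chunks(lines, size=CHUNK_LINES):
    """Yield successive groups of `size` lines, joined into one string.

    N-Triples is line-based, so splitting on line boundaries always
    produces syntactically complete documents.
    """
    it = iter(lines)
    while True:
        block = list(itertools.islice(it, size))
        if not block:
            return
        yield "".join(block)


def upload(path):
    """POST each chunk of `path` (an .nt file) to the import endpoint."""
    with open(path, encoding="utf-8") as f:
        for i, chunk in enumerate(chunks(f)):
            req = urllib.request.Request(
                MARMOTTA_IMPORT,
                data=chunk.encode("utf-8"),
                # N-Triples is a subset of Turtle, so reuse that media type
                headers={"Content-Type": "text/turtle; charset=utf-8"},
            )
            with urllib.request.urlopen(req) as resp:
                print("chunk %d: HTTP %d" % (i, resp.status))
```

Sending the chunk in the request body directly also avoids writing thousands of temporary split files to disk.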
Any better idea for bulk loading?
BTW, I can't figure out where the KiWiLoader mentioned in the wiki is.
Thanks for your help!
mario
--
Ing. Mario Valle
Swiss National Supercomputing Centre (CSCS) | http://mariovalle.name/
v. Trevano 131, 6900 Lugano, Switzerland | Tel: +41 (91) 610.82.60