Dear Marmotta community,

I just started evaluating Marmotta. I installed it on Linux 3.3.0, selecting KiWi as the storage backend.

As a first realistic test I tried to load the entire ChEMBL 20.0 dataset (ftp.ebi.ac.uk/pub/databases/chembl/ChEMBL-RDF/latest). The dataset is in Turtle format.

But loading through the HTTP interface (one file at a time):

curl -sfS -X POST -H "Content-Type: text/turtle; charset=utf-8" -d @file.ttl http://localhost:8080/marmotta/import/upload

fails for the big files in the dataset (for example, chembl_20.0_activity.ttl is 10.8 GB).

The error in marmotta.trace.db is:

03-03 09:50:45 jdbc[11]: exception
org.h2.jdbc.JdbcSQLException: Timeout trying to lock table ; SQL statement: INSERT INTO nodes (id,ntype,svalue,createdAt) VALUES (?,'uri',?,?) [50200-178]
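(For what it's worth, this looks like H2's table lock timeout expiring. H2 lets you raise it through the JDBC URL; whether that actually helps, and where Marmotta builds its connection string, I don't know — the path below is a placeholder:

jdbc:h2:/path/to/marmotta-home/db;LOCK_TIMEOUT=60000

This would only paper over the contention rather than fix the bulk-load speed.)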

The very labor-intensive workaround I found is:

1) convert file.ttl to file.nt (N-Triples)
        using riot
2) split the resulting file into chunks:
        split -a 4 -l 20000 file.nt
3) convert each chunk to RDF/XML
        using rdf2rdf http://www.l3s.de/~minack/rdf2rdf/
4) POST each RDF/XML file to Marmotta:
curl -sfS -X POST -H "Content-Type: application/rdf+xml; charset=utf-8" -d @$i http://localhost:8080/marmotta/import/upload
5) wait a huge amount of time...
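Steps 2-5 could in principle be collapsed into one streaming script. Here is a minimal Python sketch of that idea: the 20000-line chunk size and the Marmotta URL come from the steps above, but POSTing the N-Triples chunks directly as text/plain (skipping the RDF/XML conversion) is an assumption I have not verified against the importer:

```python
"""Streaming chunked upload sketch (assumes file.ttl was already
converted to N-Triples, so splitting on line boundaries is safe)."""
import itertools
import urllib.request

# URL from the curl commands in this thread
MARMOTTA_IMPORT = "http://localhost:8080/marmotta/import/upload"

def chunks(lines, size=20000):
    """Group an iterable of lines into batches of at most `size` lines
    (the same granularity as `split -l 20000`)."""
    it = iter(lines)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch

def upload_ntriples(path, size=20000):
    """POST each chunk of the N-Triples file at `path` to Marmotta.

    ASSUMPTION: the importer accepts N-Triples as text/plain."""
    with open(path, "rb") as f:
        for batch in chunks(f, size):
            req = urllib.request.Request(
                MARMOTTA_IMPORT,
                data=b"".join(batch),
                headers={"Content-Type": "text/plain; charset=utf-8"},
                method="POST",
            )
            # urlopen raises urllib.error.HTTPError on non-2xx responses
            with urllib.request.urlopen(req) as resp:
                resp.read()

if __name__ == "__main__":
    import sys
    upload_ntriples(sys.argv[1])
```

This at least avoids materializing thousands of intermediate chunk files on disk, though it does nothing for the server-side slowness.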

Is there a better way to bulk load?
BTW, I can't find the KiWiLoader mentioned in the wiki.

Thanks for your help!
                                mario


--
Ing. Mario Valle
Swiss National Supercomputing Centre (CSCS) | http://mariovalle.name/
v. Trevano 131, 6900 Lugano, Switzerland    | Tel: +41 (91) 610.82.60
