Dear Marmotta community,
I just started evaluating Marmotta. I installed it on Linux (kernel 3.3.0)
with KiWi as the storage backend.
As a first realistic test, I tried to load the entire ChEMBL 20.0 dataset
(ftp.ebi.ac.uk/pub/databases/chembl/ChEMBL-RDF/latest), which is
in Turtle format.
But loading it through the HTTP interface (one file at a time):
curl -sfS -X POST -H "Content-Type: text/turtle; charset=utf-8" -d
@file.ttl http://localhost:8080/marmotta/import/upload
fails for the big files included in the dataset (for example,
chembl_20.0_activity.ttl is 10.8 GB).
The error in marmotta.trace.db is:
03-03 09:50:45 jdbc[11]: exception
org.h2.jdbc.JdbcSQLException: Timeout trying to lock table ; SQL
statement: INSERT INTO nodes (id,ntype,svalue,createdAt) VALUES
(?,'uri',?,?) [50200-178]
The very labor-intensive workaround I found is:
1) convert file.ttl to file.nt (N-Triples) using riot
2) split the resulting file into chunks:
split -a 4 -l 20000 file.nt
3) convert each chunk to RDF/XML using rdf2rdf
(http://www.l3s.de/~minack/rdf2rdf/)
4) POST each RDF/XML file to Marmotta:
curl -sfS -X POST -H "Content-Type: application/rdf+xml; charset=utf-8"
-d @$i http://localhost:8080/marmotta/import/upload
5) wait a huge amount of time...
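For what it's worth, the steps above can be sketched as a single script. This is only a sketch of my workaround, not an official Marmotta tool: it assumes the Turtle file has already been converted to N-Triples (step 1), reuses the same import URL and chunk size as the commands above, and posts each chunk as text/turtle (every N-Triples document is also valid Turtle, which may let you skip the rdf2rdf step; if the Turtle import still misbehaves, convert each chunk to RDF/XML first as in step 3).

```python
# Sketch: chunked upload of a large N-Triples file to Marmotta.
# Assumptions (not from any Marmotta docs): the import endpoint below,
# a 20000-line chunk size matching `split -l 20000`, and that posting
# chunks as text/turtle is accepted.
import itertools
import urllib.request

MARMOTTA_IMPORT = "http://localhost:8080/marmotta/import/upload"
CHUNK_LINES = 20000  # same granularity as `split -a 4 -l 20000 file.nt`


def chunks(lines, size=CHUNK_LINES):
    """Yield successive groups of `size` lines, joined into one string.

    N-Triples is line-based, so splitting on line boundaries always
    produces syntactically complete documents.
    """
    it = iter(lines)
    while True:
        block = list(itertools.islice(it, size))
        if not block:
            return
        yield "".join(block)


def upload(path):
    """POST each chunk of `path` (an .nt file) to the import endpoint."""
    with open(path, encoding="utf-8") as f:
        for i, chunk in enumerate(chunks(f)):
            req = urllib.request.Request(
                MARMOTTA_IMPORT,
                data=chunk.encode("utf-8"),
                # N-Triples is a subset of Turtle, so reuse that media type
                headers={"Content-Type": "text/turtle; charset=utf-8"},
            )
            with urllib.request.urlopen(req) as resp:
                print("chunk %d: HTTP %d" % (i, resp.status))
```

Sending the chunk in the request body directly also avoids writing thousands of temporary split files to disk.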
Any better idea for bulk loading?
BTW, I can't figure out where the KiWiLoader mentioned in the wiki is.
Thanks for your help!
mario
--
Ing. Mario Valle
Swiss National Supercomputing Centre (CSCS) | http://mariovalle.name/
v. Trevano 131, 6900 Lugano, Switzerland | Tel: +41 (91) 610.82.60