Re: Marmotta as a linked data platform

Sebastian Schaffert Mon, 07 Jul 2014 03:52:28 -0700

Hi,

KiWi versioning still works for interactive access if you enable it
afterwards, but currently not for the bulk loading you describe (it could be
implemented if I find the time, but it would slow loading down to about
half the speed). From experience, PostgreSQL tuning
has a dramatic impact on performance, but I cannot really say if it will
match Jena TDB (which is using its own custom data storage format optimized
for triples). Maximum bulk load performance was never the main goal when
developing the KiWi triple store, so you might be better off with TDB in
your scenario. Still, your performance stats sound pretty low. Even on a
slow machine I get at least 10,000 triples/sec (and typically much more),
which would give you about half the time you describe. Did you add the
option to drop indexes before import (-I)?
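
As a rough check on the figures in this thread, the following sketch converts the reported load times into average rates. It assumes exactly 30 million triples (the thread says "close to 30 million") and takes the reported times at face value; the function names are only illustrative.

```python
# Back-of-the-envelope throughput figures for the numbers quoted in this thread.
# Assumption: the dataset is exactly 30 million triples ("close to 30 million").

TRIPLES = 30_000_000


def triples_per_second(triples: int, minutes: float) -> float:
    """Average load rate implied by a total load time in minutes."""
    return triples / (minutes * 60)


def minutes_at_rate(triples: int, rate: float) -> float:
    """Load time in minutes implied by a sustained rate in triples/sec."""
    return triples / rate / 60


kiwi_rate = triples_per_second(TRIPLES, 90)       # reported KiWi/PostgreSQL load
tdb_rate_low = triples_per_second(TRIPLES, 7)     # reported Jena TDB load, 7 minutes
tdb_rate_high = triples_per_second(TRIPLES, 6)    # reported Jena TDB load, 6 minutes
expected_minutes = minutes_at_rate(TRIPLES, 10_000)

print(f"KiWi as reported:      {kiwi_rate:,.0f} triples/sec")
print(f"Jena TDB as reported:  {tdb_rate_low:,.0f} to {tdb_rate_high:,.0f} triples/sec")
print(f"At 10,000 triples/sec: {expected_minutes:.0f} minutes")
```

Under these assumptions, the reported 90 minutes corresponds to roughly half of the 10,000 triples/sec mentioned above, which is why the load time looks low.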


Regarding your versioning requirement: it really sounds like named graphs
are enough in your case, since your versions are really big. The KiWi
versioning model was designed with typical small user interactions in
mind (e.g. SPARQL updates with both deletes and adds), not long-running
transactions.
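
A minimal sketch of the one-named-graph-per-version idea, written as plain string builders for standard SPARQL 1.1 operations (`LOAD ... INTO GRAPH`, `DROP GRAPH`, and a `GRAPH` pattern in a query). The base IRI, the version labels, and the function names are hypothetical, and how the data actually gets loaded into each graph (SPARQL or some other import path) is left open.

```python
# Sketch: treat each bulk-loaded version as its own named graph.
# Assumption: BASE and the version labels are made up for illustration.

BASE = "http://example.org/version/"


def graph_iri(version: str) -> str:
    """IRI of the named graph that holds one complete version of the data."""
    return f"{BASE}{version}"


def load_version(source_url: str, version: str) -> str:
    """SPARQL 1.1 Update that loads an RDF document into the version's graph."""
    return f"LOAD <{source_url}> INTO GRAPH <{graph_iri(version)}>"


def drop_version(version: str) -> str:
    """SPARQL 1.1 Update that removes an old version in one operation."""
    return f"DROP GRAPH <{graph_iri(version)}>"


def select_from_version(version: str, limit: int = 10) -> str:
    """SPARQL query restricted to the triples of a single version."""
    return (
        "SELECT ?s ?p ?o WHERE { "
        f"GRAPH <{graph_iri(version)}> {{ ?s ?p ?o }} "
        f"}} LIMIT {limit}"
    )


print(load_version("http://example.org/dump.ttl", "2014-07-07"))
print(select_from_version("2014-07-07"))
print(drop_version("2014-07-01"))
```

With this layout, "switching versions" amounts to querying a different graph IRI, and discarding an old version is a single `DROP GRAPH`.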

I don't have much experience with the other backends; I just implemented
the interfaces so that they work in the test cases. No bulk loads so far.
But Titan is said to have very good performance when using a Cassandra
database as backend. If you really want to experiment with frequent bulk
loading, it might be worth trying.

Sebastian


2014-07-07 12:13 GMT+02:00 Prashant <pn...@icloud.com>:

> Thank you all, your replies and time are much appreciated.
>
> For me the main attraction of KiWi as a backend was versioning, but the
> bulk loader does not support versioning, so I need to look at custom
> versioning anyway. Maybe named graphs are the way to go, but that is for
> another day.
>
> In my use case bulk data loading is going to be frequent, so I need fast
> data loading and, on top of that, versioning. That said, after some
> data loading exercises with Jena TDB vs. KiWi on close to 30 million
> triples (which is not much), I see Jena is the clear winner. It takes just
> 6-7 minutes on an 8 GB Mac, while KiWi (backed by PostgreSQL) takes close
> to 90 minutes. I know PostgreSQL tuning, such as more shared buffers
> (currently 128 MB), can improve loading performance, but I do not think it
> will match Jena TDB. Any thoughts?
>
> My other question is where we stand with Titan or BerkeleyDB as a
> backend. I have both frequent write and read requirements in my use case.
> Reads will outnumber writes anyway, but it is not a write-once use case...
>
> Thanks
> Prashant
>
>
>
>
>
> On Jul 7, 2014, at 10:53 AM, Sebastian Schaffert <
> sebastian.schaff...@gmail.com> wrote:
>
> Hi all,
>
> some additions to Sergio's reply:
>
>
> 2014-07-07 8:11 GMT+02:00 Sergio Fernández <wik...@apache.org>:
>>
>>
>>
>>  My main requirements are
>>>
>>> 1. SPARQL query speed like a native store such as Jena TDB.
>>>
>>
>> We do not have performance figures for the KiWi triple store in
>> comparison with Jena TDB or others. If you get some, please share them
>> with the community to have a reference.
>>
>> Please take into account that KiWi is just the default triple store. In
>> Marmotta you can easily use any Sesame-based triple store. Further details
>> at:
>>
>> http://marmotta.apache.org/platform/backends
>
>
> Right now, SPARQL is NOT very fast in KiWi (compared to Virtuoso, but
> still faster than in some other Sesame backends). We are still working on
> improving SPARQL performance, but since SPARQL is by its very nature a very
> expressive query language, not everything can be super-fast.
>
>
>>  4. Text Search
>>>
>>
>> That feature did not come over to Marmotta from LMF (
>> http://lmf.googlecode.com). But you should still be able to use it by
>> adding this dependency to your webapp launcher:
>>
>>   <dependency>
>>     <groupId>at.newmedialab.lmf</groupId>
>>     <artifactId>lmf-search</artifactId>
>>     <version>3.2.0-SNAPSHOT</version>
>>   </dependency>
>>
>> All details at http://code.google.com/p/lmf/wiki/ModuleSemanticSearch
>
>
> This is not necessarily needed. Since the last version, the KiWi SPARQL
> implementation contains features for full-text search as part of SPARQL
> queries when using PostgreSQL:
>
> http://marmotta.apache.org/kiwi/sparql.html
>
>
> Greetings,
>
> Sebastian
>
>
>
