Re: Marmotta as a linked data platform

Prashant Mon, 07 Jul 2014 04:05:26 -0700

Thanks again, Sebastian.
See my responses inline.
I am also attaching the output of the first 17 million triples loaded. I am not 
sure the image file will come through as an attachment; either way, peak 
performance was close to 14k triples per second. My load was in RDF/XML. Other 
formats should load much quicker.



Thanks
Prashant





On Jul 7, 2014, at 11:51 AM, Sebastian Schaffert 
<sebastian.schaff...@gmail.com> wrote:

> Hi,
> 
> the KiWi versioning still works for interactive access if you enable it 
> afterwards, but currently not for the bulk loading you describe (could be 
> implemented though if I find the time, but it will slow down the loading 
> performance to about half the speed). From experience, PostgreSQL tuning has 
> a dramatic impact on performance, but I cannot really say if it will match 
> Jena TDB (which is using its own custom data storage format optimized for 
> triples). Maximum bulk load performance was never the main goal when 
> developing the KiWi triple store, so you might be better off with TDB in your 
> scenario. Still, your performance stats sound pretty low. Even on a slow 
> machine I get at least 10,000 triples/sec (but typically much higher), which 
> would give you about half the time you describe. Did you add the option to 
> drop indexes before import (-I)?

Yes, I did add the drop-index option, and the load was quicker.
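
For reference, the figures quoted in this thread can be cross-checked with a little arithmetic. The snippet below is only a rough sketch: it assumes exactly 30 million triples and uses 6.5 minutes as the midpoint of the reported 6-7 minute TDB run, so the resulting rates are approximations and not measurements.

```python
# Rough cross-check of the load figures quoted in this thread.
# Assumptions: exactly 30 million triples, and 6.5 minutes as the
# midpoint of the reported 6-7 minute Jena TDB run.

def triples_per_second(triples, minutes):
    """Average load rate for a run of the given duration."""
    return triples / (minutes * 60)

def minutes_at_rate(triples, rate):
    """Time in minutes needed to load `triples` at `rate` triples/sec."""
    return triples / rate / 60

TRIPLES = 30_000_000

kiwi_rate = triples_per_second(TRIPLES, 90)    # KiWi on Postgres, 90 minutes
tdb_rate = triples_per_second(TRIPLES, 6.5)    # Jena TDB, assumed 6.5 minutes
kiwi_minutes_at_10k = minutes_at_rate(TRIPLES, 10_000)

print(f"KiWi average: {kiwi_rate:.0f} triples/sec")
print(f"TDB average:  {tdb_rate:.0f} triples/sec")
print(f"30M triples at 10,000 triples/sec: {kiwi_minutes_at_10k:.0f} minutes")
```

So the 90-minute run averages well under 10,000 triples/sec, and a sustained 10,000 triples/sec would bring the same load down to about 50 minutes, which is roughly the "half the time" mentioned above.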
> 
> Regarding your versioning requirement: this really sounds like named graphs 
> are enough in your case, since your versions are really big. The KiWi 
> versioning model was designed rather with the typical small user interactions 
> (e.g. SPARQL updates with both deletes and adds) in mind, not with 
> long-running transactions.
Thanks, this was a good insight for my decision making.
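
To make the named-graph idea concrete, here is a minimal sketch of how each bulk load could go into its own graph so that a whole version can be kept or dropped as a unit. The base URI, the dataset name, and the helper functions are hypothetical and chosen only for illustration. The generated strings are standard SPARQL 1.1 Update statements, but whether a given backend or bulk loader accepts them is something to verify separately.

```python
from datetime import datetime, timezone

# Hypothetical base URI for version graphs; not taken from Marmotta.
BASE = "http://example.org/graphs/"

def version_graph_uri(dataset, loaded_at):
    """Build one graph URI per bulk load, keyed by a UTC timestamp."""
    stamp = loaded_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{BASE}{dataset}/{stamp}"

def load_statement(source_url, graph_uri):
    """SPARQL 1.1 Update statement that loads a document into a named graph."""
    return f"LOAD <{source_url}> INTO GRAPH <{graph_uri}>"

def drop_statement(graph_uri):
    """SPARQL 1.1 Update statement that discards one whole version."""
    return f"DROP GRAPH <{graph_uri}>"

if __name__ == "__main__":
    when = datetime(2014, 7, 7, 12, 0, 0, tzinfo=timezone.utc)
    graph = version_graph_uri("products", when)
    print(load_statement("http://example.org/dumps/products.rdf", graph))
    print(drop_statement(graph))
```

With this layout, querying a particular version is a matter of restricting the query to that graph, and rolling back is a single DROP GRAPH on the newest one.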
> 
> I don't have much experience with the other backends, I just implemented 
> interfaces so it works in test cases. No bulk loads so far. But Titan is said 
> to have very good performance when using a Cassandra database as backend. If 
> you really want to play with frequent bulk loading it might be worth playing 
> with it.
> 
Yes, I have been reading a lot about Titan's non-RDF graph experiments, 
particularly their Pearson case study. I will probably check back on it later.

> 
> 
> 
> 2014-07-07 12:13 GMT+02:00 Prashant <pn...@icloud.com>:
> Thank you all, your reply & time is much appreciated. 
> 
> For me the main attraction of KiWi as a backend was versioning, but the bulk 
> loader does not support versioning, so I need to look at custom versioning 
> anyway. Maybe named graphs are the way to go, but that is for another day. 
> 
> In my use case bulk data loading is going to be frequent, so I need fast data 
> loading and, on top of that, versioning. That said, after running some 
> loading exercises with Jena TDB vs. KiWi on close to 30 million triples 
> (which is not much), I see Jena is the clear winner. It takes just 6-7 
> minutes on an 8 GB Mac, while KiWi (backed by Postgres) takes close to 90 
> minutes. I know Postgres tuning, such as more shared buffers (currently 
> 128 MB), can improve loading performance, but I do not think it will match 
> Jena TDB. Any thoughts?
> 
> My other question is where we stand with Titan or BerkeleyDB as a backend. 
> I have both frequent write and read requirements in my use case. Reads will 
> outnumber writes anyway, but it is not a write-once use case...
> 
> Thanks
> Prashant
> 
> 
> 
> 
> 
> On Jul 7, 2014, at 10:53 AM, Sebastian Schaffert 
> <sebastian.schaff...@gmail.com> wrote:
> 
>> Hi all, 
>> 
>> some additions to Sergio's reply:
>> 
>> 
>> 2014-07-07 8:11 GMT+02:00 Sergio Fernández <wik...@apache.org>:
>> 
>> 
>> My main requirements are
>> 
>> 1. SPARQL query speed comparable to native memory storage such as Jena TDB.
>> 
>> We do not have performance figures for the KiWi triple store in comparison 
>> with Jena TDB or others. If you get some, please share them with the 
>> community to have a reference.
>> 
>> Please take into account that KiWi is just the default triple store. In 
>> Marmotta you can easily use any Sesame-based triple store. Further details 
>> at:
>> 
>> http://marmotta.apache.org/platform/backends
>> 
>> Right now, SPARQL is NOT very fast in KiWi (compared to Virtuoso, but still 
>> faster than in some other Sesame backends). We are still working on 
>> improving SPARQL performance, but since SPARQL is by its very nature a very 
>> expressive query language, not everything can be super-fast.
>>  
>> 4. Text Search
>> 
>> That feature did not come over to Marmotta from LMF 
>> (http://lmf.googlecode.com). But you should still be able to use it by 
>> adding this dependency to your webapp launcher:
>> 
>>   <dependency>
>>     <groupId>at.newmedialab.lmf</groupId>
>>     <artifactId>lmf-search</artifactId>
>>     <version>3.2.0-SNAPSHOT</version>
>>   </dependency>
>> 
>> All details at http://code.google.com/p/lmf/wiki/ModuleSemanticSearch
>> 
>> This is not necessarily needed. Since the last version, the KiWi SPARQL 
>> implementation contains features for full-text search as part of SPARQL 
>> queries when using PostgreSQL:
>> 
>> http://marmotta.apache.org/kiwi/sparql.html
>> 
>> 
>> Greetings,
>> 
>> Sebastian 
> 
>
