Hi all,

I have faced similar performance issues with my Marmotta 3.4.0 setup on
Postgres 9.3. My dataset had almost 30M triples divided into 10 graphs,
and querying became extremely slow in some cases. To deal with it, I
manually added some indexes to the database schema, specifically on the
triples table.

By default, Marmotta creates three indexes (P, SPO, CSPO) [1]. Adding
other indexes (e.g. CPO, CSP) improves performance for certain query
patterns. However, bear in mind that this approach has some caveats:
increased write (insert) time and disk space usage.
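For illustration, the extra indexes could look something like this - a
sketch only, since I'm quoting the table and column names from memory;
verify them against the actual DDL in [1] before running anything:

    -- Assumed schema: a "triples" table with subject, predicate,
    -- object, and context columns holding node IDs (check [1]).
    CREATE INDEX idx_triples_cpo ON triples (context, predicate, object);
    CREATE INDEX idx_triples_csp ON triples (context, subject, predicate);

On PostgreSQL you can also use CREATE INDEX CONCURRENTLY to avoid
locking the table while the index is being built.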
My two cents. Cheers,
José

[1] https://github.com/apache/marmotta/blob/master/libraries/kiwi/kiwi-triplestore/src/main/resources/org/apache/marmotta/kiwi/persistence/pgsql/create_base_tables.sql#L84

On Fri, Mar 1, 2019 at 14:18, Alan Snyder (<alan8...@gmail.com>) wrote:

> Hi Sebastian - thanks for following up... I wound up cloning the repo
> and building Marmotta myself to debug my issues. I finally got it
> running against Postgres - with a JDBC driver update - and all seems
> well so far (see my post to the dev list).
>
> I read up on Ostrich and it seems like a wildly performant back-end.
> I'd consider using it if performance does become an issue, but for now
> Postgres suffices.
>
> My SPARQL performance issues noted earlier weren't exactly fair tests,
> since they were on H2 and my hacked MySQL connection. Now that I'm on
> Postgres, performance is much better. I've also tuned the JVM a bit
> more, raising the -Xmx heap limit, and I've shut off LDP caching as we
> don't need that.
>
> I think we can use this project for our needs at this point. My only
> remaining issue is import times - I have a 30 MB TTL file with about
> 400k triples, and it seems to take 2+ minutes to rip through. I do
> have versioning enabled, but even without it the import takes the same
> amount of time. Is there anything I can do to speed this up? It almost
> appears that the more triples I already have loaded into Marmotta
> (right now about 5M under 10 contexts), the longer a new import takes.
>
> Other than the above, I think this is a great system - very
> configurable - and I'm looking forward to exploring more!
>
> -- Alan
>
> On Fri, Mar 1, 2019 at 1:02 PM Sebastian Schaffert <
> sebastian.schaff...@gmail.com> wrote:
>
>> Hi,
>>
>> I started working on migrating to a couple of new libraries (most
>> importantly: Sesame 4), but I haven't pushed those changes yet as
>> they are pretty major. So the project isn't dead, and moving to
>> Sesame 4 might actually improve a couple of things :)
>>
>> SPARQL on H2 is not recommended; we only default to H2 because it
>> offers users a quick way to get started. If you want better
>> performance, I'd suggest you use PostgreSQL. The way we implement
>> SPARQL is by translating it into equivalent SQL and relying on the
>> database query planner to run it efficiently. PostgreSQL has a much
>> better query planner than H2 or MySQL for the kind of (graph) queries
>> SPARQL generates. Another option is to switch to the (experimental)
>> Ostrich backend. It's not based on a relational database and is
>> heavily optimized for large amounts of data; simple SPARQL queries in
>> particular will be much faster. But it's harder to set up, and you'll
>> need to compile the C++ backend for your platform.
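>>
>> For a rough idea of what that translation looks like: a basic graph
>> pattern with two triple patterns on the same subject turns into a
>> self-join on the triples table. Schematically (this is a simplified
>> sketch, not the exact SQL Marmotta generates - KiWi resolves URIs to
>> node IDs via a separate nodes table):
>>
>>   -- SPARQL: SELECT ?s WHERE { ?s foaf:name ?n . ?s foaf:age ?a }
>>   SELECT t1.subject
>>   FROM triples t1
>>   JOIN triples t2 ON t2.subject = t1.subject
>>   WHERE t1.predicate = :foaf_name_id  -- node ID of foaf:name
>>     AND t2.predicate = :foaf_age_id;  -- node ID of foaf:age
>>
>> Every additional triple pattern adds another join, which is why the
>> quality of the query planner matters so much here.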
>>
>> Can you file a couple of bugs for your feature requests?
>>
>> Sebastian
>>
>> On Thu, Feb 28, 2019 at 08:08, Alan Snyder <alan8...@gmail.com>
>> wrote:
>>
>>> Hi Noor,
>>>
>>> Thanks for reaching out! Our needs are to be able to store multiple
>>> datasets (fragments), with versioning, transactions, SPARQL queries,
>>> REST endpoints for at least querying and updating, and some kind of
>>> persistence back-end we can back up safely. Basically, we need git
>>> for RDF :)
>>>
>>> Marmotta fit the bill here, and the versioning aspect in particular
>>> was very attractive for us. However, SPARQL performance with our
>>> initial dataset (about 500k triples) was pretty bad (a few seconds
>>> to a few minutes, depending on the query). I'm new to SPARQL, so
>>> maybe I wasn't optimizing the queries as I should have been, but on
>>> other systems like GraphDB (granted - not an LDP platform), the same
>>> queries ran in sub-second times. As I loaded more datasets, maybe
>>> totaling 3M triples, the response time was well into the minutes for
>>> things that should have been very fast, imho. I got frustrated after
>>> seeing a lot of Java exceptions in the log file and figured that
>>> maybe H2 wasn't the best fit for this dataset size. I tried MySQL
>>> and hit an error where the 'triples' table wasn't created. I fixed
>>> that, but queries were still long-running. I then switched to
>>> PostgreSQL and got more errors, about transactions I think, and at
>>> that point I gave up and felt that it needed more time to get set up
>>> right. I may come back to it, but I need to move forward with
>>> something, so I'm looking at other options alongside learning RDF
>>> and SPARQL. Again, I may circle back to Marmotta, and likely will
>>> just to exhaust it as an option, but out of the box it didn't seem
>>> usable.
>>>
>>> I'll take a look at your paper - that's interesting. It might be a
>>> bit of overkill for our needs, though.
>>>
>>> Thanks again,
>>> Alan
>>>
>>> On Thu, Feb 28, 2019 at 2:01 AM Mohammad Noorani Bakerally <
>>> noorani.bakera...@gmail.com> wrote:
>>>
>>>> Hi Alan,
>>>>
>>>> What type of LDP server are you looking for? Do you need read
>>>> capability, or write capability as well? I ask because, at the last
>>>> European Semantic Web Conference, I presented an approach for
>>>> automating the generation of LDPs from existing data sources. The
>>>> LDPs can be hosted on any compatible LDP server. For the
>>>> demonstration, I implemented a read-only LDP server; here is a
>>>> short 4-page paper
>>>> (https://www.emse.fr/~zimmermann/Papers/eswc2018demo1.pdf)
>>>> describing the approach for generating the LDPs and their
>>>> deployment on the server I developed. This approach was used to set
>>>> up over 200 LDPs.
>>>>
>>>> If you intend to use this approach, feel free to get in touch; I
>>>> can help set it up.
>>>>
>>>> Thanks,
>>>> Noor
>>>>
>>>> On Thu, Feb 28, 2019 at 3:49 AM Alan Snyder <alan8...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks for the info... we may be looking to use Marmotta for a
>>>>> project, but I was concerned about some of the bugs I've
>>>>> encountered out of the box. For example, MySQL doesn't seem to
>>>>> have a 'triples' table when I select it for the KiWi back-end... I
>>>>> had to create this manually. And even the Postgres setup gives
>>>>> some errors I've tried to track down. All in all, the feature set
>>>>> here is very close to my needs, but the above, plus some
>>>>> performance issues with SPARQL when I tested it, made me shy away.
>>>>> I'm willing to work on it to get things going, but I do have to
>>>>> consider other alternatives, at least other graph DBs, if not
>>>>> full-blown LDP platforms.
>>>>>
>>>>> I'd love to hear more updates and chatter here (and on the dev
>>>>> channel). I think going to RDF4J is a great start too - and maybe
>>>>> I can try to fix the MySQL and Postgres issues once I track down
>>>>> where those DDL files are.
>>>>>
>>>>> Thanks again!
>>>>> Alan
>>>>>
>>>>> On Wed, Feb 27, 2019 at 3:49 PM Xavier Sumba <
>>>>> xavier.sumb...@ucuenca.edu.ec> wrote:
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> We could propose something for GSoC 2019 with the aim of reviving
>>>>>> the project and attracting contributors. What do you think,
>>>>>> Jakob?
>>>>>>
>>>>>> I can try to finish the migration to RDF4J if I find some free
>>>>>> time. :D
>>>>>>
>>>>>> In the meantime, Alan, if you'd like to contribute something, the
>>>>>> migration is a good start [1], and IMHO it was mostly a matter of
>>>>>> moving to version 3.4.0 and sorting out some naming conventions.
>>>>>> I think for now we can avoid the experimental backends.
>>>>>>
>>>>>> Best,
>>>>>> Xavier
>>>>>>
>>>>>> [1] https://github.com/apache/marmotta/pull/31
>>>>>>
>>>>>> > On Feb 27, 2019, at 15:22, Jakob Frank <jakob.fr...@redlink.co>
>>>>>> wrote:
>>>>>> >
>>>>>> > Hi Alan,
>>>>>> >
>>>>>> > I'd say dormant, not dead.
>>>>>> >
>>>>>> > To be frank: development activity has been rather low in the
>>>>>> past months. Any help is appreciated, so if you'd like to
>>>>>> contribute something, you are more than welcome!
>>>>>> >
>>>>>> > Best,
>>>>>> > Jakob
>>>>>> >
>>>>>> > On Wed, 27 Feb 2019 at 15:27, Alan Snyder <alan8...@gmail.com>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> Thanks Aaron. Just to be clear: is Marmotta a dead project
>>>>>> now? There was just a release in June for 3.4.0, so I hoped
>>>>>> there'd be some momentum. Any plans for future work here?
>>>>>> >>
>>>>>> >> On Wed, Feb 27, 2019, 9:03 AM Aaron Coburn
>>>>>> <acob...@amherst.edu> wrote:
>>>>>> >>>
>>>>>> >>> Hi,
>>>>>> >>> if you are looking for an LDP server, the Apache Annotator
>>>>>> project (the Web Annotation Protocol sits atop LDP) has a list of
>>>>>> implementations on this page:
>>>>>> >>>
>>>>>> >>> https://github.com/apache/incubator-annotator/wiki/LDP-and-Web-Annotation-Protocol-Implementations
>>>>>> >>>
>>>>>> >>> Other than Virtuoso, which is a commercial product, all of
>>>>>> the projects listed are Apache 2 licensed.
>>>>>> >>>
>>>>>> >>> -Aaron
>>>>>> >>>
>>>>>> >>> On Tue, Feb 26, 2019 at 3:20 PM Alan Snyder
>>>>>> <alan8...@gmail.com> wrote:
>>>>>> >>>>
>>>>>> >>>> Hi, just wondering if this project is still active? I don't
>>>>>> see any activity in the mailing list archives. Is there another
>>>>>> venue for communication with the team? And if the project isn't
>>>>>> worked on routinely, is there another platform you'd recommend
>>>>>> with similar features / license?
>>>>>> >>>>
>>>>>> >>>> Thanks!
>>>>>> >>>> Alan
>>>>>> >
>>>>>> > --
>>>>>> > Jakob Frank
>>>>>> > | http://redlink.at
>>>>>> > | m: +43 699 10588742 | e: jakob.fr...@redlink.at
>>>>>> > | http://at.linkedin.com/in/jakobfrank