There are some suggestions for tuning the PG instance here:

https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server


https://www.postgresql.org/docs/current/sql-altersystem.html

I haven't gotten to try these out yet, but they might help... let me know if
you find anything!
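For concreteness, here's a minimal sketch of the kind of settings those pages discuss, applied via ALTER SYSTEM. The values below are placeholders, not recommendations; size them to your hardware:

```sql
-- Writes settings to postgresql.auto.conf (superuser required).
ALTER SYSTEM SET shared_buffers = '2GB';         -- often ~25% of RAM
ALTER SYSTEM SET effective_cache_size = '6GB';   -- planner hint: total cache available
ALTER SYSTEM SET work_mem = '64MB';              -- per sort/hash operation
ALTER SYSTEM SET maintenance_work_mem = '512MB'; -- index builds, VACUUM

-- Most of these take effect on reload; shared_buffers needs a full restart.
SELECT pg_reload_conf();
```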



On Fri, Mar 1, 2019 at 3:40 PM JOSE ENRIQUE ORTIZ VIVAR <
jose.ort...@ucuenca.edu.ec> wrote:

> Hi all.
>
> I have faced similar performance issues with my Marmotta 3.4.0 setup using
> Postgres 9.3. My dataset had almost 30M triples divided into 10 graphs and
> querying turned into an extremely slow operation in some cases. In order
> to deal with it I manually added some indexes in the database schema,
> specifically on the triples table.
>
> By default, Marmotta creates three indexes (P, SPO, CSPO) [1]. Adding
> other indexes (e.g. CPO, CSP) improves performance for certain query
> patterns. However, bear in mind that this approach has some caveats:
> increased write (insert) time and disk-space usage.
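> A concrete sketch of what those extra indexes might look like (the index
> names are hypothetical, and I'm assuming the subject/predicate/object/context
> column names from the KiWi schema [1]):
>
> ```sql
> -- CPO: context, predicate, object
> CREATE INDEX idx_triples_cpo ON triples (context, predicate, object);
> -- CSP: context, subject, predicate
> CREATE INDEX idx_triples_csp ON triples (context, subject, predicate);
> ```
>
> Each extra index only helps queries that match its leading columns, and
> each one makes every insert a bit slower and the database a bit larger.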
>
> my two cents,
>
> Cheers,
>
> José
>
>
> [1]
> https://github.com/apache/marmotta/blob/master/libraries/kiwi/kiwi-triplestore/src/main/resources/org/apache/marmotta/kiwi/persistence/pgsql/create_base_tables.sql#L84
>
> On Fri, Mar 1, 2019 at 2:18 PM, Alan Snyder (<alan8...@gmail.com>)
> wrote:
>
>> Hi Sebastian - thanks for following up... I wound up cloning the repo and
>> building Marmotta myself to debug my issues. I finally got it running
>> against Postgres - with a JDBC driver update - and all seems well so far
>> (see my post to the dev list).
>>
>> I read up on Ostrich and it seems like a wildly performant back-end. I'd
>> consider using it if performance does become an issue, but for now Postgres
>> suffices.
>>
>> The SPARQL performance issues I noted earlier weren't exactly fair tests,
>> since they were on H2 and my hacked MySQL connection. Now that I'm on
>> Postgres, performance is much better. I've also tuned the JVM a bit more to
>> give it more heap via -Xmx, and I've shut off LDP caching as we don't need that.
>>
>> I think we can use this project for our needs at this point. However, my
>> only issue is with import times - I have a 30 MB TTL file with about 400k
>> triples, and it seems to take 2+ minutes to rip through. I do have
>> versioning enabled, but even without it, imports take the same amount of time.
>> Is there anything I can do to speed this up? It almost appears that the more
>> triples I already have loaded into Marmotta (right now about 5M under 10
>> contexts), the longer a new import takes.
>>
>> Other than the above, I think this is a great system - very configurable
>> and I'm looking forward to exploring more!
>>
>> -- Alan
>>
>>
>> On Fri, Mar 1, 2019 at 1:02 PM Sebastian Schaffert <
>> sebastian.schaff...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I started working on migrating to a couple of new libraries (most
>>> importantly: Sesame 4), but I haven't pushed those changes yet as they are
>>> pretty major. So the project isn't dead, and moving to Sesame 4 might
>>> actually improve a couple of things :)
>>>
>>> SPARQL on H2 is not recommended, we only default to H2 because it offers
>>> users a quick way to get started. If you want to get better performance,
>>> I'd suggest you use PostgreSQL. The way we implement SPARQL is by
>>> translating it into equivalent SQL and expecting the database query planner
>>> to run it efficiently. PostgreSQL has a much better query planner than H2
>>> or MySQL for the kind of (graph) queries SPARQL generates. Another option
>>> is to switch to the (experimental) Ostrich backend. It's not based on a
>>> relational database and is heavily optimized for large amounts of data;
>>> simple SPARQL queries in particular will be much faster. But it's harder to
>>> set up, and you'll need to compile the C++ backend for your platform.
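>>> To illustrate that translation (a simplified sketch, not the exact SQL
>>> KiWi generates): a basic graph pattern with two triple patterns becomes a
>>> self-join on the triples table, and the join order is up to the planner:
>>>
>>> ```sql
>>> -- SPARQL: SELECT ?name WHERE { ?p ex:knows ?q . ?q ex:name ?name }
>>> SELECT t2.object AS name
>>> FROM triples t1
>>> JOIN triples t2 ON t2.subject = t1.object
>>> WHERE t1.predicate = :knows_id   -- internal node id for ex:knows
>>>   AND t2.predicate = :name_id;   -- internal node id for ex:name
>>> ```
>>>
>>> With many such joins, a weak planner picks bad join orders, which is why
>>> PostgreSQL fares so much better than H2 or MySQL here.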
>>>
>>> Can you file a couple of bugs for your feature requests?
>>>
>>> Sebastian
>>>
>>> On Thu, Feb 28, 2019 at 8:08 AM, Alan Snyder <
>>> alan8...@gmail.com> wrote:
>>>
>>>> Hi Noor,
>>>>
>>>> Thanks for reaching out! Our needs are to store multiple
>>>> datasets (fragments), with versioning, transactions, SPARQL queries, REST
>>>> endpoints for at least querying and updating, and some kind of persistence
>>>> back-end we can back up safely. Basically, we need git for RDF :)
>>>>
>>>> Marmotta fit the bill here, and especially the versioning aspect was
>>>> very attractive for us. However, SPARQL performance with our initial
>>>> dataset (about 500k triples) was pretty bad (a few seconds to a few minutes,
>>>> depending). I'm new to SPARQL, so maybe I wasn't optimizing the queries as I
>>>> should have been, but on other systems like GraphDB (granted, not an LDP
>>>> platform), the same queries ran in sub-second timings. As I loaded more
>>>> datasets, maybe totaling 3M triples, the response time was well into the
>>>> minutes for things that should've been very fast, imho. I got frustrated
>>>> after seeing a lot of Java exceptions in the log file, and figured that
>>>> maybe H2 wasn't the best fit for this dataset size. I tried MySQL and
>>>> encountered an error where the 'triples' table wasn't created. I fixed that,
>>>> but queries were still long-running. I then switched to PostgreSQL and got
>>>> more errors, about transactions I think, and at that point I gave up and
>>>> felt that it needed more time to get set up right. I may come back to it, but
>>>> I need to move forward with something, so I'm looking at other options
>>>> alongside learning RDF and SPARQL. Again, I may circle back to Marmotta,
>>>> and likely will just to exhaust it as an option, but out of the box it
>>>> didn't seem usable.
>>>>
>>>> I'll take a look at your paper - that's interesting, though it might be
>>>> a bit overkill for our needs.
>>>>
>>>> Thanks again,
>>>> Alan
>>>>
>>>>
>>>> On Thu, Feb 28, 2019 at 2:01 AM Mohammad Noorani Bakerally <
>>>> noorani.bakera...@gmail.com> wrote:
>>>>
>>>>> Hi Alan,
>>>>>
>>>>> What type of LDP server are you looking for? Do you need read
>>>>> capability, or write capability as well? At the last European Semantic
>>>>> Web Conference, I presented an approach for automating the generation of
>>>>> LDPs from existing data sources. The LDPs can be hosted on any compatible
>>>>> LDP server. For the purpose of the demonstration, I implemented a read-only
>>>>> LDP server; here is a short four-page paper (
>>>>> https://www.emse.fr/~zimmermann/Papers/eswc2018demo1.pdf)
>>>>> describing the approach for generating the LDPs and their deployment on the
>>>>> server I developed. This approach was used to set up over 200 LDPs.
>>>>>
>>>>> If you intend to use this approach, feel free to get in touch; I
>>>>> can help set it up,
>>>>>
>>>>> thanks,
>>>>> Noor
>>>>>
>>>>>
>>>>> On Thu, Feb 28, 2019 at 3:49 AM Alan Snyder <alan8...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks for the info... we may be looking to use Marmotta for a
>>>>>> project, but I was concerned about some of the bugs I've encountered out
>>>>>> of the box. For example, MySQL doesn't seem to have a 'triples' table when
>>>>>> I select it for the KiWi back-end... I had to create this manually. And
>>>>>> even the Postgres setup gives some error I tried to track down. All in all,
>>>>>> the feature set here is very close to what my needs are, but the above, and
>>>>>> some performance issues with SPARQL when I tested it, made me shy away. I'm
>>>>>> willing to work on it to get things going, but I do have to consider other
>>>>>> alternatives, at least other graph DBs, if not full-blown LDP platforms.
>>>>>>
>>>>>> I'd love to hear more updates and chatter here (and on the dev
>>>>>> channel). I think going to RDF4J is a great start too - and maybe I can
>>>>>> try to fix the MySQL and Postgres issues once I track down where those
>>>>>> DDL files are.
>>>>>>
>>>>>> Thanks again!
>>>>>> Alan
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 27, 2019 at 3:49 PM Xavier Sumba <
>>>>>> xavier.sumb...@ucuenca.edu.ec> wrote:
>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> We could propose something for GSoC 2019 with the aim of reviving the
>>>>>>> project and attracting contributors. What do you think, Jakob?
>>>>>>>
>>>>>>> I can try to finish the migration to RDF4J if I find some free time.
>>>>>>> :D
>>>>>>>
>>>>>>> In the meantime, Alan, if you'd like to contribute something, the
>>>>>>> migration is a good start [1]; IMHO, it's a matter of moving to
>>>>>>> version 3.4.0 and resolving some naming conventions. I think for now we
>>>>>>> can avoid the experimental backends.
>>>>>>>
>>>>>>>
>>>>>>> Best,
>>>>>>> Xavier
>>>>>>>
>>>>>>> [1] https://github.com/apache/marmotta/pull/31
>>>>>>>
>>>>>>>
>>>>>>> > On Feb 27, 2019, at 15:22, Jakob Frank <jakob.fr...@redlink.co>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > Hi Alan,
>>>>>>> >
>>>>>>> > I'd say dormant, not dead.
>>>>>>> >
>>>>>>> > To be frank: Development activities have been rather low in the
>>>>>>> past months.
>>>>>>> > Any help is appreciated, so if you'd like to contribute something,
>>>>>>> > you are more than welcome!
>>>>>>> >
>>>>>>> > Best,
>>>>>>> > Jakob
>>>>>>> >
>>>>>>> >
>>>>>>> > On Wed, 27 Feb 2019 at 15:27, Alan Snyder <alan8...@gmail.com>
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >> Thanks Aaron... just to be clear: is Marmotta a dead project now?
>>>>>>> There was just a release in June for 3.4.0, so I hoped there'd be some
>>>>>>> momentum... any plans for future work here?
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Wed, Feb 27, 2019, 9:03 AM Aaron Coburn <acob...@amherst.edu>
>>>>>>> wrote:
>>>>>>> >>>
>>>>>>> >>> Hi,
>>>>>>> >>> If you are looking for an LDP server, the Apache Annotator
>>>>>>> project (Web Annotation Protocol sits atop LDP) has a list of
>>>>>>> implementations on this page:
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> https://github.com/apache/incubator-annotator/wiki/LDP-and-Web-Annotation-Protocol-Implementations
>>>>>>> >>>
>>>>>>> >>> Other than Virtuoso, which is a commercial product, all of the
>>>>>>> projects listed are Apache 2 licensed.
>>>>>>> >>>
>>>>>>> >>> -Aaron
>>>>>>> >>>
>>>>>>> >>> On Tue, Feb 26, 2019 at 3:20 PM Alan Snyder <alan8...@gmail.com>
>>>>>>> wrote:
>>>>>>> >>>>
>>>>>>> >>>> Hi, just wondering if this project is still active? I don't see
>>>>>>> any activity in the mailing-list archives. Is there another venue for
>>>>>>> communicating with the team? And if the project isn't worked on routinely,
>>>>>>> is there another recommended platform with similar features and
>>>>>>> license?
>>>>>>> >>>>
>>>>>>> >>>> Thanks!
>>>>>>> >>>> Alan
>>>>>>> >>>>
>>>>>>> >
>>>>>>> >
>>>>>>> > --
>>>>>>> > Jakob Frank
>>>>>>> > | http://redlink.at
>>>>>>> > | m: +43 699 10588742 | e: jakob.fr...@redlink.at
>>>>>>> > | http://at.linkedin.com/in/jakobfrank
>>>>>>>
>>>>>>>
>>>>>>
