Reviving a somewhat dead thread with an update: Previously, we had run into problems with the Posgresql query planner and its use of generic query plans for prepared queries from Marmotta. We've been able to mitigate this in the past by:
- Upgrading to PSQL 9.4, which is much less aggressive about using generic query plans; and - Adding an index that directly matches the order of the problem queries. For reasons unknown to us, PSQL seemed to prefer an `spoc` index to an otherwise identical `cspo` index. As our quad store has grown, we're now seeing this solution fail, and are back to fighting against the query planner. The crux of our problem is that we use Marmotta primarily as an LDP server; this server now contains in the millions of individual LDP-NRs, and about twice as many LDP-RSs. The result is that we have approximately 10 million named graphs, each containing a small number of triples (varying from 3 to about 75), and a single LDP context with tens of millions of triples. When we do a PUT request, the LDP context is queried with a prepared query like: prepare problem_query as SELECT id FROM triples WHERE subject = $1 AND predicate = $2 AND object = $3 AND context = $4 AND deleted = false; The query planner has, at various points, chosen different indexes to use to carry out this query. The behavior we are seeing now is that it uses a `cpo` index, then filters the subjects. Presumably, it believes that scanning the smaller index will save it time, and in most cases, between zero and three subjects will need to be filtered. When the LDP context is used, however, the result is generally that at least 10 million subjects need to be filtered (this varies depending on which of the predicates from the LDP graph are queried). This process makes the query extremely expensive. Running time is counted in minutes. We've been able to persuade the query planner to use an `spo` index, which runs instantly, but only when we are not past the JDBC preparedThreshold. Our current thinking is to disable prepared statements altogether, but we expect to see a cost for this in other queries. Have you all seen this before? It looks to us like it should be repeatable for any large scale LDP marmotta service running on Postgresql, and we thought it would be important to share our experience so this can be patched on the Marmotta side. Thanks for your time, and for the work you all do. Best, Tom Johnson Metadata & Platform Architect Digital Public Library of America On Fri, Nov 6, 2015 at 3:17 AM, Sergio Fernández <wik...@apache.org> wrote: > Hi Tom, > > On Thu, Nov 5, 2015 at 6:12 PM, Tom Johnson <t...@dp.la> wrote: >> >> We can confirm that upgrading to PSQL 9.2+ (we are using 9.4) has >> significant performance benefits on LDP requests. GET requests that had >> been taking between 10 and 50 seconds are now observed at around 1/10 of a >> second, with much of that time apparently spent on the Marmotta side rather >> than in PSQL. >> > > Definitely PostgreSQL has brought quite some improvements in 9.4 that we > strongly recommend it for Marmotta. > > After the upgrade, we saw vastly improved performance for a time, but >> found that under a certain load similar query planning problems resurfaced. >> Perhaps oddly, we were able to stem this by creating a new index on the >> triples table, with a different field order from the existing four-column >> index. The new index (on `spoc` rather than `cspo`) is used more readily in >> the generic plan, and we've not seen slow queries (> 200ms) logged on LDP >> requests since introducing it. >> > > Those are really good news! > > Probably there is still room to improve the current implementation. But > also take into account that currently the LDP module uses Sesame-based > persistence with KiWi. Following the experience on providing a SQL native > SPARQL evaluation strategy, we could try something similar for improving > even more the LDP performance. But I'd prefer to keep that plan aside if > possible. > > Cheers, > > -- > Sergio Fernández > Partner Technology Manager > Redlink GmbH > m: +43 6602747925 > e: sergio.fernan...@redlink.co > w: http://redlink.co >