On Tue, Sep 22, 2015 at 2:22 AM, Sergio Fernández <wik...@apache.org> wrote:

>
> take into account that by default PostgreSQL is very very conservative.
> Have you customized it?
>

Thanks for your help with this!

Yes, we've done quite a bit of tuning, as described in [1], [3], and [4].

> * max_connections: the default is 100 and at least 1000 is recommended, but it
> goes together with the available memory (see next)
>

Our Marmotta tends to use just 10 connections, and we have max_connections
set to 150 for the server, which hosts a few other very lightly used
databases in addition to Marmotta's. The only time we've hit the
max_connections limit was during a period when locks were not being
released. During a normal mapping or enrichment activity, even with the
problems I've mentioned, we seldom go above 10 connections for the Marmotta
database, so I'm hesitant to raise the system's potential memory usage,
which would result in paging. My focus at the moment is on stopping the
locks from being held for so long.
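
For reference, a query like the one below should show how many connections
Marmotta is actually holding at a given moment (the 'marmotta' database name
is just a placeholder for ours):

    -- Current connection count for the Marmotta database
    SELECT datname, count(*) AS connections
    FROM pg_stat_activity
    WHERE datname = 'marmotta'  -- placeholder name
    GROUP BY datname;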


> * shared_buffers: the default is just 32MB; the more you give it, the less
> Postgres will need to touch disk for any operation (we have
> installations with 16GB behaving really well)
>

We have this set to 40% of our real memory -- 3072MB on a 7.5GB server.
The 40% ratio is based on a statement in the postgresql.org wiki[2], which
states "given the way PostgreSQL also relies on the operating system cache,
it's unlikely you'll find using more than 40% of RAM to work better than a
smaller amount."  Would you agree with the basic ratio, or would you
suggest raising it?

The table for triples alone is 32GB, so I could see adding more memory and
increasing shared_buffers, but I'm not sure if I'd ever be able to add
enough to cover the expanding size of the table.
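
As a rough sanity check on whether the current shared_buffers is keeping up,
I plan to look at the buffer cache hit ratio, something like this (the
database name is again a placeholder):

    -- Approximate buffer cache hit ratio for the Marmotta database
    SELECT blks_hit::float / nullif(blks_hit + blks_read, 0) AS hit_ratio
    FROM pg_stat_database
    WHERE datname = 'marmotta';  -- placeholder name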


> * work_mem: the default is 4MB; you can dramatically increase it (128 or 256
> MB) to improve the handling of each transaction
>
> * maintenance_work_mem: less critical, but useful for the maintenance tasks
> you should run periodically (see below)
>

work_mem is 38MB at the moment.  I'll try increasing it and will let you
know what I encounter.  maintenance_work_mem is 100MB.
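
My thought is to try the larger work_mem at the session level first, before
touching postgresql.conf, roughly like this (the ALTER SYSTEM step assumes
we're on PostgreSQL 9.4 or later; otherwise we'd edit postgresql.conf and
reload):

    -- Test a larger work_mem in one session first
    SET work_mem = '256MB';
    -- ...run a representative UPDATE / query mix here...

    -- If it helps, persist it (PostgreSQL 9.4+) and reload the config
    ALTER SYSTEM SET work_mem = '256MB';
    SELECT pg_reload_conf();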


> * checkpoint_segments: the default is just 3, but a much larger value
> improves transaction handling
>

checkpoint_segments is 128, and checkpoint_timeout is 20min[3].
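
To double-check whether 128 segments is enough for our write load, I figure
we can compare checkpoints forced by filling the segments against those
triggered by the timeout:

    -- A high checkpoints_req relative to checkpoints_timed would suggest
    -- checkpoint_segments is still too small for the write load.
    SELECT checkpoints_timed, checkpoints_req
    FROM pg_stat_bgwriter;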


> * think about your vacuum strategy
>

We've adjusted our autovacuuming per [1].
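
To see whether those settings are keeping up, something like this should
show dead-tuple counts and the last autovacuum times per table:

    -- Dead tuples and last (auto)vacuum per table, worst first
    SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10;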


> * If you have resources, create all the indexes (cspo, cop, cp, literals,
> etc) you may need to improve performance
>

Could you point me in the right direction for the indices `cop`, `cp`, and
`literals`?  We have `idx_triples_cspo` on `triples` from the Marmotta
installation and created one that we called `idx_triples_c` [4].  Are there
optional indices that Marmotta doesn't create when it automatically creates
the tables the first time you run it?
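
If I'm reading the abbreviations correctly, I'd guess they mean something
like the statements below; the column names are my assumptions about our
`triples` table, and I'm not sure what the `literals` index would cover, so
please correct me if the intended definitions differ:

    -- Guesses: cop = context, object, predicate; cp = context, predicate
    CREATE INDEX idx_triples_cop ON triples (context, object, predicate);
    CREATE INDEX idx_triples_cp  ON triples (context, predicate);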

> * If you do not use versioning, periodically (nightly) clean up deleted
> triples: DELETE FROM triples where deleted = true;
>

Thanks for clarifying that those rows can be deleted. We've been assuming
that we want to use versioning in the future, and we'll need to talk among
ourselves about the implications of forgoing it. I believe the main reason
for using the KiWi backend, and thus PostgreSQL, was to have the
possibility of versioning.
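
If we do end up forgoing versioning, I imagine the nightly job would just be
your DELETE plus a vacuum to reclaim the space, e.g.:

    -- Nightly cleanup if versioning is not in use (per the suggestion above)
    DELETE FROM triples WHERE deleted = true;
    VACUUM ANALYZE triples;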



> Hope that helps. Try to implement some of those suggestions in your system
> and tell us how they behave and where you still need more help, and maybe
> code patching.
>

Thanks again. Does the information referenced above and elsewhere on our
wiki clarify how we have things tuned? Our biggest problem at the moment
is the slowness of UPDATE statements when we run concurrent LDP requests,
detailed in [3]. We see locks piling up, but we need to ascertain whether
that's a cause or a symptom of something else. What's most notable is that
a single batch of LDP requests (in the course of one of our mapping or
enrichment runs) works very efficiently, but two concurrent batches of LDP
requests show severely degraded performance. I just want to make sure first
that nothing in our database tuning jumps out as a culprit before
suggesting changes to the code, yours or ours.
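
In case it helps the diagnosis, this is the sort of blocking-locks query I
plan to run while the batches are in flight (adapted from the usual pg_locks
recipes; it assumes PostgreSQL 9.2 or later for pg_stat_activity.pid and
query):

    -- Which sessions are waiting on which: pair each ungranted lock with
    -- the granted lock it conflicts with.
    SELECT blocked.pid         AS blocked_pid,
           blocked_act.query   AS blocked_query,
           blocking.pid        AS blocking_pid,
           blocking_act.query  AS blocking_query
    FROM pg_locks blocked
    JOIN pg_locks blocking
      ON  blocking.locktype                 = blocked.locktype
      AND blocking.database      IS NOT DISTINCT FROM blocked.database
      AND blocking.relation      IS NOT DISTINCT FROM blocked.relation
      AND blocking.page          IS NOT DISTINCT FROM blocked.page
      AND blocking.tuple         IS NOT DISTINCT FROM blocked.tuple
      AND blocking.virtualxid    IS NOT DISTINCT FROM blocked.virtualxid
      AND blocking.transactionid IS NOT DISTINCT FROM blocked.transactionid
      AND blocking.pid          <> blocked.pid
    JOIN pg_stat_activity blocked_act  ON blocked_act.pid  = blocked.pid
    JOIN pg_stat_activity blocking_act ON blocking_act.pid = blocking.pid
    WHERE NOT blocked.granted
      AND blocking.granted;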

References:

[1]
https://digitalpubliclibraryofamerica.atlassian.net/wiki/pages/viewpage.action?pageId=26804351
[2] https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
[3]
https://digitalpubliclibraryofamerica.atlassian.net/wiki/display/TECH/Addressing+slow+updates+and+inserts
[4]
https://digitalpubliclibraryofamerica.atlassian.net/wiki/display/TECH/Index+performance+with+high+context+counts
