Hi Mark,

On Thu, Sep 24, 2015 at 12:53 AM, Mark Breedlove <m...@dp.la> wrote:

>> * max_connections: the default is 100; at least 1000 is recommended, but
>> it goes together with the available memory (see next)
>
> Our Marmotta tends to use just 10 connections, and we have max_connections
> set to 150 for the server, which contains a few other databases in addition
> to Marmotta's that have very light usage. The only time we've hit the
> max_connections limit was during a period when locks were not being
> released. During a normal mapping or enrichment activity, even with the
> problems I've mentioned, we seldom go above 10 connections for the Marmotta
> database, so I'm hesitant to raise the potential memory usage of the
> system, which would result in paging. I'm focused at the moment on stopping
> the problem of the locks being held for so long.

I do agree with that. We should find the actual source of those locks; the
queries sketched below can show which sessions are blocked and which ones
are holding the conflicting locks.
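Take these only as a starting point: they are simplified from the standard
lock-monitoring queries against pg_stat_activity and pg_locks (PostgreSQL
9.2 or later; some column names differ in other releases, e.g. procpid vs.
pid, waiting vs. wait_event).

    -- Sessions waiting on a lock, paired with the sessions that hold a
    -- conflicting (granted) lock on the same object. Simplified from the
    -- PostgreSQL wiki's lock-monitoring examples.
    SELECT blocked.pid    AS blocked_pid,
           blocked.query  AS blocked_query,
           blocking.pid   AS blocking_pid,
           blocking.query AS blocking_query
      FROM pg_locks bl
      JOIN pg_stat_activity blocked ON blocked.pid = bl.pid
      JOIN pg_locks kl
        ON kl.locktype = bl.locktype
       AND kl.database      IS NOT DISTINCT FROM bl.database
       AND kl.relation      IS NOT DISTINCT FROM bl.relation
       AND kl.page          IS NOT DISTINCT FROM bl.page
       AND kl.tuple         IS NOT DISTINCT FROM bl.tuple
       AND kl.transactionid IS NOT DISTINCT FROM bl.transactionid
       AND kl.pid <> bl.pid
      JOIN pg_stat_activity blocking ON blocking.pid = kl.pid
     WHERE NOT bl.granted
       AND kl.granted;

    -- Long-running transactions are often the reason locks are never
    -- released; this lists open transactions by age.
    SELECT pid, state, now() - xact_start AS xact_age, query
      FROM pg_stat_activity
     WHERE xact_start IS NOT NULL
     ORDER BY xact_age DESC;

If the blockers always turn out to be long "idle in transaction" sessions
coming from Marmotta, that points at the application side rather than the
database tuning.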
>> * shared_buffers: the default is just 32MB; the more you give it, the
>> less Postgres needs to touch disk for any operation (we have
>> installations with 16GB behaving really well)
>
> We have this set to 40% of our real memory -- 3072MB on a 7.5GB server.
> The 40% ratio is based on a statement in the postgresql.org wiki [2],
> which states "given the way PostgreSQL also relies on the operating system
> cache, it's unlikely you'll find using more than 40% of RAM to work better
> than a smaller amount." Would you agree with the basic ratio, or would you
> suggest raising it?
>
> The table for triples alone is 32GB, so I could see adding more memory and
> increasing shared_buffers, but I'm not sure if I'd ever be able to add
> enough to cover the expanding size of the table.

From my experience Postgres takes as much as it can, but without putting the
system at risk. We normally also use 40% as a guideline. But with such a
small server I'd try to give it more memory (6GB?) and see how it behaves.

Besides that, another key aspect that dramatically improves the performance
of updates is I/O. Are you using SSDs? Actual SSDs, I mean; virtual SSDs are
a completely different story.

>> * work_mem: the default is 4MB; you can dramatically increase it (128 or
>> 256 MB) to improve the handling of each transaction
>> * maintenance_work_mem: less critical, but useful for the maintenance
>> tasks you should run periodically (see below)
>
> work_mem is 38MB at the moment. I'll try increasing it and will let you
> know what I encounter.

That should have an impact on handling concurrent transactions. Let's see...

>> * If you have the resources, create all the indexes (cspo, cop, cp,
>> literals, etc.) you may need to improve performance
>
> Could you point me in the right direction for the indices `cop`, `cp`, and
> `literals`? We have `idx_triples_cspo` on `triples` from the Marmotta
> installation and created one that we called `idx_triples_c` [4]. Are there
> optional indices that Marmotta doesn't create when it automatically creates
> the tables the first time you run it?

Well, internally each LDP-R uses its own context, so any index that improves
those lookups would help. Try to create at least cop as well (see the sketch
below). In the end you can tune your indexes based on your needs and
available resources. Marmotta does not create all possible indexes by
default, but you can add them yourself if you consider it worthwhile.

>> * If you do not use versioning, clean up deleted triples periodically
>> (nightly): DELETE FROM triples WHERE deleted = true;
>
> Thanks for clarifying that those rows can be deleted. We've been assuming
> that we want to use versioning in the future, and will need to talk among
> ourselves about the implications of foregoing versioning. I believe that
> the main reason for using the KiWi backend, and thus PostgreSQL, was to
> have the possibility of versioning.

Then discuss it. If you do not need versioning, cleaning up the deleted
triples also helps.
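For concreteness, here is roughly what the extra index and the nightly
cleanup could look like. Take it only as a sketch: the index names and
column order just follow the pattern of your existing idx_triples_cspo, so
please check them against the DDL your Marmotta version actually generated.

    -- Additional lookup indexes on the KiWi triples table. Names and
    -- column order are illustrative; verify them against your schema.
    -- CONCURRENTLY avoids blocking writes, but cannot run inside a
    -- transaction block and will take a while on a 32GB table.
    CREATE INDEX CONCURRENTLY idx_triples_cop ON triples (context, object, predicate);
    CREATE INDEX CONCURRENTLY idx_triples_cp  ON triples (context, predicate);

    -- Nightly cleanup of soft-deleted triples (only if you decide not to
    -- use versioning), followed by VACUUM ANALYZE so the space becomes
    -- reusable and the planner statistics stay current.
    DELETE FROM triples WHERE deleted = true;
    VACUUM ANALYZE triples;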
> Thanks again. Does the information referenced above and elsewhere on our
> wiki clarify how we have things tuned? Our biggest problem at the moment
> is the slowness of UPDATE statements when we run concurrent LDP requests,
> detailed in [3]. We see locks piling up, but need to ascertain whether
> that's a cause or a symptom of something else. The thing that's most
> notable is how a single batch of LDP requests (in the course of one of our
> mapping or enrichment runs) works very efficiently, but two concurrent
> batches of LDP requests have severely degraded performance. I just want to
> make sure first that nothing in our database tuning jumps out as a culprit
> before suggesting changes to the code, yours or ours.

Thanks, having more hints about your setup helps quite a lot. I'd say that,
based on my experience with virtual disks, the I/O could be the source of
the issues. So either you move to a higher-performance disk or we find
strategies to improve the in-memory work. Sorry for not being more helpful.
Performance tuning is very tricky.

Cheers,

--
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: sergio.fernan...@redlink.co
w: http://redlink.co
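P.S. If you want a quick check on whether reads are actually being served
from shared_buffers, the standard statistics view gives a rough picture.
Just a minimal sketch: 'marmotta' is a placeholder for your actual database
name, and blks_read counts reads that missed shared_buffers (they may still
be served from the OS cache).

    -- Rough buffer cache hit ratio for one database
    SELECT datname,
           blks_hit,
           blks_read,
           round(blks_hit::numeric / NULLIF(blks_hit + blks_read, 0), 4)
             AS cache_hit_ratio
      FROM pg_stat_database
     WHERE datname = 'marmotta';

If the ratio drops noticeably while two concurrent batches are running, that
would support the I/O theory.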