Added to TODO:
* Improve performance of shared invalidation queue for multiple CPUs
http://archives.postgresql.org/pgsql-performance/2008-01/msg00023.php
---
Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
On Fri, 25 Jan 2008, Simon Riggs wrote:
1. Try to avoid having all the backends hit the queue at once. Instead
of SIGUSR1'ing everybody at the same time, maybe hit only the process
with the oldest message pointer, and have him hit the next oldest after
he's done reading the queue.
My feeling w
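A rough sketch of what that first suggestion might look like, assuming the 8.2-era sinvaladt.c layout (an SISeg shared-memory struct with a procState[] array recording each backend's next-message pointer). Field names are approximations and SendCatchupSignal() is a hypothetical wrapper around kill(pid, SIGUSR1):

#include "postgres.h"
#include "storage/sinvaladt.h"

/*
 * Sketch only: instead of SIGUSR1'ing every backend at once, find the
 * backend with the oldest message pointer and signal just that one.
 * Field names (lastBackend, procState, nextMsgNum, procPid, maxMsgNum)
 * approximate the 8.2-era SISeg layout; SendCatchupSignal() is a
 * hypothetical helper, essentially kill(pid, SIGUSR1).
 */
static void
SignalOldestBackend(SISeg *segP)
{
    int     oldest = segP->maxMsgNum;
    int     which = -1;
    int     i;

    for (i = 0; i < segP->lastBackend; i++)
    {
        ProcState  *stateP = &segP->procState[i];

        if (stateP->procPid != 0 && stateP->nextMsgNum < oldest)
        {
            oldest = stateP->nextMsgNum;
            which = i;
        }
    }

    if (which >= 0)
        SendCatchupSignal(segP->procState[which].procPid);  /* hypothetical */
}

Once that backend has caught up, it would call the same routine again to pass the signal along to the next-oldest reader, so the queue drains as a chain rather than a stampede.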
On Mon, 2008-01-07 at 19:54 -0500, Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > Perhaps it would make sense to try to take the "fast path" in
> > SIDelExpiredDataEntries with only a shared lock rather than exclusive.
>
> I think the real problem here is that sinval catchup proc
> Okay, for a table of just a few entries I agree that DELETE is
> probably better. But don't forget you're going to need to have those
> tables vacuumed fairly regularly now, else they'll start to bloat.
I think we'll go with DELETE also for another reason:
Just after we figured out the cause
Jakub Ouhrabka <[EMAIL PROTECTED]> writes:
>>> Huh. One transaction truncating a dozen tables? That would match the
>>> sinval trace all right ...
> It should be 4 tables - the shown log looks like there were more truncates?
Actually, counting up the entries, there are close to 2 dozen relation
Adrian Moisey <[EMAIL PROTECTED]> writes:
>> we've found it: TRUNCATE
> I haven't been following this thread. Can someone please explain to me
> why TRUNCATE causes these spikes?
It's not so much the TRUNCATE as the overhead of broadcasting the
resultant catalog changes to the many hundreds of
> Huh. One transaction truncating a dozen tables? That would match the
> sinval trace all right ...
It should be 4 tables - the shown log looks like there were more truncates?
> You might be throwing the baby out with the bathwater,
> performance-wise.
Yes, performance was the initial reason
Jakub Ouhrabka <[EMAIL PROTECTED]> writes:
> we've found it: TRUNCATE
Huh. One transaction truncating a dozen tables? That would match the
sinval trace all right ...
> One more question: is it ok to do mass regexp update of pg_proc.prosrc
> changing TRUNCATEs to DELETEs?
You might be throwing
Hi
> I can think of three things that might be producing this:
we've found it: TRUNCATE
I haven't been following this thread. Can someone please explain to me
why TRUNCATE causes these spikes?
Hi Tom,
> I can think of three things that might be producing this:
we've found it: TRUNCATE
We'll try to eliminate the use of TRUNCATE and the periodic spikes should
go away. There will still be a possibility of spikes because of database
creation etc. - we'll try to handle this by issuing trivial
Jakub Ouhrabka <[EMAIL PROTECTED]> writes:
> What does it mean?
Look at src/include/storage/sinval.h and src/include/utils/syscache.h.
What you seem to have here is a bunch of tuple updates in pg_class
(invalidating caches 29 and 30, which in 8.2 correspond to RELNAMENSP
and RELOID), followed by a
Hi Tom,
> Strange. The best idea that comes to mind is to add some debugging
> code to SendSharedInvalidMessage to log the content of each message
> that's sent out. That would at least tell us *what* is going into
> the queue, even if not directly *why*.
we've patched postgresql and run one o
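The kind of logging patch being described might look roughly like this, called once per message from SendSharedInvalidMessage(); it assumes the 8.2-era SharedInvalidationMessage union in src/include/storage/sinval.h, where a non-negative id is a catalog-cache message (the id being the syscache number) and negative ids mark relcache/smgr messages. Field names are from memory and should be treated as approximate:

#include "postgres.h"
#include "storage/sinval.h"

/*
 * Hypothetical debugging helper: log the content of each sinval message
 * before it is inserted into the shared queue.  Assumes the 8.2-era
 * union layout in storage/sinval.h; field names are approximate.
 */
static void
LogSinvalMessage(const SharedInvalidationMessage *msg)
{
    if (msg->id >= 0)
        elog(DEBUG1, "sinval: catcache id %d, db %u, hash %u",
             msg->cc.id, msg->cc.dbId, msg->cc.hashValue);
    else
        elog(DEBUG1, "sinval: relcache/smgr message, id %d", msg->id);
}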
Jakub Ouhrabka <[EMAIL PROTECTED]> writes:
> We've tried hard to identify the cause of the sinval queue filling up.
> We went through query logs as well as function bodies stored in the
> database. We were not able to find any DDL, temp table creations etc.
Strange. The best idea that comes to
Hi Tom,
> I doubt we'd risk destabilizing 8.3 at this point, for a problem that
> affects so few people; let alone back-patching into 8.2.
Understood.
> OK, that confirms the theory that it's sinval-queue contention.
We've tried hard to identify the cause of the sinval queue filling up.
We we
Jakub Ouhrabka <[EMAIL PROTECTED]> writes:
> Yes, I can confirm that it's triggered by SIGUSR1 signals.
OK, that confirms the theory that it's sinval-queue contention.
> If I understand it correctly we have following choices now:
> 1) Use only 2 cores (out of 8 cores)
> 2) Lower the number of i
> You could check this theory
> out by strace'ing some of the idle backends and seeing if their
> activity spikes are triggered by receipt of SIGUSR1 signals.
Yes, I can confirm that it's triggered by SIGUSR1 signals.
If I understand it correctly, we have the following choices now:
1) Use only 2 cor
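For context, this is roughly the path each signalled backend takes (names follow the 8.2-era sinval.c code from memory, so treat it as a sketch rather than the exact source). An otherwise idle backend has to start a transaction purely to drain the queue, which is why hundreds of backends waking at once show up as a burst of context switches and semop() waits:

#include "postgres.h"
#include "access/xact.h"
#include "utils/inval.h"

/*
 * Approximate shape of the catchup processing triggered by SIGUSR1
 * (a sketch of the 8.2-era sinval.c logic, not the exact source):
 * every signalled backend piles onto the same shared queue at once.
 */
static void
ProcessCatchupEvent(void)
{
    StartTransactionCommand();
    AcceptInvalidationMessages();   /* drains the shared sinval queue */
    CommitTransactionCommand();
}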
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Perhaps it would make sense to try to take the "fast path" in
> SIDelExpiredDataEntries with only a shared lock rather than exclusive.
I think the real problem here is that sinval catchup processing is well
designed to create contention :-(. Once we've
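A minimal sketch of the shared-lock fast path being suggested, assuming the 8.2-era single SInvalLock; SIGetMinMsgNum() is a hypothetical helper that returns the oldest next-message pointer across all backends:

#include "postgres.h"
#include "storage/lwlock.h"
#include "storage/sinvaladt.h"

/*
 * Sketch of the suggested fast path: check under a shared lock whether
 * anything has actually expired, and take the exclusive lock only when
 * there is something to delete.  Assumes the 8.2-era single SInvalLock;
 * SIGetMinMsgNum() is a hypothetical helper scanning procState[].
 */
void
SIDelExpiredDataEntries(SISeg *segP)
{
    int     min;

    LWLockAcquire(SInvalLock, LW_SHARED);
    min = SIGetMinMsgNum(segP);
    if (min <= segP->minMsgNum)
    {
        LWLockRelease(SInvalLock);
        return;                 /* common case: nothing to delete */
    }
    LWLockRelease(SInvalLock);

    LWLockAcquire(SInvalLock, LW_EXCLUSIVE);
    min = SIGetMinMsgNum(segP); /* recompute; readers may have advanced */
    if (min > segP->minMsgNum)
        segP->minMsgNum = min;  /* discard everything below the new minimum */
    LWLockRelease(SInvalLock);
}

As the reply above notes, though, this only shaves the common case; the catchup processing itself still funnels every backend through the same lock at the same moment.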
Jakub Ouhrabka <[EMAIL PROTECTED]> writes:
>>> Does your app create and destroy a tremendous number of temp tables,
>>> or anything else in the way of frequent DDL commands?
> Hmm. I can't think of anything like this. Maybe there are a few backends
> which create temp tables, but not a tremendous numb
Jakub Ouhrabka <[EMAIL PROTECTED]> writes:
> We've tried several times to get stacktrace from some of the running
> backends during spikes, we got always this:
> 0x2b005d00a9a9 in semop () from /lib/libc.so.6
> #0 0x2b005d00a9a9 in semop () from /lib/libc.so.6
> #1 0x0054fe53 in
Jakub Ouhrabka wrote:
> We've tried several times to get stacktrace from some of the running
> backends during spikes, we got always this:
>
> 0x2b005d00a9a9 in semop () from /lib/libc.so.6
> #0 0x2b005d00a9a9 in semop () from /lib/libc.so.6
> #1 0x0054fe53 in PGSemaphoreLock (s
Hi Tom & all,
>> It sounds a bit like momentary contention for a spinlock,
>> but exactly what isn't clear.
> ok, we're going to try oprofile, will let you know...
yes, it seems like contention for a spinlock if I'm interpreting oprofile
correctly; around 60% of the time during spikes is in s_lock. [
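For reference, s_lock() is the contended-path routine behind PostgreSQL's spinlocks, so that oprofile result means many backends are spinning on the same lock at once. A very rough sketch of what it does (the real code in src/backend/storage/lmgr/s_lock.c adds adaptive delay tuning and a stuck-spinlock check):

#include "postgres.h"
#include "miscadmin.h"
#include "storage/s_lock.h"

#define SPINS_PER_DELAY 100     /* simplified; the real value is adaptive */

/*
 * Very rough sketch of s_lock(): waiters retry the test-and-set in a
 * busy loop, sleeping briefly after a number of failed spins.  Seeing
 * ~60% of CPU time here means many backends are fighting over the
 * same spinlock.
 */
int
s_lock(volatile slock_t *lock, const char *file, int line)
{
    int     spins = 0;
    int     delays = 0;

    while (TAS(lock))
    {
        if (++spins >= SPINS_PER_DELAY)
        {
            pg_usleep(1000L);   /* back off for 1ms, then spin again */
            delays++;
            spins = 0;
        }
    }
    return delays;
}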
James Mansion wrote:
Jakub Ouhrabka wrote:
How can we diagnose what is happening during the peaks?
Can you try forcing a core from a bunch of the busy processes? (Hmm -
does Linux have an equivalent to the useful Solaris pstacks?)
There's a 'pstack' for Linux, shipped at least in Red Hat distr
Jakub Ouhrabka wrote:
How can we diagnose what is happening during the peaks?
Can you try forcing a core from a bunch of the busy processes? (Hmm -
does Linux have an equivalent to the useful Solaris pstacks?)
James
Alvaro,
>>> - do an UNLISTEN if possible
>> Yes, we're issuing unlistens when appropriate.
>
> You are vacuuming pg_listener periodically, yes? Not that this seems
> to have any relationship to your problem, but ...
yes, autovacuum should take care of this. But looking forward to
multiple-wor
Jakub Ouhrabka wrote:
> > - do an UNLISTEN if possible
>
> Yes, we're issuing unlistens when appropriate.
You are vacuuming pg_listener periodically, yes? Not that this seems to
have any relationship to your problem, but ...
Hi Sven,
> I guess all backends do listen to the same notification.
Unfortunately no. The backends are listening to different notifications
in different databases. Usually there are only a few listens per database,
with only one exception - there are many (hundreds of) listens in one
database but
Hi Tom,
> Interesting. Maybe you could use oprofile to try to see what's
> happening? It sounds a bit like momentary contention for a spinlock,
> but exactly what isn't clear.
ok, we're going to try oprofile, will let you know...
> Perhaps. Have you tried logging executions of NOTIFY to see
Jakub Ouhrabka <[EMAIL PROTECTED]> writes:
> we have a dedicated PostgreSQL Linux server with 8 cores (2xX5355). We
> came across a strange issue: when running with all 8 cores enabled
> approximately once a minute (the period differs) the system is very busy for
> a few seconds (~5-10s) and we don'
Hi Jakub,
I do have a similar server (from DELL), which performs well with our
PostgreSQL application. I guess the peak in context switches is the only
thing you can see.
Anyhow, I think it is your LISTEN/NOTIFY approach which causes that
behaviour. I guess all backends do listen to the same
Hi all,
we have a dedicated PostgreSQL Linux server with 8 cores (2xX5355). We
came across a strange issue: when running with all 8 cores enabled
approximately once a minute (the period differs) the system is very busy for
a few seconds (~5-10s) and we don't know why - this issue doesn't show up
wh