Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-03-21 Thread Bruce Momjian
Added to TODO: * Improve performance of shared invalidation queue for multiple CPUs http://archives.postgresql.org/pgsql-performance/2008-01/msg00023.php --- Tom Lane wrote: > Alvaro Herrera <[EMAIL PROTECTED]> writes:

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-28 Thread Matthew
On Fri, 25 Jan 2008, Simon Riggs wrote: 1. Try to avoid having all the backends hit the queue at once. Instead of SIGUSR1'ing everybody at the same time, maybe hit only the process with the oldest message pointer, and have him hit the next oldest after he's done reading the queue. My feeling w
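
The wake-up idea quoted above (signal only the backend with the oldest message pointer, and let it signal the next oldest once it has caught up) can be illustrated with a toy model. This is only a sketch under stated assumptions, not PostgreSQL code: the names `broadcast_wakeup`, `chained_wakeup`, `read_ptrs` and `head` are hypothetical, and "contenders" simply counts how many backends are awake and reading the queue at the same time.

```python
def broadcast_wakeup(read_ptrs, head):
    """Toy model of signalling every lagging backend at once.

    read_ptrs: dict mapping a (hypothetical) backend name to the index of
               the next queue message it has not yet read.
    head:      index one past the newest message in the queue.
    """
    lagging = sorted(b for b, p in read_ptrs.items() if p < head)
    # All lagging backends wake together and contend for the queue.
    return {"rounds": [lagging] if lagging else [],
            "max_contenders": len(lagging)}


def chained_wakeup(read_ptrs, head):
    """Toy model of the chained scheme: wake the backend with the oldest
    pointer, let it catch up, then wake the next oldest."""
    ptrs = dict(read_ptrs)  # work on a copy, leave the caller's dict alone
    rounds = []
    while True:
        lagging = [b for b, p in ptrs.items() if p < head]
        if not lagging:
            break
        # Oldest pointer first; the name breaks ties deterministically.
        oldest = min(lagging, key=lambda b: (ptrs[b], b))
        ptrs[oldest] = head  # this backend reads the queue up to the head
        rounds.append([oldest])
    return {"rounds": rounds,
            "max_contenders": max((len(r) for r in rounds), default=0)}


if __name__ == "__main__":
    pointers = {"A": 5, "B": 2, "C": 9, "D": 10}
    print(broadcast_wakeup(pointers, head=10))
    print(chained_wakeup(pointers, head=10))
```

In this model the broadcast variant has every lagging backend awake at once, while the chained variant never has more than one; the trade-off, which the model does not capture, is that the last backend in the chain catches up later.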

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-25 Thread Simon Riggs
On Mon, 2008-01-07 at 19:54 -0500, Tom Lane wrote: > Alvaro Herrera <[EMAIL PROTECTED]> writes: > > Perhaps it would make sense to try to take the "fast path" in > > SIDelExpiredDataEntries with only a shared lock rather than exclusive. > > I think the real problem here is that sinval catchup proc

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-15 Thread Jakub Ouhrabka
> Okay, for a table of just a few entries I agree that DELETE is > probably better. But don't forget you're going to need to have those > tables vacuumed fairly regularly now, else they'll start to bloat. I think we'll go with DELETE also for another reason: Just after we figured out the cause

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-15 Thread Tom Lane
Jakub Ouhrabka <[EMAIL PROTECTED]> writes: >>> Huh. One transaction truncating a dozen tables? That would match the >>> sinval trace all right ... > It should be 4 tables - the shown log looks like there were more truncates? Actually, counting up the entries, there are close to 2 dozen relation

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-15 Thread Tom Lane
Adrian Moisey <[EMAIL PROTECTED]> writes: >> we've found it: TRUNCATE > I haven't been following this thread. Can someone please explain to me > why TRUNCATE causes these spikes? It's not so much the TRUNCATE as the overhead of broadcasting the resultant catalog changes to the many hundreds of

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-15 Thread Jakub Ouhrabka
> Huh. One transaction truncating a dozen tables? That would match the > sinval trace all right ... It should be 4 tables - the shown log looks like there were more truncates? > You might be throwing the baby out with the bathwater, > performance-wise. Yes, performance was the initial reason

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-15 Thread Tom Lane
Jakub Ouhrabka <[EMAIL PROTECTED]> writes: > we've found it: TRUNCATE Huh. One transaction truncating a dozen tables? That would match the sinval trace all right ... > One more question: is it ok to do mass regexp update of pg_proc.prosrc > changing TRUNCATEs to DELETEs? You might be throwing

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-15 Thread Adrian Moisey
Hi > I can think of three things that might be producing this: we've found it: TRUNCATE I haven't been following this thread. Can someone please explain to me why TRUNCATE causes these spikes?

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-15 Thread Jakub Ouhrabka
Hi Tom, > I can think of three things that might be producing this: we've found it: TRUNCATE We'll try to eliminate the use of TRUNCATE, and the periodic spikes should go away. There will still be a possibility of spikes because of database creation etc. - we'll try to handle this by issuing trivial

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-14 Thread Tom Lane
Jakub Ouhrabka <[EMAIL PROTECTED]> writes: > What does it mean? Look at src/include/storage/sinval.h and src/include/utils/syscache.h. What you seem to have here is a bunch of tuple updates in pg_class (invalidating caches 29 and 30, which in 8.2 correspond to RELNAMENSP and RELOID), followed by a

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-14 Thread Jakub Ouhrabka
Hi Tom, > Strange. The best idea that comes to mind is to add some debugging > code to SendSharedInvalidMessage to log the content of each message > that's sent out. That would at least tell us *what* is going into > the queue, even if not directly *why*. we've patched postgresql and run one o

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-11 Thread Tom Lane
Jakub Ouhrabka <[EMAIL PROTECTED]> writes: > We've tried hard to identify what's the cause of filling sinval-queue. > We went through query logs as well as function bodies stored in the > database. We were not able to find any DDL, temp table creations etc. Strange. The best idea that comes to

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-11 Thread Jakub Ouhrabka
Hi Tom, > I doubt we'd risk destabilizing 8.3 at this point, for a problem that > affects so few people; let alone back-patching into 8.2. understand. > OK, that confirms the theory that it's sinval-queue contention. We've tried hard to identify what's the cause of filling sinval-queue. We we

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-08 Thread Tom Lane
Jakub Ouhrabka <[EMAIL PROTECTED]> writes: > Yes, I can confirm that it's triggered by SIGUSR1 signals. OK, that confirms the theory that it's sinval-queue contention. > If I understand it correctly we have following choices now: > 1) Use only 2 cores (out of 8 cores) > 2) Lower the number of i

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-08 Thread Jakub Ouhrabka
> You could check this theory > out by strace'ing some of the idle backends and seeing if their > activity spikes are triggered by receipt of SIGUSR1 signals. Yes, I can confirm that it's triggered by SIGUSR1 signals. If I understand it correctly we have following choices now: 1) Use only 2 cor

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-07 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Perhaps it would make sense to try to take the "fast path" in > SIDelExpiredDataEntries with only a shared lock rather than exclusive. I think the real problem here is that sinval catchup processing is well designed to create contention :-(. Once we've

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-07 Thread Tom Lane
Jakub Ouhrabka <[EMAIL PROTECTED]> writes: >>> Does your app create and destroy a tremendous number of temp tables, >>> or anything else in the way of frequent DDL commands? > Hmm. I can't think of anything like this. Maybe there are few backends > which create temp tables but not tremendous numb

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-07 Thread Tom Lane
Jakub Ouhrabka <[EMAIL PROTECTED]> writes: > We've tried several times to get stacktrace from some of the running > backends during spikes, we got always this: > 0x2b005d00a9a9 in semop () from /lib/libc.so.6 > #0 0x2b005d00a9a9 in semop () from /lib/libc.so.6 > #1 0x0054fe53 in

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-07 Thread Alvaro Herrera
Jakub Ouhrabka wrote: > We've tried several times to get stacktrace from some of the running > backends during spikes, we got always this: > > 0x2b005d00a9a9 in semop () from /lib/libc.so.6 > #0 0x2b005d00a9a9 in semop () from /lib/libc.so.6 > #1 0x0054fe53 in PGSemaphoreLock (s

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-07 Thread Jakub Ouhrabka
Hi Tom & all, >> It sounds a bit like momentary contention for a spinlock, >> but exactly what isn't clear. > ok, we're going to try oprofile, will let you know... yes, it seems like contention for a spinlock, if I'm interpreting oprofile correctly: around 60% of time during spikes is in s_lock. [

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-04 Thread David Boreham
James Mansion wrote: Jakub Ouhrabka wrote: How can we diagnose what is happening during the peaks? Can you try forcing a core from a bunch of the busy processes? (Hmm - does Linux have an equivalent to the useful Solaris pstacks?) There's a 'pstack' for Linux, shipped at least in Red Hat distr

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-04 Thread James Mansion
Jakub Ouhrabka wrote: How can we diagnose what is happening during the peaks? Can you try forcing a core from a bunch of the busy processes? (Hmm - does Linux have an equivalent to the useful Solaris pstacks?) James

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-03 Thread Jakub Ouhrabka
Alvaro, >>> - do an UNLISTEN if possible >> Yes, we're issuing unlistens when appropriate. > > You are vacuuming pg_listener periodically, yes? Not that this seems > to have any relationship to your problem, but ... yes, autovacuum should take care of this. But looking forward for multiple-wor

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-03 Thread Alvaro Herrera
Jakub Ouhrabka wrote: > > - do an UNLISTEN if possible > > Yes, we're issuing unlistens when appropriate. You are vacuuming pg_listener periodically, yes? Not that this seems to have any relationship to your problem, but ... -- Alvaro Herrera http://www.CommandPr

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-03 Thread Jakub Ouhrabka
Hi Sven, > I guess all backends do listen to the same notification. Unfortunately, no. The backends are listening to different notifications in different databases. Usually there are only a few listens per database, with only one exception - there are many (hundreds) listens in one database but

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-03 Thread Jakub Ouhrabka
Hi Tom, > Interesting. Maybe you could use oprofile to try to see what's > happening? It sounds a bit like momentary contention for a spinlock, > but exactly what isn't clear. ok, we're going to try oprofile, will let you know... > Perhaps. Have you tried logging executions of NOTIFY to see

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-03 Thread Tom Lane
Jakub Ouhrabka <[EMAIL PROTECTED]> writes: > we have a PostgreSQL dedicated Linux server with 8 cores (2xX5355). We > came across a strange issue: when running with all 8 cores enabled > approximately once a minute (period differs) the system is very busy for > a few seconds (~5-10s) and we don'

Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-03 Thread Sven Geisler
Hi Jakub, I do have a similar server (from DELL), which performs well with our PostgreSQL application. I guess the peak in context switches is the only thing you can see. Anyhow, I think it is your LISTEN/NOTIFY approach which causes that behaviour. I guess all backends do listen to the same

[PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-03 Thread Jakub Ouhrabka
Hi all, we have a PostgreSQL dedicated Linux server with 8 cores (2xX5355). We came across a strange issue: when running with all 8 cores enabled approximately once a minute (period differs) the system is very busy for a few seconds (~5-10s) and we don't know why - this issue doesn't show up wh