On 05/24/12 12:21, Tom Lane wrote: > Jeff Frost <j...@pgexperts.com> writes: >> A few times today, we've seen postgresql 9.1.3 backends on Ubuntu 11.10 >> x86_64 >> get stuck in 'startup' mode. By that I mean the set_ps_output mode. Postgres >> is installed via Martin Pitt's packages. > I took another look at your original report here, and noticed something > that had escaped me the first time: the backtrace shows that what that > backend is doing is loading critical relcache entries the hard way, that > is, via seqscans on the system catalogs. This happens when the relcache > cache file (pg_internal.init) is missing or stale. A backend that has > to go through that will write a new pg_internal.init file when it's > done; but if many sessions are connecting at the same time, they'll each > have to do it until someone succeeds in creating a new cache file. > > Another piece of evidence is here: > >> #5 0x00007f7dd3d91ca4 in ReadBufferExtended (reln=0x7f7dd3b17540, >> forkNum=MAIN_FORKNUM, blockNum=201362, mode=RBM_NORMAL, strategy=<optimized >> out>) at bufmgr.c:246 > It's possible that gdb is lying to us, but it sure looks like this read > request is for block 201362 of pg_attribute (I know it's pg_attribute > from the location of the heap_getnext call). That's 1.5Gb into the > relation, and who knows if it's anywhere near the end yet. > > So in other words, we've got a boatload of backends all seqscanning a > pretty darn large relation, and doing so multiple times. That's > probably enough to explain why it took a long time to finish > initializing; the locking business is likely a red herring. > > How big is pg_attribute normally in this database? I'm suspicious that > what triggered this is some major bloating of pg_attribute (and maybe > some of the other catalogs too). Once you had that, relcache cache file > invalidations are a fairly routine occurrence --- I think an autovacuum > on one of the system catalogs would trigger that, for example. > > regards, tom lane
Interesting! Fortunately, that data directory is still lying around even though we initdb'd to get a working system. So, the current working system's pg_attribute looks like: select pg_size_pretty(pg_relation_size('pg_catalog.pg_attribute')); pg_size_pretty ---------------- 1671 MB (1 row) select pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute')); pg_size_pretty ---------------- 2439 MB and on the old broken system: select pg_size_pretty(pg_relation_size('pg_catalog.pg_attribute')); pg_size_pretty ---------------- 1858 MB (1 row) select pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute')); pg_size_pretty ---------------- 1907 MB (1 row) So, interestingly, they're both quite large, but the old broken system is quite a bit larger. Any other data points be helpful? -- Jeff Frost <j...@pgexperts.com> CTO, PostgreSQL Experts, Inc. Phone: 1-888-PG-EXPRT x506 FAX: 415-762-5122 http://www.pgexperts.com/ -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs