Re: [PERFORM] atrocious update performance

2004-03-16 Thread Tom Lane
Shridhar Daithankar <[EMAIL PROTECTED]> writes:
> Rosser Schwarz wrote:
>> shared_buffers = 4096
>> sort_mem = 32768
>> vacuum_mem = 32768
>> wal_buffers = 16384
>> checkpoint_segments = 64
>> checkpoint_timeout = 1800
>> checkpoint_warning = 30
>> commit_delay = 5
>> effective_cache_size = 131072

> First of all, your shared buffers are low. 4096 is 64MB with 16K block
> size. I would say at least push them to 150-200MB.

Check.  Much more than that isn't necessarily better though.
shared_buffers = 10000 is frequently mentioned as a "sweet spot".

> Secondly your sort mem is too high. Note that it is per sort per query. You 
> could build a massive swap storm with such a setting.

Agreed, but I doubt that has anything to do with the immediate
problem, since he's not testing parallel queries.

> Similarly pull down vacuum and WAL buffers to around 512-1024 each.

The vacuum_mem setting here is 32MB, which seems okay to me, if not on
the low side.  Again though it's not his immediate problem.

I agree that the wal_buffers setting is outlandishly large; I can't see
any plausible reason for it to be more than a few dozen.  I don't know
whether oversized wal_buffers can directly cause any performance issues,
but it's certainly not a well-tested scenario.

The other setting I was going to comment on is checkpoint_warning;
it seems mighty low in comparison to checkpoint_timeout.  If you are
targeting a checkpoint every half hour, I'd think you'd want the system
to complain about checkpoints spaced more closely than several minutes.
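
Put together, a saner starting point might look something like the
following (illustrative values only, assuming the stock 8K block size;
the exact numbers are not gospel):

    shared_buffers = 10000          # ~80MB; the oft-cited "sweet spot"
    sort_mem = 8192                 # per sort, per query, so keep it modest
    vacuum_mem = 32768
    wal_buffers = 32                # "a few dozen" is plenty
    checkpoint_segments = 64
    checkpoint_timeout = 1800
    checkpoint_warning = 300        # complain if checkpoints come < 5 min apart
    effective_cache_size = 131072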

But with the possible exception of wal_buffers, I can't see anything in
these settings that explains the originally complained-of performance
problem.  I'm still wondering about foreign key checks.

regards, tom lane



Re: [PERFORM] atrocious update performance

2004-03-16 Thread Rosser Schwarz
while you weren't looking, Tom Lane wrote:

> But with the possible exception of wal_buffers, I can't see 
> anything in
> these settings that explains the originally complained-of performance
> problem.  I'm still wondering about foreign key checks.

Many of the configs I posted were fairly wild values, set to gather
data points for further tweaking.  Unfortunately, with this query
there hasn't been time for many passes, and I've too much else on my
plate to try concocting demonstration cases.  The postmaster's been
hupped with more sane values, but I experienced this same issue with
the defaults.

As for foreign keys, three tables refer to account.cust; all of them
refer to account.cust.custid, the pk.  One of those tables has several
hundred thousand rows, many more to come; the others are empty.  Unless
I've woefully misunderstood, the presence or absence of a foreign key
referring to one column should be moot for updates writing another
column, shouldn't it?

To answer your (and others') question, Tom, 7.4.1 on 2.4.20-18.9smp.
Red Hat, I believe.  I was handed the machine, which is also in use
for lightweight production stuff:  intranet webserver, rinky-dink
MySQL doo-dads, &c.  I'm sure that has an impact, usurping the disk
heads and such--maybe even more than I'd expect--but I can't imagine
that'd cause an update to one 4.7M row table, from another 4.7M row
table, both clustered on a join column that maps one-to-one between
them, to take days.  I'm baffled; everything else is perfectly snappy,
given the hardware.  Anything requiring a sequential scan over one of
the big tables is a slog, but that's to be expected and hence all the
indices.

Watching iostat, I've observed a moderately cyclic read-big, write-
big pattern, wavelengths generally out of phase, interspersed with
smaller, almost epicycles--from the machine's other tasks, I'm sure.
top has postmaster's cpu usage rarely breaking 25% over the course
of the query's execution, and spending most of its time much lower;
memory usage hovers somewhere north of 500MB.

In what little time I had to stare at a disturbingly matrix-esque
array of terminals scrolling sundry metrics, I didn't notice a
correlation between cpu usage spikes and peaks in the IO cycle's
waveforms.  For whatever that's worth.

The other tables involved are:

# \d account.acct
                                  Table "account.acct"
   Column   |            Type             |                      Modifiers
------------+-----------------------------+-------------------------------------------------------
 acctid     | bigint                      | not null default nextval('account.acctid_seq'::text)
 custid     | bigint                      |
 acctstatid | integer                     | not null
 acctno     | character varying(50)       |
 bal        | money                       |
 begdt      | timestamp without time zone | not null
 enddt      | timestamp without time zone |
 debtid     | character varying(50)       |
Indexes:
    "acct_pkey" primary key, btree (acctid)
    "ix_acctno" btree (acctno) WHERE (acctno IS NOT NULL)
Foreign-key constraints:
    "$1" FOREIGN KEY (custid) REFERENCES account.cust(custid)
         ON UPDATE CASCADE ON DELETE RESTRICT
    "$2" FOREIGN KEY (acctstatid) REFERENCES account.acctstat(acctstatid)
         ON UPDATE CASCADE ON DELETE RESTRICT

# \d account.note
                               Table "account.note"
  Column   |            Type             |                     Modifiers
-----------+-----------------------------+------------------------------------------------------
 noteid    | bigint                      | not null default nextval('account.noteid_seq'::text)
 custid    | bigint                      | not null
 note      | text                        | not null
 createddt | timestamp without time zone | not null default now()
Indexes:
    "note_pkey" primary key, btree (noteid)
Foreign-key constraints:
    "$1" FOREIGN KEY (custid) REFERENCES account.cust(custid)
         ON UPDATE CASCADE ON DELETE RESTRICT

# \d account.origacct
   Table "account.origacct"
   Column|Type | Modifiers
-+-+---
 custid  | bigint  |
 lender  | character varying(50)   |
 chgoffdt| timestamp without time zone |
 opendt  | timestamp without time zone |
 offbureaudt | timestamp without time zone |
 princbal| money   |
 intbal  | money   |
 totbal  | money   |
 lastpayamt  | money   |
 lastpaydt   | timestamp without time zone |
 debttype| integer |
 debtid  | character varying(10)   |
 acctno  | character varying(50)   |
Foreign-key constraints:
    "$1" FOREIGN KEY (custid) REFERENCES account.cust(custid)
         ON UPDATE CASCADE ON DELETE RESTRICT

[PERFORM] Fwd: Configuring disk cache size on postgress

2004-03-16 Thread Darcy Buskermolen
This appeared on the FreeBSD-performance list, and I thought people here
could help as well.



--  Forwarded Message  --

Subject: Configuring disk cache size on postgress
Date: March 16, 2004 10:44 am
From: Dror Matalon <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]

Hi Folks,

When configuring postgres, one of the variables to configure is
effective_cache_size:
Sets the optimizer's assumption about the effective size of the disk
cache (that is, the portion of the kernel's disk cache that will be
used for PostgreSQL data files). This is measured in disk pages, which
are normally 8 kB each.
(http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html)

The conventional wisdom on the postgres list has been that for FreeBSD
you calculate this as (`sysctl -n vfs.hibufspace` / 8192).
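
For example (illustrative numbers only):

    $ sysctl -n vfs.hibufspace
    209715200
    $ expr 209715200 / 8192
    25600

i.e. a hibufspace of roughly 200MB would give effective_cache_size = 25600
(in 8K pages).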

Now I'm running 4.9 with 2 Gig of ram and sysctl -n vfs.hibufspace
indicates usage of 200MB.

Questions:
1. How much RAM is freebsd using for *disk* caching? Is it part of the
general VM or is it limited to the above 200MB? I read Matt Dillon's
http://www.daemonnews.org/21/freebsd_vm.html, but most of the
discussion there seems to be focused on caching programs and program
data.

2. Can I tell, and if so how,  how much memory the OS is using for disk
caching?

3. What are the bufspace variables for?

This subject has been touched on before in
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/performance/2003-09/0045.html
which points to a patch to increase the bufspace.

Regards,

Dror


--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com

---

-- 
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx:  250.763.1759
http://www.wavefire.com




Re: [PERFORM] atrocious update performance

2004-03-16 Thread Tom Lane
"Rosser Schwarz" <[EMAIL PROTECTED]> writes:
> But if I'm not touching the column referenced from account.acct, why
> would it be looking there at all?  I've got an explain analyze of the
> update running now, but until it finishes, I can't say for certain
> what it's doing.  explain, alone, says:

EXPLAIN won't tell you anything about triggers that might get fired
during the UPDATE, so it's not much help for investigating possible
FK performance problems.  EXPLAIN ANALYZE will give you some indirect
evidence: the difference between the total query time and the total time
reported for the topmost plan node represents the time spent running
triggers and physically updating the tuples.  I suspect we are going
to see a big difference.
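
Concretely, once it finishes, compare the topmost node's total actual time
against the Total runtime line, along these lines (a sketch, not real
output):

    Merge Join  (cost=... rows=... width=...)
                (actual time=N.NN..MMMM.MM rows=... loops=1)
      ...
    Total runtime: TTTT.TT ms

If TTTT is much larger than MMMM, the difference is the time spent in
triggers and in physically writing the updated tuples.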

> which shows it not hitting account.acct at all.  (And why did it take
> the planner 20-some seconds to come up with that query plan?)

It took 20 seconds to EXPLAIN?  That's pretty darn odd in itself.  I'm
starting to think there must be something quite whacked-out about your
installation, but I haven't got any real good ideas about what.

(I'm assuming of course that there weren't a ton of other jobs eating
CPU while you tried to do the EXPLAIN.)

[ thinks for awhile... ]  The only theory that comes to mind for making
the planner so slow is oodles of dead tuples in pg_statistic.  Could I
trouble you to run
vacuum full verbose pg_statistic;
and send along the output?

regards, tom lane



Re: [PERFORM] atrocious update performance

2004-03-16 Thread Rosser Schwarz
while you weren't looking, Tom Lane wrote:

> EXPLAIN won't tell you anything about triggers that might get fired
> during the UPDATE, so it's not much help for investigating possible
> FK performance problems.  EXPLAIN ANALYZE will give you some indirect
> evidence: the difference between the total query time and the total time
> reported for the topmost plan node represents the time spent running
> triggers and physically updating the tuples.  I suspect we are going
> to see a big difference.

It's still running.

> It took 20 seconds to EXPLAIN?  That's pretty darn odd in itself.

It struck me, too.

> I'm starting to think there must be something quite whacked-out about
> your installation, but I haven't got any real good ideas about what.

Built from source.  configure arguments:

./configure --prefix=/var/postgresql --bindir=/usr/bin
--enable-thread-safety --with-perl --with-python --with-openssl
--with-krb5=/usr/kerberos

I can answer more specific questions; otherwise, I'm not sure what to
look for, either.  If we could take the machine out of production (oh,
hell; I think I just volunteered myself for weekend work) long enough
to reinstall everything to get a fair comparison...

So far as I know, though, it's a more or less stock Red Hat.  2.4.20-
something.

> (I'm assuming of course that there weren't a ton of other jobs eating
> CPU while you tried to do the EXPLAIN.)

CPU usage spiked sporadically, which throttled everything else, but it never
stayed high.  top shows the current explain analyze running between 50-
ish% and negligible.  iostat -k 3 shows an average of 3K/sec written, for
a hundred-odd tps.

I can't get any finer-grained than that, unfortunately; the machine was
handed to me with a single, contiguous filesystem, in production use.

> [ thinks for awhile... ]  The only theory that comes to mind
> for making
> the planner so slow is oodles of dead tuples in pg_statistic.  Could I
> trouble you to run
>   vacuum full verbose pg_statistic;
> and send along the output?

INFO:  vacuuming "pg_catalog.pg_statistic"
INFO:  "pg_statistic": found 215 removable, 349 nonremovable row versions
in 7 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 72 to 8132 bytes long.
There were 3 unused item pointers.
Total free space (including removable row versions) is 91572 bytes.
0 pages are or will become empty, including 0 at the end of the table.
7 pages containing 91572 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.71 sec.
INFO:  index "pg_statistic_relid_att_index" now contains 349 row versions
in 2 pages
DETAIL:  215 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_statistic": moved 120 row versions, truncated 7 to 5 pages
DETAIL:  CPU 0.03s/0.01u sec elapsed 0.17 sec.
INFO:  index "pg_statistic_relid_att_index" now contains 349 row versions
in 2 pages
DETAIL:  120 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_16408"
INFO:  "pg_toast_16408": found 12 removable, 12 nonremovable row versions
in 5 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 660 to 8178 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 91576 bytes.
2 pages are or will become empty, including 0 at the end of the table.
5 pages containing 91576 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.27 sec.
INFO:  index "pg_toast_16408_index" now contains 12 row versions in 2 pages
DETAIL:  12 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.05 sec.
INFO:  "pg_toast_16408": moved 10 row versions, truncated 5 to 3 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  index "pg_toast_16408_index" now contains 12 row versions in 2 pages
DETAIL:  10 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

Having never more than glanced at the output of "vacuum verbose", I
can't say whether that makes the cut for oodles.  My suspicion is no.

/rls

--
Rosser Schwarz
Total Card, Inc.




Re: [PERFORM] atrocious update performance

2004-03-16 Thread Tom Lane
"Rosser Schwarz" <[EMAIL PROTECTED]> writes:
> Having never more than glanced at the output of "vacuum verbose", I
> can't say whether that makes the cut for oodles.  My suspicion is no.

Nope, it sure doesn't.  We occasionally see people who don't know they
need to vacuum regularly and have accumulated hundreds or thousands of
dead tuples for every live one :-(.  That's clearly not the issue here.

I'm fresh out of ideas, and the fact that this is a live server kinda
limits what we can do experimentally ... but clearly, *something* is
very wrong.

Well, when you don't know what to look for, you still have to look.
One possibly useful idea is to trace the kernel calls of the backend
process while it does that ridiculously long EXPLAIN --- think you could
try that?

regards, tom lane



Re: [PERFORM] atrocious update performance

2004-03-16 Thread Rosser Schwarz
while you weren't looking, Tom Lane wrote:

[trace]

`strace -p 21882` run behind the below query and plan ... below that.

# explain update account.cust set prodid = tempprod.prodid, subprodid =
tempprod.subprodid where origid = tempprod.debtid;
  QUERY PLAN
-
 Merge Join  (cost=0.00..232764.69 rows=4731410 width=252)
   Merge Cond: (("outer".origid)::text = ("inner".debtid)::text)
   ->  Index Scan using ix_origid on cust  (cost=0.00..94876.83
   rows=4731410 width=236)
   ->  Index Scan using ix_debtid on tempprod  (cost=0.00..66916.71
   rows=4731410 width=26)
(4 rows)

--

recv(9, "Q\0\0\0}explain update account.cust"..., 8192, 0) = 126
gettimeofday({1079482151, 106228}, NULL) = 0
brk(0)  = 0x82d9000
brk(0x82db000)  = 0x82db000
open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 8
_llseek(8, 212402176, [212402176], SEEK_SET) = 0
write(8, "\342\1\0\0\0\314\374\6\24\0\0\0\214\7pG\360\177\1\200\320"...,
32768) = 32768
close(8)= 0
open("/var/lib/pgsql/data/base/495616/16635", O_RDWR|O_LARGEFILE) = 8
read(8, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 10
_llseek(10, 60817408, [60817408], SEEK_SET) = 0
write(10, "\342\1\0\0`\334\5\7\24\0\0\0t\0010x\360\177\1\200\330\377"...,
32768) = 32768
close(10)   = 0
read(8, "\334\1\0\0h\217\270n\24\0\0\0H\0H|[EMAIL PROTECTED]"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834165", O_RDWR|O_LARGEFILE) = 10
_llseek(10, 130777088, [130777088], SEEK_SET) = 0
write(10, "\342\1\0\0<\341\7\7\24\0\0\0004\t0I\360\177\1\200\330\377"...,
32768) = 32768
close(10)   = 0
open("/var/lib/pgsql/data/base/495616/16595", O_RDWR|O_LARGEFILE) = 10
read(10, "[EMAIL PROTECTED]"...,
32768) = 32768
open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 11
_llseek(11, 145915904, [145915904], SEEK_SET) = 0
write(11, "\342\1\0\0\300\350\n\7\24\0\0\0\224\6\310Z\360\177\1\200"...,
32768) = 32768
close(11)   = 0
open("/var/lib/pgsql/data/base/495616/16614", O_RDWR|O_LARGEFILE) = 11
read(11, "\0\0\0\0\24\231P\306\16\0\0\0\24\0\360\177\360\177\1\200"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 12
_llseek(12, 148570112, [148570112], SEEK_SET) = 0
write(12, "\342\1\0\0\274\365\22\7\24\0\0\0X\3\234o\360\177\1\200"...,
32768)
= 32768
close(12)   = 0
_llseek(11, 98304, [98304], SEEK_SET)   = 0
read(11, "\0\0\0\0\24\231P\306\16\0\0\0\34\0\234\177\360\177\1\200"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 12
_llseek(12, 251789312, [251789312], SEEK_SET) = 0
write(12, "\342\1\0\0l\366\23\7\24\0\0\0\364\10\260J\360\177\1\200"...,
32768)
= 32768
close(12)   = 0
_llseek(11, 32768, [32768], SEEK_SET)   = 0
read(11, "\340\1\0\0\324\231\273\241\24\0\0\0\234\5\330\26\360\177"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834165", O_RDWR|O_LARGEFILE) = 12
_llseek(12, 117309440, [117309440], SEEK_SET) = 0
write(12, "\342\1\0\0d\36)\7\24\0\0\\tHI\360\177\1\200\330\377"...,
32768)
= 32768
close(12)   = 0
open("/var/lib/pgsql/data/base/495616/1259", O_RDWR|O_LARGEFILE) = 12
_llseek(12, 32768, [32768], SEEK_SET)   = 0
read(12, "\334\1\0\0\324v-p\24\0\0\\3\304\3\0\200\1\200<\377"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834173", O_RDWR|O_LARGEFILE) = 13
_llseek(13, 247824384, [247824384], SEEK_SET) = 0
write(13, "\342\1\0\0h *\7\24\0\0\0\204\4dm\360\177\1\200\340\377"...,
32768)
= 32768
close(13)   = 0
open("/var/lib/pgsql/data/base/495616/16613", O_RDWR|O_LARGEFILE) = 13
read(13, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 14
_llseek(14, 204472320, [204472320], SEEK_SET) = 0
write(14, "\342\1\0\0\314\272:\7\24\0\0\0\324\t\354K\360\177\1\200"...,
32768)
= 32768
close(14)   = 0
read(13, "\340\1\0\0X\231\273\241\24\0\0\0\370\6Dk\360\177\1\200"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 14
_llseek(14, 152010752, [152010752], SEEK_SET) = 0
write(14, "\342\1\0\0p\277<\7\24\0\0\0\364\n\220I\360\177\1\200\334"...,
32768) = 32768
close(14)   = 0
open("/var/lib/pgsql/data/base/495616/16610", O_RDWR|O_LARGEFILE) = 14
read(14, "\0\0\0\0\10\317\27\t\16\0\0\0\24\0\360\177\360\177\1\200"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 15
_llseek(15, 86441984, [86441984], SEEK_SET) = 0
write(15, "\342\1\0\0\330B?\7\24\0\0\0\370\6 N\360\177\1\2

Re: [PERFORM] atrocious update performance

2004-03-16 Thread Tom Lane
"Rosser Schwarz" <[EMAIL PROTECTED]> writes:
> `strace -p 21882` run behind the below query and plan ... below that.

Hmm ... that took 20 seconds eh?

It is a fairly interesting trace.  It shows that the backend needed to
read 63 system catalog pages (that weren't already in shared memory),
which is not too unreasonable I think ... though I wonder if more of
them shouldn't have been in memory already.  The odd thing is that for
*every single read* it was necessary to first dump out a dirty page
in order to make a buffer free.  That says you are running with the
entire contents of shared buffer space dirty at all times.  That's
probably not the regime you want to be operating in.  I think we already
suggested increasing shared_buffers.  You might also want to think about
not using such a large checkpoint interval.  (The background-writing
logic already committed for 7.5 should help this problem, but it's not
there in 7.4.)

Another interesting fact is that the bulk of the writes were "blind
writes", involving an open()/write()/close() sequence instead of keeping
the open file descriptor around for re-use.  This is not too surprising
in a freshly started backend, I guess; it's unlikely to have had reason
to create a relation descriptor for the relations it may have to dump
pages for.  In some Unixen, particularly Solaris, open() is fairly
expensive and so blind writes are bad news.  I didn't think it was a big
problem in Linux though.  (This is another area we've improved for 7.5:
there are no more blind writes.  But that won't help you today.)

What's not immediately evident is whether the excess I/O accounted for
all of the slowdown.  Could you retry the strace with -r and -T options
so we can see how much time is being spent inside and outside the
syscalls?
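
That is, something like (illustrative):

    $ strace -r -T -o explain-timed.trace -p <backend_pid>

where -r prefixes each line with the time elapsed since the previous call
and -T appends the time spent inside each call.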

regards, tom lane



Re: [PERFORM] atrocious update performance

2004-03-16 Thread Aaron Werman
Quick observations:

1. We have an explanation for what's going on, based on the message being
exactly 666 lines long :-)
2. I'm clueless on the output, but perhaps Tom can see something. A quick
glance shows that the strace seemed to run 27 seconds, during which it did:
  count | call
  ------+---------
     84 | _llseek
     40 | brk
     54 | close
     88 | open
     63 | read
in other words, nothing much (though it did *a lot* of opens and closes of
db files to do nothing).

Can you do another strace for a few minutes against the actual update query
adding the -c/-t options and control-c out?
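
i.e. roughly (illustrative):

    $ strace -c -t -p <pid of the backend running the update>

let it run for a few minutes, then control-c out to get strace's per-call
count/time summary.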

- Original Message - 
From: "Rosser Schwarz" <[EMAIL PROTECTED]>
To: "'Tom Lane'" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, March 16, 2004 7:20 PM
Subject: Re: [PERFORM] atrocious update performance


while you weren't looking, Tom Lane wrote:

[trace]

`strace -p 21882` run behind the below query and plan ... below that.

# explain update account.cust set prodid = tempprod.prodid, subprodid =
tempprod.subprodid where origid = tempprod.debtid;



Re: [PERFORM] rapid degradation after postmaster restart

2004-03-16 Thread Joe Conway
Tom Lane wrote:
> Joe Conway <[EMAIL PROTECTED]> writes:
>> Any idea where I can get my hands on the latest version. I found the
>> original post from Tom, but I thought there was a later version with
>> both number of pages and time to sleep as knobs.
>
> That was as far as I got.  I think Jan posted a more complex version
> that would still be reasonable to apply to 7.4.

I have tested Tom's original patch now. The good news -- it works great 
in terms of reducing the load imposed by vacuum -- almost to the level 
of being unnoticeable. The bad news -- in a simulation test which loads 
an hour's worth of data, even with delay set to 1 ms, vacuum of the 
large table exceeds two hours (vs 12-14 minutes with delay = 0). Since 
that hourly load is expected 7 x 24, this obviously isn't going to work.

The problem with Jan's more complex version of the patch (at least the 
one I found - perhaps not the right one) is it includes a bunch of other 
experimental stuff that I'd not want to mess with at the moment. Would 
changing the input units (for the original patch) from milli-secs to 
micro-secs be a bad idea? If so, I guess I'll get to extracting what I 
need from Jan's patch.

Thanks,

Joe



Re: [PERFORM] rapid degradation after postmaster restart

2004-03-16 Thread Matthew T. O'Connor
On Tue, 2004-03-16 at 23:49, Joe Conway wrote:
> I have tested Tom's original patch now. The good news -- it works great
> in terms of reducing the load imposed by vacuum -- almost to the level
> of being unnoticeable. The bad news -- in a simulation test which loads
> an hour's worth of data, even with delay set to 1 ms, vacuum of the
> large table exceeds two hours (vs 12-14 minutes with delay = 0). Since
> that hourly load is expected 7 x 24, this obviously isn't going to work.

If memory serves, the problem is that you actually sleep 10ms even when
you set it to 1.  One of the things changed in Jan's later patch was the
ability to specify how many pages to work on before sleeping, rather
than how long to sleep in between every single page.  You might be able
to do a quick hack and have it do 10 pages or so before sleeping.
Matthew



Re: [PERFORM] rapid degradation after postmaster restart

2004-03-16 Thread Tom Lane
Joe Conway <[EMAIL PROTECTED]> writes:
> I have tested Tom's original patch now. The good news -- it works great 
> in terms of reducing the load imposed by vacuum -- almost to the level 
> of being unnoticeable. The bad news -- in a simulation test which loads 
> an hour's worth of data, even with delay set to 1 ms, vacuum of the 
> large table exceeds two hours (vs 12-14 minutes with delay = 0). Since 
> that hourly load is expected 7 x 24, this obviously isn't going to work.

Turned the dial down a bit too far then ...

> The problem with Jan's more complex version of the patch (at least the 
> one I found - perhaps not the right one) is it includes a bunch of other 
> experimental stuff that I'd not want to mess with at the moment. Would 
> changing the input units (for the original patch) from milli-secs to 
> micro-secs be a bad idea?

Unlikely to be helpful; on most kernels the minimum sleep delay is 1 or
10 msec, so asking for a few microsec is the same as asking for some
millisec.  I think what you need is a knob of the form "sleep N msec
after each M pages of I/O".  I'm almost certain that Jan posted such a
patch somewhere between my original and the version you refer to above.

regards, tom lane



Re: [PERFORM] rapid degradation after postmaster restart

2004-03-16 Thread Joe Conway
Matthew T. O'Connor wrote:
> If memory serves, the problem is that you actually sleep 10ms even when
> you set it to 1.  One of the things changed in Jan's later patch was the
> ability to specify how many pages to work on before sleeping, rather
> than how long to sleep in between every single page.  You might be able
> to do a quick hack and have it do 10 pages or so before sleeping.

I thought I remembered something about that.

It turned out to be less difficult than I first thought to extract the 
vacuum delay stuff from Jan's performance patch. I haven't yet tried it 
out, but it's attached in case you are interested. I'll report back once 
I have some results.

Joe
Index: src/backend/access/nbtree/nbtree.c
===
RCS file: /home/pgsql/CvsRoot/pgsql-server/src/backend/access/nbtree/nbtree.c,v
retrieving revision 1.106
diff -c -b -r1.106 nbtree.c
*** src/backend/access/nbtree/nbtree.c  2003/09/29 23:40:26 1.106
--- src/backend/access/nbtree/nbtree.c  2003/11/03 17:56:54
***
*** 18,23 
--- 18,25 
   */
  #include "postgres.h"
  
+ #include <unistd.h>
+ 
  #include "access/genam.h"
  #include "access/heapam.h"
  #include "access/nbtree.h"
***
*** 27,32 
--- 29,39 
  #include "storage/smgr.h"
  
  
+ extern int vacuum_page_delay;
+ extern int vacuum_page_groupsize;
+ extern int vacuum_page_groupcount;
+ 
+ 
  /* Working state for btbuild and its callback */
  typedef struct
  {
***
*** 610,615 
--- 617,631 
  
CHECK_FOR_INTERRUPTS();
  
+   if (vacuum_page_delay > 0)
+   {
+   if (++vacuum_page_groupcount >= vacuum_page_groupsize)
+   {
+   vacuum_page_groupcount = 0;
+   usleep(vacuum_page_delay * 1000);
+   }
+   }
+ 
ndeletable = 0;
page = BufferGetPage(buf);
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
***
*** 736,741 
--- 752,768 
Buffer  buf;
Pagepage;
BTPageOpaque opaque;
+ 
+   CHECK_FOR_INTERRUPTS();
+ 
+   if (vacuum_page_delay > 0)
+   {
+   if (++vacuum_page_groupcount >= vacuum_page_groupsize)
+   {
+   vacuum_page_groupcount = 0;
+   usleep(vacuum_page_delay * 1000);
+   }
+   }
  
buf = _bt_getbuf(rel, blkno, BT_READ);
page = BufferGetPage(buf);
Index: src/backend/commands/vacuumlazy.c
===
RCS file: /home/pgsql/CvsRoot/pgsql-server/src/backend/commands/vacuumlazy.c,v
retrieving revision 1.32
diff -c -b -r1.32 vacuumlazy.c
*** src/backend/commands/vacuumlazy.c   2003/09/25 06:57:59 1.32
--- src/backend/commands/vacuumlazy.c   2003/11/03 17:57:27
***
*** 37,42 
--- 37,44 
   */
  #include "postgres.h"
  
+ #include <unistd.h>
+ 
  #include "access/genam.h"
  #include "access/heapam.h"
  #include "access/xlog.h"
***
*** 88,93 
--- 90,99 
  static TransactionId OldestXmin;
  static TransactionId FreezeLimit;
  
+ int   vacuum_page_delay = 0;  /* milliseconds per page group */
+ int   vacuum_page_groupsize = 10; /* group size */
+ int   vacuum_page_groupcount = 0; /* current group size count */
+ 
  
  /* non-export function prototypes */
  static void lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
***
*** 228,233 
--- 234,248 
  
CHECK_FOR_INTERRUPTS();
  
+   if (vacuum_page_delay > 0)
+   {
+   if (++vacuum_page_groupcount >= vacuum_page_groupsize)
+   {
+   vacuum_page_groupcount = 0;
+   usleep(vacuum_page_delay * 1000);
+   }
+   }
+ 
/*
 * If we are close to overrunning the available space for
 * dead-tuple TIDs, pause and do a cycle of vacuuming before we
***
*** 469,474 
--- 484,498 
  
CHECK_FOR_INTERRUPTS();
  
+   if (vacuum_page_delay > 0)
+   {
+   if (++vacuum_page_groupcount >= vacuum_page_groupsize)
+   {
+   vacuum_page_groupcount = 0;
+   usleep(vacuum_page_delay * 1000);
+   }
+   }
+ 
tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
buf = ReadBuffer(onerel, tblk);

Re: [PERFORM] severe performance issue with planner (fwd)

2004-03-16 Thread Kris Jurka

I sent this message to the list and although it shows up in the archives,
I did not receive a copy of it through the list, so I'm resending as I
suspect others did not see it either.

-- Forwarded message --
Date: Sat, 13 Mar 2004 22:48:01 -0500 (EST)
From: Kris Jurka <[EMAIL PROTECTED]>
To: Tom Lane <[EMAIL PROTECTED]>
Cc: Eric Brown <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
Subject: Re: [PERFORM] severe performance issue with planner 

On Thu, 11 Mar 2004, Tom Lane wrote:

> "Eric Brown" <[EMAIL PROTECTED]> writes:
> > [ planning a 9-table query takes too long ]
> 
> See http://www.postgresql.org/docs/7.4/static/explicit-joins.html
> for some useful tips.
> 

Is this the best answer we've got?  For me, with an empty table, this query
takes 4 seconds to plan; is that the expected planning time?  I know I've
got nine-table queries that don't take that long.

Setting geqo_threshold less than 9, it takes 1 second to plan.  Does this
indicate that geqo_threshold is set too high, or is it a tradeoff between
planning time and plan quality?  If the planning time is so high because
there are a large number of possible join orders, should geqo_threshold be
based on the number of possible plans somehow instead of the number of
tables involved?
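
For reference, the comparison was essentially this (a sketch; the query is
the nine-table one from the original post):

    explain select ... ;        -- geqo_threshold at its default (above 9):
                                -- exhaustive join search, ~4 seconds to plan
    set geqo_threshold = 8;     -- let GEQO plan the 9-way join instead
    explain select ... ;        -- ~1 second to plan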

Kris Jurka




[PERFORM] A good article about application tuning

2004-03-16 Thread Shridhar Daithankar
http://www.databasejournal.com/features/postgresql/article.php/3323561

 Shridhar
