Re: [PERFORM] x206-x225

2006-03-15 Thread Richard Huxton

Jim C. Nasby wrote:

I think you mean this...

http://www.postgresql.org/docs/8.1/static/runtime-config-wal.html

commit_delay (integer)


No, that's not what I mean at all. On a system doing a large number of
WAL-generating transactions per second, it's certainly possible for
multiple transactions to commit in the period of time it takes for the
platter to rotate back into position to allow for writing of the WAL
data. What I don't know is if those multiple transactions would actually
make it to the platter on that rotation, or if they'd serialize,
resulting in one commit per revolution. I do know that there's no
theoretical reason that they couldn't; it's just a matter of putting
enough intelligence in the drive.

Perhaps this is something that SCSI supports and (S)ATA doesn't, since
SCSI allows multiple transactions to be 'in flight' on the bus at once.


SCSI Command queueing:
http://www.storagereview.com/guide2000/ref/hdd/if/scsi/protCQR.html

SATA "native command queuing":
http://www.tomshardware.com/2004/11/16/can_command_queuing_turbo_charge_sata/


But since you mention commit_delay, this does lead to an interesting
possible use: set it equal to the effective rotational period of the
drive. If you know your transaction load well enough, you could possibly
gain some benefit here. But of course a RAID controller with a BBU would
be a better bet...


I suppose that as long as you always have several transactions trying to 
commit, and a separate spindle (or spindles) for the WAL, you could improve 
throughput at the cost of the shortest transaction times. Of course, the 
increase in lock duration etc. might outweigh any benefits. I'd suspect the 
cost/gain would vary greatly with changes in workload, and as you say, 
write-cache+BBU seems more sensible.
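
To put numbers on that idea, a minimal sketch (the 10,000 RPM figure and
the session-level SETs are illustrative assumptions, not measurements):

   -- commit_delay is in microseconds; one revolution of a 10,000 RPM
   -- drive takes 60 s / 10000 = 6 ms, i.e. 6000 us
   SET commit_delay = 6000;    -- wait up to one rotational period
   SET commit_siblings = 5;    -- but only if >= 5 other xacts are open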



--
  Richard Huxton
  Archonet Ltd



[PERFORM] VACUUM FULL hangs

2006-03-15 Thread Marcin Mańk
Hello list.

I recently tried to make a Slony replica of my database, and it failed.
I retried, and then it succeeded (why it failed is another story).

This left a lot of dead tuples in the replica (if I understand correctly, a
failure in creating the replica means a HUGE aborted transaction - and Slony
should TRUNCATE the table, getting rid of the dead tuples, but that is a
subject for another list).

So I ran VACUUM FULL VERBOSE ANALYZE (does that make sense?)

This hung on a (quite large) table:

INFO:  vacuuming "public.calls"
INFO:  "calls": found 7980456 removable, 3989705 nonremovable row versions
in 296943 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 136 to 224 bytes long.
There were 891 unused item pointers.
Total free space (including removable row versions) is 1594703944 bytes.
197958 pages are or will become empty, including 0 at the end of the table.
212719 pages containing 1588415680 free bytes are potential move
destinations.
CPU 7.25s/3.28u sec elapsed 144.95 sec.
INFO:  index "calls_pkey" now contains 3989705 row versions in 8975 pages
DETAIL:  108927 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.35s/0.59u sec elapsed 39.03 sec.
INFO:  index "calls_cli" now contains 3989705 row versions in 13504 pages
DETAIL:  108927 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.51s/0.60u sec elapsed 58.60 sec.
INFO:  index "calls_dnis" now contains 3989705 row versions in 13600 pages
DETAIL:  108927 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.60s/0.90u sec elapsed 27.05 sec.
INFO:  index "calls_u" now contains 3989705 row versions in 23820 pages
DETAIL:  108927 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.92s/0.78u sec elapsed 80.51 sec.
INFO:  index "calls_z" now contains 3989705 row versions in 13607 pages
DETAIL:  108927 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.60s/0.85u sec elapsed 39.77 sec.

It stayed in this state for more than 3 hours, and I had to kill the
vacuum process.

From iostat I saw that there was continuous write activity, steadily about
1.3 MB/s (the disk system can do about 40 MB/s), and there were processes
in iowait. There was no read activity.

There were no other clients for that database (but there were clients in
other databases in the instance).

The version is 8.1.0. Autovacuum is off. I upped maintenance_work_mem to
512 MB. Any hints? If nothing comes up today, I am scrapping that replica.


telefony=# \d calls
                                Table "public.calls"
    Column     |            Type             |                 Modifiers
---------------+-----------------------------+---------------------------------
 dt            | timestamp without time zone |
 machine_ip    | integer                     |
 port          | integer                     |
 filename      | character varying(15)       |
 account       | character(11)               |
 duration      | integer                     |
 ani           | character(32)               |
 application   | character(32)   (snip)      |
 dnis          | integer                     |
 z             | integer                     |
 client        | integer                     |
 taryfa        | integer                     |
 operator      | character varying(20)       |
 id            | integer                     | not null default nextval(('seq_calls_id'::text)::regclass)
 outgoing      | character(12)               |
 release_cause | text                        |
 waiting       | integer                     |
 oper_pin      | integer                     |
Indexes:
"calls_pkey" PRIMARY KEY, btree (id)
"calls_u" UNIQUE, btree (dt, dnis, port, machine_ip, account)
"calls_cli" btree (client, dt)
"calls_dnis" btree (dnis, dt)
"calls_z" btree (z, dt)
Triggers:
_ctele_denyaccess_5 BEFORE INSERT OR DELETE OR UPDATE ON calls FOR EACH
ROW EXECUTE PROCEDURE _ctele.denyaccess('_ctele')


Regards
Marcin Mańk




Re: [PERFORM] VACUUM FULL hangs

2006-03-15 Thread Richard Huxton

Marcin Mańk wrote:

Hello list.

I recently tried to make a Slony replica of my database, and it failed.
I retried, and then it succeeded (why it failed is another story).

This left a lot of dead tuples in the replica (if I understand correctly, a
failure in creating the replica means a HUGE aborted transaction - and Slony
should TRUNCATE the table, getting rid of the dead tuples, but that is a
subject for another list).

So I ran VACUUM FULL VERBOSE ANALYZE (does that make sense?)


Fair enough. If you want empty tables, TRUNCATE is probably a better bet 
though.
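
For reference, a minimal sketch of both options against the calls table
from this thread (TRUNCATE applies only if you really want the table empty):

   TRUNCATE TABLE calls;                  -- instant, but discards every row
   VACUUM FULL VERBOSE ANALYZE calls;     -- keeps live rows, compacts the rest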



This hung on a (quite large) table:

INFO:  vacuuming "public.calls"
INFO:  "calls": found 7980456 removable, 3989705 nonremovable row versions
in 296943 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 136 to 224 bytes long.
There were 891 unused item pointers.
Total free space (including removable row versions) is 1594703944 bytes.
197958 pages are or will become empty, including 0 at the end of the table.
212719 pages containing 1588415680 free bytes are potential move
destinations.


OK, so there are 7.9 million removable rows and 3.9 million nonremovable, 
so TRUNCATE isn't an option, since you have data you presumably want to 
keep. It estimates that about 200,000 pages will become empty, but none of 
them are at the end of the table. This represents about 1.5GB of unused 
disk space.


I'm a bit puzzled as to how you managed to get so much free space at the 
start of the table. Did the replication work on the second try?



CPU 7.25s/3.28u sec elapsed 144.95 sec.
INFO:  index "calls_pkey" now contains 3989705 row versions in 8975 pages
DETAIL:  108927 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.35s/0.59u sec elapsed 39.03 sec.
INFO:  index "calls_cli" now contains 3989705 row versions in 13504 pages
DETAIL:  108927 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.51s/0.60u sec elapsed 58.60 sec.
INFO:  index "calls_dnis" now contains 3989705 row versions in 13600 pages
DETAIL:  108927 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.60s/0.90u sec elapsed 27.05 sec.
INFO:  index "calls_u" now contains 3989705 row versions in 23820 pages
DETAIL:  108927 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.92s/0.78u sec elapsed 80.51 sec.
INFO:  index "calls_z" now contains 3989705 row versions in 13607 pages
DETAIL:  108927 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.60s/0.85u sec elapsed 39.77 sec.


It's done all the indexes (and seems to have done them quite quickly), 
and is presumably working on the data now.



It stayed in this state for more than 3 hours, and I had to kill the
vacuum process.


From iostat I saw that there was continuous write activity, steadily about
1.3 MB/s (the disk system can do about 40 MB/s), and there were processes
in iowait. There was no read activity.

There were no other clients for that database (but there were clients in
other databases in the instance).


OK, so you might well be getting the vacuum writing one page, then WAL, 
then vacuum, etc. That will mean the disk spends most of its time 
seeking back and forth. How many disks do you have, and is the WAL on a 
separate set of disks?


I think it's just taking a long time because you have so many pages to 
move and not enough disk bandwidth. Of course the root of the problem is 
that you had so many dead rows after a failed replication, but you're 
right, that's another email.


--
  Richard Huxton
  Archonet Ltd




Re: [PERFORM] BETWEEN optimizer problems with single-value range

2006-03-15 Thread Andreas Kretschmer
Kevin Grittner <[EMAIL PROTECTED]> schrieb:

> Attached is a simplified example of a performance problem we have seen,

Odd. Can you tell us your PG version?


Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.  (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."(unknow)
Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°



[PERFORM] BETWEEN optimizer problems with single-value range

2006-03-15 Thread Kevin Grittner
Attached is a simplified example of a performance problem we have seen,
with a workaround and a suggestion for enhancement (hence both the
performance and hackers lists).

Our software allows users to specify the start and end dates for a
query.  When they enter the same date for both, the optimizer makes a
very bad choice.  We can work around it in application code by using an
equality test if both dates match.  I think the planner should be able
to make a better choice here.  (One obvious way to fix it would be to
rewrite "BETWEEN a AND b" as "= a" when a is equal to b, but it seems
like there is some underlying problem which should be fixed instead of,
or in addition to, this.)
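
For concreteness, the two query forms look like this (a sketch: the table,
index, and column names are taken from the EXPLAIN output later in this
thread, and the selected columns are illustrative):

   -- the BETWEEN form our software generates (gets the bad plan):
   SELECT "calDate", "startTime"
     FROM "Cal"
    WHERE "ctofcNo" = '2192'
      AND "calDate" BETWEEN '2006-03-15' AND '2006-03-15';

   -- the equality workaround when both dates match (gets the good plan):
   SELECT "calDate", "startTime"
     FROM "Cal"
    WHERE "ctofcNo" = '2192'
      AND "calDate" = '2006-03-15';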

The first query uses BETWEEN with the same date for both min and max
values.  The second query uses an equality test for the same date.  The
third query uses BETWEEN with a two-day range.  In all queries, there
are fewer than 4,600 rows for the specified ctofcNo value out of over 18
million rows in the table.  We tried boosting the statistics samples for
the columns in the selection, which made the row estimates more
accurate, but didn't change the choice of plans.
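
(For reference, boosting the statistics samples is done like this - a
sketch; the target of 100 is illustrative, the 8.1 default being 10:)

   ALTER TABLE "Cal" ALTER COLUMN "ctofcNo" SET STATISTICS 100;
   ALTER TABLE "Cal" ALTER COLUMN "calDate" SET STATISTICS 100;
   ANALYZE "Cal";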

-Kevin




between-optimization-problem.txt
Description: Binary data



Re: [PERFORM] VACUUM FULL hangs

2006-03-15 Thread Marcin Mańk

> I'm a bit puzzled as to how you managed to get so much free space at the
> start of the table. Did the replication work on the second try?

It actually worked on the third try, I guess.

> OK, so you might well be getting the vacuum writing one page, then WAL,
> then vacuum, etc. That will mean the disk spends most of its time
> seeking back and fore. How many disks do you have, and is the WAL on a
> separate set of disks?

It is 2 spindles in software RAID1. Till now there have been no performance
problems with this machine that would mandate trying anything fancier;
this machine is low-traffic.

Greetings
Marcin Mańk




Re: [PERFORM] BETWEEN optimizer problems with single-value

2006-03-15 Thread Kevin Grittner
>>> On Wed, Mar 15, 2006 at 12:17 pm, in message
<[EMAIL PROTECTED]>, Andreas Kretschmer
<[EMAIL PROTECTED]> wrote: 
> Kevin Grittner <[EMAIL PROTECTED]> schrieb:
> 
>> Attached is a simplified example of a performance problem we have
seen,
> 
> Odd. Can you tell us your PG version?

I know we really should move to 8.1.3, but I haven't gotten to it yet. 
We're on a build from the 8.1 stable branch as of February 10th, with a
patch to allow ANSI standard interpretation of string literals.  (So
this is 8.1.2 with some 8.1.3 changes plus the string literal patch.)

If there are any changes in that time frame which might affect this
issue, I could deploy a standard release and make sure that I see the
same behavior.  Let me know.

-Kevin





Re: [PERFORM] BETWEEN optimizer problems with single-value

2006-03-15 Thread Tom Lane
"Kevin Grittner" <[EMAIL PROTECTED]> writes:
>> Odd. Can you tell us your PG version?

> this is 8.1.2 with some 8.1.3 changes plus the string literal patch.)

8.1 is certainly capable of devising the plan you want, for example
in the regression database:

regression=# explain select * from tenk1 where thousand = 10 and tenthous between 42 and 144;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Index Scan using tenk1_thous_tenthous on tenk1  (cost=0.00..6.01 rows=1 width=244)
   Index Cond: ((thousand = 10) AND (tenthous >= 42) AND (tenthous <= 144))
(2 rows)

It looks to me like this is a matter of bad cost estimation, ie, it's
thinking the other index is cheaper to use.  Why that is is not clear.
Can we see the pg_stats rows for ctofcNo and calDate?

Also, try to force it to generate the plan you want, so we can see what
it thinks the cost is for that.  If you temporarily drop the wrong index
you should be able to get there:

begin;
drop index  "Cal_CalDate";
explain analyze select ... ;
-- repeat as needed if it chooses some other wrong index
rollback;

I hope you have a play copy of the database to do this in ---
although it would be safe to do the above in a live DB, the DROP would
exclusive-lock the table until you finish the experiment and rollback,
which probably is not good for response time ...

regards, tom lane



[PERFORM] Slow SELECTS after large update cycle

2006-03-15 Thread Jan de Visser
Hello,

After fixing the hanging problems I reported here earlier (by uninstalling 
W2K3 SP1), I'm running into another weird one.

After doing a +/- 8hr cycle of updates and inserts (what we call a 'batch'), 
the first 'reporting' type query on tables involved in that write cycle is 
very slow. As an example, I have a query which according to EXPLAIN ANALYZE 
takes about 1.1s, yet it took 46s. After this one hit, everything is back to 
normal, and subsequent executions of the same query are in fact subsecond. 
Restarting the appserver and pgsql does not make the slowness reappear; only 
running another batch will.

During the 'write'/batch cycle, a large number of rows in various tables are 
inserted and subsequently (repeatedly) updated. The reporting type queries 
after that are basically searches on those tables.

Anybody any ideas?

Thanks,

jan

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--



[PERFORM] Background writer configuration

2006-03-15 Thread Kevin Grittner
We were seeing clusters of query timeouts with our web site, which were
corrected by adjusting the configuration of the background writer.  I'm
posting just to provide information which others might find useful -- I
don't have any problem I'm trying to solve in this regard.

The web site gets 1 to 2 million hits per day, with about the same
number of select queries run to provide data for the web pages.  The
load is distributed across multiple databases.  (We have four, but the
load is easily handled by any two of them, and we often take one or two
out of web use for maintenance or special statistical runs.)  Each
database gets the same stream of modification requests -- about 2.7
million database transactions per day.  Each transaction can contain
multiple inserts, updates, or deletes.  The peak times for both the web
requests and the data modifications are in the afternoon on business
days.  Most web queries run under a timeout limit of 20 seconds.

During peak times, we would see clusters of timeouts (where queries
exceeded the 20 second limit) on very simple queries which normally run
in a few milliseconds.  The pattern suggested that checkpoints were at
fault.  I boosted the settings for the background writer from the
defaults to the values below, and we saw a dramatic reduction in these
timeouts.  We also happened to have one machine which had been out of
the replication mix which was in "catch up" mode, processing the
transaction stream as fast as the database could handle it, without any
web load.  We saw the transaction application rate go up by a factor of
four when I applied these changes:

bgwriter_lru_percent = 2.0
bgwriter_lru_maxpages = 250
bgwriter_all_percent = 1.0
bgwriter_all_maxpages = 250

This was with shared_buffers = 20000, so that last value was
effectively limited to 200 by the percentage (1.0% of 20000 buffers is
200 pages per round).

I then did some calculations, based on the sustained write speed of our
drive array (as measured by copying big files to it), and we tried
this:

bgwriter_lru_percent = 20.0
bgwriter_lru_maxpages = 200
bgwriter_all_percent = 10.0
bgwriter_all_maxpages = 600

This almost totally eliminated the clusters of timeouts, and caused the
transaction application rate to increase by a factor of eight over the
already-improved speed.  (That is, we were running 30 to 35 times as
many transactions per minute into the database, compared to the default
background writer configuration.)  I'm going to let these settings
settle in for a week or two before we try adjusting them further (to see
if we can eliminate those last few timeouts of this type).
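
For a sense of what those settings allow (a back-of-envelope sketch; the
200 ms bgwriter_delay is the 8.1 default and is assumed here, since it
isn't stated):

   # per-round ceiling: bgwriter_lru_maxpages + bgwriter_all_maxpages
   #   = 200 + 600 = 800 pages of 8 KB = 6.4 MB per round
   # at the assumed bgwriter_delay = 200ms, i.e. 5 rounds per second:
   #   6.4 MB * 5 = 32 MB/s, safely under the ~40 MB/s the array sustains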

I guess my point is that people shouldn't be shy about boosting these
numbers by a couple orders of magnitude from the default values.  It may
also be worth considering whether the defaults should be something more
aggressive.

-Kevin




Re: [PERFORM] Background writer configuration

2006-03-15 Thread Joshua D. Drake

> I then did some calculations, based on the sustained write speed of our
> drive array (as measured by copying big files to it), and we tried
> this:
>
> bgwriter_lru_percent = 20.0
> bgwriter_lru_maxpages = 200
> bgwriter_all_percent = 10.0
> bgwriter_all_maxpages = 600
>
> This almost totally eliminated the clusters of timeouts, and caused the
> transaction application rate to increase by a factor of eight over the
> already-improved speed.  (That is, we were running 30 to 35 times as
> many transactions per minute into the database, compared to the default
> background writer configuration.)  I'm going to let these settings
> settle in for a week or two before we try adjusting them further (to see
> if we can eliminate those last few timeouts of this type).


Can you tell us what type of array you have?

Joshua D. Drake

>
> I guess my point is that people shouldn't be shy about boosting these
> numbers by a couple orders of magnitude from the default values.  It may
> also be worth considering whether the defaults should be something more
> aggressive.
>
> -Kevin

-- 
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: PLphp, PLperl - http://www.commandprompt.com/



Re: [PERFORM] BETWEEN optimizer problems with single-value range

2006-03-15 Thread Merlin Moncure
On 3/15/06, Kevin Grittner <[EMAIL PROTECTED]> wrote:
> Attached is a simplified example of a performance problem we have seen,
> with a workaround and a suggestion for enhancement (hence both the
> performance and hackers lists).


Hi Kevin.  In postgres 8.2 you will be able to use row-wise
comparison for your query, which should guarantee good worst-case
performance without having to maintain two separate query forms.  It
is also a more elegant syntax, as you will see.

SELECT "CA"."calDate", "CA"."startTime"
  FROM "Cal" "CA"
  WHERE ("CA"."ctofcNo", "CA"."calDate") BETWEEN
(2192, '2006-03-15') and (2192, '2006-03-15')
  ORDER BY "ctofcNo", "calDate", "startTime";

Be warned this will not work properly in pg < 8.2.  IMO, row-wise is
the best way to write this type of query.  Please note the row
constructor and the addition of ctofcNo to the ORDER BY clause to
force use of the index.

Merlin



Re: [PERFORM] BETWEEN optimizer problems with single-value

2006-03-15 Thread Kevin Grittner
>>> On Wed, Mar 15, 2006 at 1:17 pm, in message
<[EMAIL PROTECTED]>, Tom Lane <[EMAIL PROTECTED]> wrote: 
> 
> 8.1 is certainly capable of devising the plan you want, for example
> in the regression database:
> 
> regression=# explain select * from tenk1 where thousand = 10 and tenthous between 42 and 144;
>                                      QUERY PLAN
> ------------------------------------------------------------------------------------
>  Index Scan using tenk1_thous_tenthous on tenk1  (cost=0.00..6.01 rows=1 width=244)
>    Index Cond: ((thousand = 10) AND (tenthous >= 42) AND (tenthous <= 144))
> (2 rows)

That matches one of the examples where it optimized well.  I only saw
the bad plan when the low and high ends of the BETWEEN range were equal.

> It looks to me like this is a matter of bad cost estimation, ie, it's
> thinking the other index is cheaper to use.  Why that is is not clear.
> Can we see the pg_stats rows for ctofcNo and calDate?

-[ RECORD 1 ]-----+---------------------------------------------------------
schemaname        | public
tablename         | Cal
attname           | calDate
null_frac         | 0
avg_width         | 4
n_distinct        | 2114
most_common_vals  | {2003-06-02,2000-06-20,2001-04-16,2003-06-17,2003-12-01,2004-10-12,2001-04-23,2001-10-15,2002-03-06,2002-05-03}
most_common_freqs | {0.0033,0.0023,0.0023,0.0023,0.0023,0.0023,0.002,0.002,0.002,0.002}
histogram_bounds  | {1986-03-14,1999-06-11,2000-07-14,2001-05-18,2002-03-21,2002-12-04,2003-08-12,2004-05-13,2005-02-01,2005-09-28,2080-12-31}
correlation       | 0.0545768
-[ RECORD 2 ]-----+---------------------------------------------------------
schemaname        | public
tablename         | Cal
attname           | ctofcNo
null_frac         | 0
avg_width         | 8
n_distinct        | 669
most_common_vals  | {0793,1252,1571,0964,0894,1310,"DA  ",0944,1668,0400}
most_common_freqs | {0.024,0.019,0.015,0.012,0.012,0.011,0.0106667,0.01,0.0097,0.0087}
histogram_bounds  | {,0507,0733,0878,1203,1336,14AG,1633,1971,3705,YVJO}
correlation       | -0.0179665


> Also, try to force it to generate the plan you want, so we can see what
> it thinks the cost is for that.  If you temporarily drop the wrong index
> you should be able to get there:
> 
>   begin;
>   drop index "Cal_CalDate";
>   explain analyze select ... ;
>   -- repeat as needed if it chooses some other wrong index
>   rollback;

 Sort  (cost=4.03..4.03 rows=1 width=12) (actual time=48.484..48.486 rows=4 loops=1)
   Sort Key: "calDate", "startTime"
   ->  Index Scan using "Cal_CtofcNo" on "Cal" "CA"  (cost=0.00..4.02 rows=1 width=12) (actual time=36.750..48.228 rows=4 loops=1)
         Index Cond: ((("ctofcNo")::bpchar = '2192'::bpchar) AND (("calDate")::date >= '2006-03-15'::date) AND (("calDate")::date <= '2006-03-15'::date))
 Total runtime: 56.616 ms




Re: [PERFORM] Background writer configuration

2006-03-15 Thread Kevin Grittner
>>> On Wed, Mar 15, 2006 at  1:54 pm, in message
<[EMAIL PROTECTED]>, "Joshua D. Drake"
<[EMAIL PROTECTED]> wrote: 

>> I then did some calculations, based on the sustained write speed of our
>> drive array (as measured by copying big files to it), and we tried
>> this:
>>
>> bgwriter_lru_percent = 20.0
>> bgwriter_lru_maxpages = 200
>> bgwriter_all_percent = 10.0
>> bgwriter_all_maxpages = 600
>>
>> This almost totally eliminated the clusters of timeouts, and caused the
>> transaction application rate to increase by a factor of eight over the
>> already-improved speed.  (That is, we were running 30 to 35 times as
>> many transactions per minute into the database, compared to the default
>> background writer configuration.)  I'm going to let these settings
>> settle in for a week or two before we try adjusting them further (to see
>> if we can eliminate those last few timeouts of this type).
> 
> Can you tell us what type of array you have?

Each machine has a RAID5 array of 13 (plus one hot spare)
15,000 RPM Ultra 320 SCSI drives
2 machines using IBM ServRaid6M battery backed caching controllers
2 machines using IBM ServRaid4MX battery backed caching controllers





Re: [PERFORM] BETWEEN optimizer problems with single-value range

2006-03-15 Thread Andreas Kretschmer
Merlin Moncure <[EMAIL PROTECTED]> schrieb:

> On 3/15/06, Kevin Grittner <[EMAIL PROTECTED]> wrote:
> > Attached is a simplified example of a performance problem we have seen,
> > with a workaround and a suggestion for enhancement (hence both the
> > performance and hackers lists).
> 
> 
> Hi Kevin.  In postgres 8.2 you will be able to use the row-wise

8.2? AFAIK, feature freeze is in June/July this year...
release in August/September.


> comparison for your query which  should guarantee good worst case
> performance without having to maintain two separate query forms.  it

Perhaps a bitmap index scan (available since 8.1) would be useful for such
queries. That's why I asked which version.


Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.  (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."(unknow)
Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°



Re: [PERFORM] BETWEEN optimizer problems with single-value range

2006-03-15 Thread Merlin Moncure
On 3/15/06, Andreas Kretschmer <[EMAIL PROTECTED]> wrote:
> Merlin Moncure <[EMAIL PROTECTED]> schrieb:
>
> > On 3/15/06, Kevin Grittner <[EMAIL PROTECTED]> wrote:
> > > Attached is a simplified example of a performance problem we have seen,
> > > with a workaround and a suggestion for enhancement (hence both the
> > > performance and hackers lists).
> >
> >
> > Hi Kevin.  In postgres 8.2 you will be able to use the row-wise
>
> 8.2? AFAIK, feature freeze is in June/July this year...
> release in August/September.

Yes, but I was addressing Kevin's point about enhancing the server...

> > comparison for your query which  should guarantee good worst case
> > performance without having to maintain two separate query forms.  it
>
> Perhaps a bitmap index scan (available since 8.1) would be useful for such
> queries. That's why I asked which version.

I think you will find that reading a range of records from a table
ordered by an index utilizing the 8.2 comparison feature is much
faster than a bitmap index scan.

Merlin



Re: [PERFORM] [HACKERS] BETWEEN optimizer problems with single-value range

2006-03-15 Thread Simon Riggs
On Wed, 2006-03-15 at 11:56 -0600, Kevin Grittner wrote:
> Attached is a simplified example of a performance problem we have seen,
> with a workaround and a suggestion for enhancement (hence both the
> performance and hackers lists).
> 
> Our software allows users to specify the start and end dates for a
> query.  When they enter the same date for both, the optimizer makes a
> very bad choice.  We can work around it in application code by using an
> equality test if both dates match.  I think the planner should be able
> to make a better choice here. 

> (One obvious way to fix it would be to
> rewrite "BETWEEN a AND b" as "= a" when a is equal to b, but it seems
> like there is some underlying problem which should be fixed instead of,
> or in addition to, this.)

That might work, but I'm not sure if that is in itself the problem and
it would be mostly wasted overhead in 99% of cases.

The main issue appears to be that the planner chooses "Cal_CalDate"
index rather than "Cal_CtofcNo" index when the BETWEEN values match. 

It seems that the cost of the first and third EXPLAINs is equal, yet for
some reason it chooses different indexes in each case. My understanding
was that it would pick the first index created if plan costs were equal.
Is that behaviour repeatable with each query?

ISTM that if we have equal plan costs then we should be choosing the
index for which we have more leading columns, since that is more likely
to lead to a more selective answer. But the plan selection is a simple
"pick the best, or if they're equal pick the best sort order".

> The first query uses BETWEEN with the same date for both min and max
> values.  The second query uses an equality test for the same date.  The
> third query uses BETWEEN with a two-day range.  In all queries, there
> are fewer than 4,600 rows for the specified ctofcNo value out of over 18
> million rows in the table.  We tried boosting the statistics samples for
> the columns in the selection, which made the row estimates more
> accurate, but didn't change the choice of plans.

The selectivity seems the same in both - clamped to a minimum of 1 row,
so changing that doesn't look like it would help.

Best Regards, Simon Riggs







Re: [PERFORM] Slow SELECTS after large update cycle

2006-03-15 Thread Chris

Jan de Visser wrote:

Hello,

After fixing the hanging problems I reported here earlier (by uninstalling 
W2K3 SP1), I'm running into another weird one.


After doing a +/- 8hr cycle of updates and inserts (what we call a 'batch'), 
the first 'reporting' type query on tables involved in that write cycle is 
very slow. As an example, I have a query which according to EXPLAIN ANALYZE 
takes about 1.1s, yet it took 46s. After this one hit, everything is back to 
normal, and subsequent executions of the same query are in fact subsecond. 
Restarting the appserver and pgsql does not make the slowness reappear; only 
running another batch will.


During the 'write'/batch cycle, a large number of rows in various tables are 
inserted and subsequently (repeatedly) updated. The reporting type queries 
after that are basically searches on those tables.


After a large batch you need to run ANALYZE over the tables involved 
to get PostgreSQL to update its statistics so it can work out which 
indexes etc. it should use.
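
(That is, something along these lines; the table name is hypothetical:)

   ANALYZE;             -- whole database, or:
   ANALYZE batch_table; -- just the tables the batch wrote to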


--
Postgresql & php tutorials
http://www.designmagick.com/



Re: [PERFORM] Slow SELECTS after large update cycle

2006-03-15 Thread Simon Riggs
On Wed, 2006-03-15 at 14:39 -0500, Jan de Visser wrote:

> After fixing the hanging problems I reported here earlier (by uninstalling 
> W2K3 SP1), I'm running into another weird one.
> 
> After doing a +/- 8hr cycle of updates and inserts (what we call a 'batch'), 
> the first 'reporting' type query on tables involved in that write cycle is 
> very slow. As an example, I have a query which according to EXPLAIN ANALYZE 
> takes about 1.1s, yet it took 46s. After this one hit, everything is back to 
> normal, and subsequent executions of the same query are in fact subsecond. 
> Restarting the appserver and pgsql does not make the slowness reappear; only 
> running another batch will.
> 
> During the 'write'/batch cycle, a large number of rows in various tables are 
> inserted and subsequently (repeatedly) updated. The reporting type queries 
> after that are basically searches on those tables.
> 
> Anybody any ideas?

This is caused by updating the commit status hint bits on each row
touched by the SELECTs. This turns the first SELECT into a write
operation.

Try running a scan of the whole table to take the hit before you give it
back to the users.
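
(A minimal way to do that - any full-table read works; the table name is
hypothetical:)

   SELECT count(*) FROM batch_table;  -- touches every row, setting the hint bits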

Best Regards, Simon Riggs




Re: [PERFORM] [HACKERS] BETWEEN optimizer problems with single-value

2006-03-15 Thread Kevin Grittner
>>> On Wed, Mar 15, 2006 at 5:05 pm, in message
<[EMAIL PROTECTED]>, Simon Riggs
<[EMAIL PROTECTED]> wrote: 
> On Wed, 2006-03-15 at 11:56 -0600, Kevin Grittner wrote:
> 
>> (One obvious way to fix it would be to
>> rewrite "BETWEEN a AND b" as "= a" when a is equal to b, but it seems
>> like there is some underlying problem which should be fixed instead of,
>> or in addition to, this.)
> 
> That might work, but I'm not sure if that is in itself the problem and
> it would be mostly wasted overhead in 99% of cases.

It sounds like we agree.

> The main issue appears to be that the planner chooses "Cal_CalDate"
> index rather than "Cal_CtofcNo" index when the BETWEEN values match.

Agreed.

> It seems that the cost of the first and third EXPLAINs is equal, yet for
> some reason it chooses different indexes in each case. My understanding
> was that it would pick the first index created if plan costs were equal.
> Is that behaviour repeatable with each query?

It seems to be a consistent pattern, although strictly speaking our
evidence is anecdotal.  We've got hundreds of known failures with the
BETWEEN variant on equal dates and no known successes.  We have a few
dozen tests of the equality variant with 100% success in those tests.

> ISTM that if we have equal plan costs then we should be choosing the
> index for which we have more leading columns, since that is more likely
> to lead to a more selective answer. But the plan selection is a simple
> "pick the best, or if they're equal pick the best sort order".
> 
> The selectivity seems the same in both - clamped to a minimum of 1 row,
> so changing that doesn't look like it would help.

The fact that it costs these as equivalent is surprising in itself, and
might be worth examining.  This might be an example of something I
suggested a while ago -- that rounding a row estimate to an integer on
the basis that "you can't read half a row" is not necessarily wise,
because you can have a 50% chance of reading a row versus a higher or
lower percentage.

-Kevin





Re: [PERFORM] BETWEEN optimizer problems with single-value

2006-03-15 Thread Simon Riggs
On Wed, 2006-03-15 at 14:17 -0500, Tom Lane wrote:

> It looks to me like this is a matter of bad cost estimation, ie, it's
> thinking the other index is cheaper to use.  Why that is is not clear.
> Can we see the pg_stats rows for ctofcNo and calDate?

ISTM that when the BETWEEN constants match we end up in this part of
clauselist_selectivity()...





Re: [PERFORM] BETWEEN optimizer problems with single-value

2006-03-15 Thread Simon Riggs
On Thu, 2006-03-16 at 00:07 +, Simon Riggs wrote:
> On Wed, 2006-03-15 at 14:17 -0500, Tom Lane wrote:
> 
> > It looks to me like this is a matter of bad cost estimation, ie, it's
> > thinking the other index is cheaper to use.  Why that is is not clear.
> > Can we see the pg_stats rows for ctofcNo and calDate?
> 
> ISTM that when the BETWEEN constants match we end up in this part of
> clauselist_selectivity()...

(and now for the whole email...)

/*
 * It's just roundoff error; use a small positive
 * value
 */
s2 = 1.0e-10;

so that the planner underestimates the cost of using "Cal_CalDate" so
that it ends up the same as "Cal_CtofcNo", and then we pick
"Cal_CalDate" because it was created first.

Using 1.0e-10 isn't very useful... the selectivity for a range should
never be less than the selectivity for an equality, so we should simply
put in a test against one of the pseudo constants and use that as the
minimal value. That should lead to raising the apparent cost of
Cal_CalDate so that Cal_CtofcNo can take precedence.

Best Regards, Simon Riggs








Re: [PERFORM] Slow SELECTS after large update cycle

2006-03-15 Thread Jan de Visser
On Wednesday 15 March 2006 18:21, Simon Riggs wrote:
> On Wed, 2006-03-15 at 14:39 -0500, Jan de Visser wrote:
> > After fixing the hanging problems I reported here earlier (by
> > uninstalling W2K3 SP1), I'm running into another weird one.
> >
> > After doing a +/- 8hr cycle of updates and inserts (what we call a
> > 'batch'), the first 'reporting' type query on tables involved in that
> > write cycle is very slow. As an example, I have a query which according
> > to EXPLAIN ANALYZE takes about 1.1s, yet it took 46s. After this one hit,
> > everything is back to normal, and subsequent executions of the same query
> > are in fact subsecond. Restarting the appserver and pgsql does not make
> > the slowness reappear; only running another batch will.
> >
> > During the 'write'/batch cycle, a large number of rows in various tables
> > are inserted and subsequently (repeatedly) updated. The reporting type
> > queries after that are basically searches on those tables.
> >
> > Anybody any ideas?
>
> This is caused by updating the commit status hint bits on each row
> touched by the SELECTs. This turns the first SELECT into a write
> operation.
>
> Try running a scan of the whole table to take the hit before you give it
> back to the users.

Thanks Simon. I didn't know about the cause, but I expected the answer to be 
'deal with it', as it is. At least I can explain it now...

>
> Best Regards, Simon Riggs

jan

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--



Re: [PERFORM] BETWEEN optimizer problems with single-value

2006-03-15 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
>> ISTM that when the BETWEEN constants match we end up in this part of
>> clauselist_selectivity()...

Yeah, I think you are right.

> so that the planner underestimates the cost of using "Cal_CalDate" so
> that it ends up the same as "Cal_CtofcNo", and then we pick
> "Cal_CalDate" because it was created first.

No, it doesn't end up the same --- but the difference is small enough to
be in the roundoff-error regime.  The real issue here is that we're
effectively assuming that one row will be fetched from the index in both
cases, and this is clearly not the case for the Cal_CalDate index.  So
we need a more accurate estimate for the boundary case.

> Using 1.0e-10 isn't very useful... the selectivity for a range should
> never be less than the selectivity for an equality, so we should simply
> put in a test against one of the pseudo constants and use that as the
> minimal value.

That's easier said than done, because you'd first have to find the
appropriate equality operator to use (ie, one having semantics that
agree with the inequality operators).  Another point is that the above
statement is simply wrong, consider
calDate BETWEEN '2006-03-15' AND '2006-03-14'
for which an estimate of zero really is correct.

Possibly we could drop this code's reliance on seeing
SCALARLTSEL/SCALARGTSEL as the estimators, and instead try to locate a
common btree opclass for the operators --- which would then let us
identify the right equality operator to use, and also let us distinguish
> from >= etc.  If we're trying to get the boundary cases right I
suspect we have to account for that.  I could see such an approach being
tremendously slow though :-(, because we'd go looking for btree
opclasses even for operators that have nothing to do with < or >.

regards, tom lane
