Re: [PERFORM] Simple join optimized badly?

2006-10-11 Thread Bucky Jordan
Tom Lane wrote:
> Brian Herlihy <[EMAIL PROTECTED]> writes:
> > What would it take for hints to be added to postgres?
> 
> A *whole lot* more thought and effort than has been expended on the
> subject to date.
> 
> Personally I have no use for the idea of "force the planner to do
> exactly X given a query of exactly Y".  You don't have exactly Y
> today, tomorrow, and the day after (if you do, you don't need a
> hint mechanism at all, you need a mysql-style query cache).
> IMHO most of the planner mistakes we see that could be fixed via
> hinting are really statistical estimation errors, and so the right
> level to be fixing them at is hints about how to estimate the number
> of rows produced for given conditions.  Mind you that's still a plenty
> hard problem, but you could at least hope that a hint of that form
> would be useful for more than one query.
> 

Do I understand correctly that you're suggesting it might not be a bad
idea to allow users to provide statistics?

Is this along the lines of "I'm loading a big table and touching every
row of data, so I may as well collect some stats along the way" and "I
know my data contains these statistical properties, but the analyzer
wasn't able to figure that out (or maybe can't figure it out efficiently
enough)"?

While it seems like this would require more knowledge from the user
(e.g. more about their data, how the planner works, and how it uses
statistics), this would actually be helpful, if not required, for those
who really care about performance. I guess it's the difference between a
tool advanced users can get long-term benefit from and a quick fix that
will probably come back to bite you. I've been pleased with Postgres'
thoughtful design; recently I've been doing some work with MySQL, and I
can't say I feel the same way.

Also, I'm guessing this has already come up at some point, but what
about allowing PG to do some stat collection during queries? If you're
touching a lot of data (such as an import process) wouldn't it be more
efficient (and perhaps more accurate) to collect stats then, rather than
having to re-scan? It would be nice to be able to turn this on/off on a
per query basis, seeing as it could have pretty negative impacts on OLTP
performance...
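
Something like this is what I have in mind (a purely hypothetical
setting, just to illustrate the idea; nothing like it exists today):

SET collect_scan_stats = on;   -- hypothetical GUC, for illustration only
INSERT INTO big_table SELECT * FROM import_staging;
SET collect_scan_stats = off;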

- Bucky

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Simple join optimized badly?

2006-10-11 Thread Heikki Linnakangas

Bucky Jordan wrote:
> Is this along the lines of "I'm loading a big table and touching every
> row of data, so I may as well collect some stats along the way" and "I
> know my data contains these statistical properties, but the analyzer
> wasn't able to figure that out (or maybe can't figure it out efficiently
> enough)"?
>
> While it seems like this would require more knowledge from the user
> (e.g. more about their data, how the planner works, and how it uses
> statistics), this would actually be helpful, if not required, for those
> who really care about performance. ...


The user would have to know his data, but he wouldn't need to know how
the planner works, whereas with hints like "use index X" he *does* need
to know how the planner works.


Being able to give hints about statistical properties of relations and 
their relationships seems like a good idea to me. And we can later 
figure out ways to calculate them automatically.


BTW, in DB2 you can declare a table as volatile, which means that the 
cardinality of the table varies greatly. The planner favors index scans 
on volatile tables.
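
For example (DB2 syntax, quoted from memory, so treat it as a sketch):

ALTER TABLE orders VOLATILE CARDINALITY;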


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Simple join optimized badly?

2006-10-11 Thread Bruce Momjian
Heikki Linnakangas wrote:
> BTW, in DB2 you can declare a table as volatile, which means that the 
> cardinality of the table varies greatly. The planner favors index scans 
> on volatile tables.

Now that seems like a valuable idea.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Simple join optimized badly?

2006-10-11 Thread Mark Lewis
Tom,

I'm interested in the problem of cross-column statistics from a
theoretical perspective. It would be interesting to sit down and try to
reason out a useful solution, or at the very least to understand the
problem better so I can anticipate when it might come and eat me.

From my understanding, the main problem is that even if PG knows the
selectivity of n conditions C1, C2, ..., Cn, it doesn't know whether
the combined selectivity will be C1*C2*...*Cn (conditions are
independent) or min(C1, C2, ..., Cn) (conditions are strictly
dependent), or somewhere in the middle. Therefore, row estimates could
be orders of magnitude off.

I suppose a common example would be a table with a serial primary key
column and a timestamp value which is always inserted as
CURRENT_TIMESTAMP, so the two columns are strongly correlated.  If the
planner guesses that 1% of the rows of the table will match pk>100,
and 1% of the rows of the table will match timestamp > X, then it would
be nice for it to know that if you specify both "pk>100 AND
timestamp>X" that the combined selectivity is still only 1% and not 1% *
1% = 0.01%.
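
To make that concrete, here's a sketch (table and constants invented):

CREATE TABLE events (
    pk      serial PRIMARY KEY,
    created timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- pk and created grow together, so the two predicates below select
-- (nearly) the same rows. If each matches 1% of the table on its own,
-- the planner multiplies the selectivities and estimates 0.01%:
EXPLAIN SELECT * FROM events
 WHERE pk > 100 AND created > '2006-10-01';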

As long as I'm sitting down and reasoning about the problem anyway, are
there any other types of cases you're aware of where some form of cross-
column statistics would be useful?  In the unlikely event that I
actually come up with a brilliant and simple solution, I'd at least like
to make sure that I'm solving the right problem :)

Thanks,
Mark Lewis



On Tue, 2006-10-10 at 22:38 -0400, Tom Lane wrote:
> Brian Herlihy <[EMAIL PROTECTED]> writes:
> > What would it take for hints to be added to postgres?
> 
> A *whole lot* more thought and effort than has been expended on the
> subject to date.
> 
> Personally I have no use for the idea of "force the planner to do
> exactly X given a query of exactly Y".  You don't have exactly Y
> today, tomorrow, and the day after (if you do, you don't need a
> hint mechanism at all, you need a mysql-style query cache).
> IMHO most of the planner mistakes we see that could be fixed via
> hinting are really statistical estimation errors, and so the right
> level to be fixing them at is hints about how to estimate the number
> of rows produced for given conditions.  Mind you that's still a plenty
> hard problem, but you could at least hope that a hint of that form
> would be useful for more than one query.
> 
>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Scrub one large table against another

2006-10-11 Thread Brendan Curran

Tom Lane wrote:
> Brendan Curran <[EMAIL PROTECTED]> writes:
>> So much time is being spent in the Unique and Sort leaves... I would
>> think that it wouldn't need to do the unique portion, since there is no
>> DISTINCT clause...
>
> There's nothing in that query suggesting that suppress.email is unique.
> If you know that it is, try using a plain join instead of an IN.
>
> regards, tom lane

Interestingly, and thank you to Tom and Jim, the explicit JOIN improved 
performance tremendously (RESULTS BELOW). I converted the entire query 
to use explicit joins instead of IN and EXISTS and discovered acceptable 
performance. I think the next place to go from here is RAID1/RAID10 and 
possibly partitioning my large table (Welcome to DDL insanity, right?).


I have to add that I'm a little surprised the documentation is so 
generous to IN and EXISTS. Is there something amiss in my configuration 
that prevents them from performing correctly? If not, I can't imagine a 
time when IN or EXISTS would be more performant than an explicit JOIN...


Additionally, I manually scrub for duplicates at the group level in the 
email_record table to keep my records unique. I would like to use a 
unique constraint, but have found that batching in JDBC is impossible 
due to irrecoverable errors even when using BEFORE INSERT triggers to 
just return NULL if a record exists already. Has anyone got an elegant 
solution for the 'add only if not exists already' problem similar to 
MSSQL's MERGE command?


Just one more thing... I have found that maintaining a btree index on a 
varchar(255) value is extremely expensive on insert/update/delete. It is 
unfortunately necessary for me to maintain this index for queries and 
reports so I am transitioning to using an unindexed staging table to 
import data into before merging it with the larger table. All the docs 
and posts recommend is to drop the index, import your data, and then 
create the index again. This is untenable on a daily / bi-weekly basis. 
Is there a more elegant solution to this indexing problem?


Thank you for all of your help!

EXPLAIN ANALYZE result comparison...

1. EXPLAIN ANALYZE SELECT email_record_id FROM ONLY email_record er
   WHERE email_list_id = 13 AND email IN (SELECT email FROM suppress);

Hash Join  (cost=8359220.68..9129843.00 rows=800912 width=8) (actual time=2121601.603..2121601.603 rows=0 loops=1)
  Hash Cond: (("outer".email)::text = ("inner".email)::text)
  ->  Unique  (cost=4414093.19..4522324.49 rows=21646260 width=25) (actual time=1165955.907..1434439.731 rows=21646261 loops=1)
        ->  Sort  (cost=4414093.19..4468208.84 rows=21646260 width=25) (actual time=1165955.903..1384667.715 rows=21646261 loops=1)
              Sort Key: suppress.email
              ->  Seq Scan on suppress  (cost=0.00..393024.60 rows=21646260 width=25) (actual time=37.784..609848.551 rows=21646261 loops=1)
  ->  Hash  (cost=3899868.47..3899868.47 rows=4606808 width=32) (actual time=554522.983..554522.983 rows=3245336 loops=1)
        ->  Bitmap Heap Scan on email_record er  (cost=38464.83..3899868.47 rows=4606808 width=32) (actual time=275640.435..541342.727 rows=3245336 loops=1)
              Recheck Cond: (email_list_id = 13)
              ->  Bitmap Index Scan on list  (cost=0.00..38464.83 rows=4606808 width=0) (actual time=275102.037..275102.037 rows=5172979 loops=1)
                    Index Cond: (email_list_id = 13)
Total runtime: 2,122,693.864 ms


2. EXPLAIN ANALYZE SELECT email_record_id FROM ONLY email_record er
   JOIN suppress s USING (email) WHERE er.email_list_id = 13;

Hash Join  (cost=3945127.49..5000543.11 rows=800912 width=8) (actual time=808874.088..808874.088 rows=0 loops=1)
  Hash Cond: (("outer".email)::text = ("inner".email)::text)
  ->  Seq Scan on suppress s  (cost=0.00..393024.60 rows=21646260 width=25) (actual time=661.518..216933.399 rows=21646261 loops=1)
  ->  Hash  (cost=3899868.47..3899868.47 rows=4606808 width=32) (actual time=494294.932..494294.932 rows=3245336 loops=1)
        ->  Bitmap Heap Scan on email_record er  (cost=38464.83..3899868.47 rows=4606808 width=32) (actual time=242198.226..485942.542 rows=3245336 loops=1)
              Recheck Cond: (email_list_id = 13)
              ->  Bitmap Index Scan on list  (cost=0.00..38464.83 rows=4606808 width=0) (actual time=241769.786..241769.786 rows=5172979 loops=1)
                    Index Cond: (email_list_id = 13)
Total runtime: 808,884.387 ms



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Scrub one large table against another (vmstat output)

2006-10-11 Thread Brendan Curran


> What prevents you from using an aggregate function?



I guess I could actually obtain the results in an aggregate function
and use those to maintain a summary table. There is a web view that
requires 'as accurate as possible' numbers to be queried per group (all
40 groups are displayed on the same page) and so constant aggregates
over the entire table would be a nightmare.


> Probably not 2x, but better performance than now. You probably don't
> want RAID 1; depending on your setup, many list members swear by RAID
> 10. Of course, your setup will depend on how much money you have to
> burn. That said, RAID 1 testing will allow you to determine the upper
> bounds of your hardware. Some folks say they get better performance
> with WAL off the main RAID, some keep it on. Only testing will allow
> you to determine what is optimal.


I will have to try moving WAL off those RAID spindles; I have seen the
posts regarding this.

> In the meantime, you need to identify the bottleneck of your
> operation. You should collect vmstat and iostat statistics for your
> present setup. Good luck!




I have to confess that I am a bit of a novice with vmstat. Below is a
sample of my vmstat output while running two scrubbing queries
simultaneously:


machine:/dir# vmstat -S M 2
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  1      4    117     15   2962    0    0   100    25   96  107  2  0 86 11
 0  3      4    117     15   2962    0    0  4884  1860  415  841 18  1 52 29
 1  1      4    115     15   2964    0    0  2246  1222  462  394  8  0 51 41
 0  2      4    114     14   2967    0    0  3932  2238  485  613 12  0 62 25
 1  1      4    115     13   2966    0    0  3004  1684  507  609  8  0 60 31
 0  3      4    116     13   2965    0    0  4688  4000  531  613 15  1 52 33
 1  1      4    117     13   2964    0    0  2890   268  433  441  9  1 58 32
 0  1      4    114     13   2968    0    0  2802  4708  650  501  8  1 64 28
 0  2      4    114     13   2968    0    0  4850  1696  490  574 15  1 57 27
 0  2      4    116     13   2966    0    0  4300  3062  540  520 13  1 61 26
 0  2      4    115     13   2966    0    0  3292  3608  549  455 10  1 65 24
 0  3      4    115     13   2966    0    0  4856  2098  505  564 15  1 59 26
 0  3      4    115     13   2966    0    0  1608  2314  447  413  4  0 63 33
 0  3      4    116     13   2966    0    0  6206  1664  442  649 18  1 52 29
 1  1      4    115     13   2966    0    0  1886  1262  464  412  5  0 60 35
 0  3      4    118     13   2964    0    0  2510  4138  571  493  7  1 64 28
 1  1      4    117     13   2964    0    0  1632    56  325  373  5  0 53 42
 0  3      4    116     13   2965    0    0  5358  3510  504  649 14  1 59 26
 1  1      4    118     13   2964    0    0  2814   920  447  403  8  0 63 29

I know that wa is the time spent waiting on IO, but I lack a benchmark
to determine just what I should expect from my hardware (three 146GB
U320 SCSI 10k drives in RAID 5 on a Dell PERC4e/i PE2850 controller).
Those drives are dedicated completely to a /data mount that contains
only /data/postgresql/8.1/main. I have another two drives in RAID 1 for
everything else (OS, apps, etc.). Can you give me any pointers based on
that vmstat output?


Regards and Thanks,
Brendan

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Collect stats during seqscan (was: [PERFORM] Simple join optimized badly?)

2006-10-11 Thread Jim C. Nasby
On Wed, Oct 11, 2006 at 10:27:26AM -0400, Bucky Jordan wrote:
> Also, I'm guessing this has already come up at some point, but what
> about allowing PG to do some stat collection during queries? If you're
> touching a lot of data (such as an import process) wouldn't it be more
> efficient (and perhaps more accurate) to collect stats then, rather than
> having to re-scan? It would be nice to be able to turn this on/off on a
> per query basis, seeing as it could have pretty negative impacts on OLTP
> performance...

I suspect that could be highly useful in data warehouse environments,
where you're more likely to have to sequentially scan a table. It would
be interesting to have a sequential scan that will run to completion
also collect stats along the way.
-- 
Jim Nasby    [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Scrub one large table against another

2006-10-11 Thread Jim C. Nasby
On Wed, Oct 11, 2006 at 10:53:41AM -0600, Brendan Curran wrote:
> Interestingly, and thank you to Tom and Jim, the explicit JOIN improved 
> performance tremendously (RESULTS BELOW). I converted the entire query 
> to use explicit joins instead of IN and EXISTS and discovered acceptable 
> performance. I think the next place to go from here is RAID1/RAID10 and 
> possibly partitioning my large table (Welcome to DDL insanity, right?).
 
Remember that partitioning is not a magic bullet: it only helps in cases
where you need to keep a lot of data, but normally only access a small
portion of it.
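
If you do go down that path, the 8.1 approach is inheritance plus
constraint exclusion; a rough sketch against your table (child table
name invented):

CREATE TABLE email_record_list13 (
    CHECK (email_list_id = 13)
) INHERITS (email_record);

SET constraint_exclusion = on;
-- a query with WHERE email_list_id = 13 can now skip sibling
-- partitions whose CHECK constraints rule them out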

WAL on RAID5 without a really good controller will probably kill you,
and having the data there isn't much better. You'll probably be better
off with either one RAID 10 or two RAID 1s.

> I have to add that I'm a little surprised the documentation is so 
> generous to IN and EXISTS. Is there something amiss in my configuration 
> that prevents them from performing correctly? If not, I can't imagine a 
> time when IN or EXISTS would be more performant than an explicit JOIN...
 
Well, IN != EXISTS != JOIN. EXISTS just stops as soon as it finds a
record. For some cases it's equivalent to IN, but not all. IN has to
de-duplicate its list in some fashion. For small IN lists, you can do
this with an OR, but at some point you need to switch to an actual
unique (actually, I suspect the difference in PostgreSQL just depends on
whether you passed values into IN or a subquery). A join, on the other
hand, doesn't worry about duplicates at all. There may be some brains in
the planner that realize when a subquery will return a unique set (i.e.,
you're querying on a primary key).
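
To make the semantic difference concrete with your tables: if
suppress.email contains duplicates, the join form returns one row per
matching suppress row, while the EXISTS form returns each email_record
row at most once:

SELECT er.email_record_id
  FROM ONLY email_record er JOIN suppress s USING (email)
 WHERE er.email_list_id = 13;

SELECT er.email_record_id
  FROM ONLY email_record er
 WHERE er.email_list_id = 13
   AND EXISTS (SELECT 1 FROM suppress s WHERE s.email = er.email);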

> Additionally, I manually scrub for duplicates at the group level in the 
> email_record table to keep my records unique. I would like to use a 
> unique constraint, but have found that batching in JDBC is impossible 
> due to irrecoverable errors even when using BEFORE INSERT triggers to 
> just return NULL if a record exists already. Has anyone got an elegant 
> solution for the 'add only if not exists already' problem similar to 
> MSSQL's MERGE command?
 
Your best bet (until we have something akin to MERGE, hopefully in 8.3)
is to load the data into a TEMP table and de-dupe it from there.
Depending on what you're doing, you might want to delete it or update an
ID column in the temp table. Note that this assumes only one process is
loading data at any time; if that's not the case, you have to get
trickier.
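
A rough sketch of what I mean (column names assumed from your queries,
so adjust to your schema; note DISTINCT ON picks an arbitrary row among
duplicates here):

BEGIN;
CREATE TEMP TABLE email_stage (LIKE email_record) ON COMMIT DROP;
-- load the batch into email_stage (COPY, JDBC batching, etc.), then:
INSERT INTO email_record
SELECT DISTINCT ON (email) *
  FROM email_stage s
 WHERE NOT EXISTS
       (SELECT 1 FROM email_record er WHERE er.email = s.email);
COMMIT;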

> Just one more thing... I have found that maintaining a btree index on a 
> varchar(255) value is extremely expensive on insert/update/delete. It is 
> unfortunately necessary for me to maintain this index for queries and 
> reports so I am transitioning to using an unindexed staging table to 
> import data into before merging it with the larger table. All the docs 
> and posts recommend is to drop the index, import your data, and then 
> create the index again. This is untenable on a daily / bi-weekly basis. 
> Is there a more elegant solution to this indexing problem?

You might be happier with tsearch than a regular index.
-- 
Jim Nasby    [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Scrub one large table against another (vmstat output)

2006-10-11 Thread Markus Schaber
Hi, Brendan,

Brendan Curran wrote:
>> What prevents you from using an aggregate function?
> 
> I guess I could actually obtain the results in an aggregate function and
> use those to maintain a summary table. There is a web view that requires
> 'as accurate as possible' numbers to be queried per group (all 40 groups
> are displayed on the same page) and so constant aggregates over the
> entire table would be a nightmare.

That sounds just like a case for GROUP BY and a materialized view.
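
Something like this, for example (group_id and table names invented,
since I don't know your schema), refreshed by cron or by triggers,
given that PostgreSQL has no built-in materialized views:

CREATE TABLE email_group_counts (
    group_id int PRIMARY KEY,
    cnt      bigint NOT NULL
);

BEGIN;
DELETE FROM email_group_counts;
INSERT INTO email_group_counts
SELECT group_id, count(*)
  FROM email_record
 GROUP BY group_id;
COMMIT;

The web view then reads 40 small rows instead of re-aggregating the
whole table on every page load.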

Markus
-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org