Re: [PERFORM] query produces 1 GB temp file

2005-02-06 Thread John A Meinel
Dirk Lutzebaeck wrote:
Greg,
Thanks for your analysis. But I don't get any better results after bumping
the STATISTICS target from 10 to 200.
explain analyze shows that the optimizer is still way off estimating 
the rows. Is this normal? It still produces a 1 GB temp file.
I simplified the query a bit, now only two tables are involved (bi, 
df). I also vacuumed.

Are you just doing VACUUM? Or are you doing VACUUM ANALYZE? You might
also try VACUUM FULL ANALYZE (in case the table holds a lot of dead
tuples).

VACUUM cleans up, but doesn't adjust any planner statistics without ANALYZE.
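For example (a sketch against the two tables you mentioned):

VACUUM ANALYZE bi;
VACUUM ANALYZE df;
-- or, if the tables carry a lot of dead tuples (takes an exclusive lock):
VACUUM FULL ANALYZE bi;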
John
=:->




Re: [PERFORM] horizontal partition

2005-02-06 Thread Gaetano Mendola
Tom Lane wrote:
Josh Berkus  writes:
The issue here is that the planner is capable of "pushing down" the WHERE 
criteria into the first view, but not into the second, "nested" view, and so 
postgres materializes the UNIONed data set before performing the join.

Thing is, I seem to recall that this particular issue was something Tom fixed
a while ago.  Which is why I wanted to know what version Gaetano is using.

It's still true that we can't generate a nestloop-with-inner-indexscan
join plan if the inner side is anything more complex than a single table
scan.  Since that's the only plan that gives you any chance of not
scanning the whole partitioned table, it's rather a hindrance :-(
It might be possible to fix this by treating the nestloop's join
conditions as "push down-able" criteria, instead of the present rather
ad hoc method for generating nestloop/indexscan plans.  It'd be quite
a deal of work though, and I'm concerned about how slow the planner
might run if we did do it like that.
I don't know if this will help my attempt to perform a horizontal
partition; if it does, I think it can solve a lot of problems out there.
I tried the inheritance technique too:
The table user_logs is the original one; I created two tables extending
it:
CREATE TABLE user_logs_2003_h () inherits (user_logs);
CREATE TABLE user_logs_2002_h () inherits (user_logs);
On each of these tables I defined the index already defined on user_logs.
And this is the result:
empdb=# explain analyze select * from user_logs where id_user = sp_id_user('kalman');
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..426.33 rows=335 width=67) (actual time=20.891..129.218 rows=98 loops=1)
   ->  Append  (cost=0.00..426.33 rows=335 width=67) (actual time=20.871..128.643 rows=98 loops=1)
         ->  Index Scan using idx_user_user_logs on user_logs  (cost=0.00..133.11 rows=66 width=67) (actual time=20.864..44.594 rows=3 loops=1)
               Index Cond: (id_user = 4185)
         ->  Index Scan using idx_user_user_logs_2003_h on user_logs_2003_h user_logs  (cost=0.00..204.39 rows=189 width=67) (actual time=1.507..83.662 rows=95 loops=1)
               Index Cond: (id_user = 4185)
         ->  Index Scan using idx_user_user_logs_2002_h on user_logs_2002_h user_logs  (cost=0.00..88.83 rows=80 width=67) (actual time=0.206..0.206 rows=0 loops=1)
               Index Cond: (id_user = 4185)
 Total runtime: 129.500 ms
(9 rows)
That is good, but now look what happens with a view like this one:
create view to_delete AS
SELECT v.login,
   u.*
from  user_login v,
  user_logs u
where v.id_user = u.id_user;

empdb=# explain analyze select * from to_delete where login = 'kalman';
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=4.01..65421.05 rows=143 width=79) (actual time=1479.738..37121.511 rows=98 loops=1)
   Hash Cond: ("outer".id_user = "inner".id_user)
   ->  Append  (cost=0.00..50793.17 rows=2924633 width=67) (actual time=21.391..33987.363 rows=2927428 loops=1)
         ->  Seq Scan on user_logs u  (cost=0.00..7195.22 rows=411244 width=67) (actual time=21.385..5641.307 rows=414039 loops=1)
         ->  Seq Scan on user_logs_2003_h u  (cost=0.00..34833.95 rows=2008190 width=67) (actual time=0.024..18031.218 rows=2008190 loops=1)
         ->  Seq Scan on user_logs_2002_h u  (cost=0.00..8764.00 rows=505199 width=67) (actual time=0.005..5733.554 rows=505199 loops=1)
   ->  Hash  (cost=4.00..4.00 rows=2 width=16) (actual time=0.195..0.195 rows=0 loops=1)
         ->  Index Scan using user_login_login_key on user_login v  (cost=0.00..4.00 rows=2 width=16) (actual time=0.155..0.161 rows=1 loops=1)
               Index Cond: ((login)::text = 'kalman'::text)
 Total runtime: 37122.069 ms
(10 rows)

and as you can see, this path is not applicable either :-(
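The only workaround I can think of (a sketch only, I have not verified
the plan it produces) is to resolve the id_user before touching the
partitioned table, so that each child scan sees a simple equality it can
use its index for:

SELECT u.*
  FROM user_logs u
 WHERE u.id_user = (SELECT id_user
                      FROM user_login
                     WHERE login = 'kalman');

but of course that bypasses the view.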
Any other suggestion?

Regards
Gaetano Mendola





Re: [PERFORM] query produces 1 GB temp file

2005-02-06 Thread Greg Stark

I gave a bunch of "explain analyze select" commands to test estimates for
individual columns. What results do they come up with? If those are inaccurate
then raising the statistics target is a good route. If those are accurate
individually but the combination is inaccurate then you have a more difficult
problem.

-- 
greg




Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-02-06 Thread Jan Wieck
On 1/20/2005 9:23 AM, Jean-Max Reymond wrote:
On Thu, 20 Jan 2005 15:03:31 +0100, Hervé Piedvache <[EMAIL PROTECTED]> wrote:
We were at this moment thinking about a Cluster solution ... We saw on the
Internet many solution talking about Cluster solution using MySQL ... but
nothing about PostgreSQL ... the idea is to use several servers to make a
sort of big virtual server using the disk space of each server as one, and
having the ability to use the CPU and RAM of each server in order to
maintain good service performance ... one can imagine it is like a GFS but
dedicated to PostgreSQL...
forget mysql cluster for now.
Sorry for the late reply.
I'd second that. I was just on the Solutions Linux in Paris and spoke 
with MySQL people.

I had some questions about the new NDB cluster tables, so I stopped by
their booth. My question whether there are any plans to add foreign key
support to NDB cluster tables got answered with "it will definitely be
in the next version, which is the one containing NDB cluster, so yes, it
will support foreign keys from the start".

Back home I found some more time to investigate and found this forum 
article http://lists.mysql.com/cluster/1442 posted by a MySQL AB senior 
software architect, where he says exactly the opposite.

I don't know about your application, but trust me that maintaining 
proper referential integrity on the application level against a 
multimaster clustered database isn't that easy. So this is in fact a 
very important question.

Jan
--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [PERFORM] query produces 1 GB temp file

2005-02-06 Thread John A Meinel
Dirk Lutzebaeck wrote:
Greg Stark wrote:
I gave a bunch of "explain analyze select" commands to test estimates 
for
individual columns. What results do they come up with? If those are 
inaccurate
then raising the statistics target is a good route. If those are 
accurate
individually but the combination is inaccurate then you have a more 
difficult
problem.

 

After setting the new statistics target to 200, they did slightly
better but were still not accurate. The results were attached to my last
post. Here is a copy:


It does seem that setting the statistics target to a higher value would
help, since rc=130170467 seems to account for almost 1/3 of the data. You
probably have other values that are much less common, so a higher
statistics target would help the planner realize that this value occurs
at a different frequency from the others. Can you try other values and
see what the counts are?

I assume you did do a vacuum analyze after adjusting the statistics target.
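For the record, the target can also be set per column, and it only takes
effect at the next ANALYZE; a sketch using the columns from your queries
below:

ALTER TABLE bi ALTER COLUMN rc SET STATISTICS 200;
ALTER TABLE bi ALTER COLUMN co SET STATISTICS 200;
ANALYZE bi;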
Also interesting: in the time it took you to run these queries, you had
received 26 new rows.

And finally, what is the row count if you do
explain analyze select * from bi where rc=130170467::oid and 
co=117305223::oid;

If this is a lot less than, say, 500k, then you probably aren't going to
be helped a lot. The PostgreSQL statistics engine doesn't generate
cross-column statistics; it assumes the distributions of different
columns are independent. So if two columns are correlated (or
anti-correlated), it won't realize that.

Even so, your original desire was to reduce the size of the intermediate
step (where you have 700k rows). So you need to try to design a subselect
on bi which is as restrictive as possible, so that you don't pull in all
of these rows. With any luck, the planner will realize ahead of time that
there won't be that many rows, and can use indexes, etc. But even if it
doesn't use an index scan, a query that doesn't touch a lot of rows won't
need a lot of temp disk space.

John
=:->
explain analyze select * from bi where rc=130170467;
                                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on bi  (cost=0.00..41078.76 rows=190960 width=53) (actual time=0.157..3066.028 rows=513724 loops=1)
   Filter: (rc = 130170467::oid)
 Total runtime: 4208.663 ms
(3 rows)

explain analyze select * from bi where co=117305223;
                                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on bi  (cost=0.00..41078.76 rows=603988 width=53) (actual time=0.021..3692.238 rows=945487 loops=1)
   Filter: (co = 117305223::oid)
 Total runtime: 5786.268 ms
(3 rows)

Here is the distribution of the data in bi:
select count(*) from bi;
1841966
select count(*) from bi where rc=130170467::oid;
513732
select count(*) from bi where co=117305223::oid;
945503






Re: [PERFORM] query produces 1 GB temp file

2005-02-06 Thread Tom Lane
[EMAIL PROTECTED] (Dirk Lutzebaeck) writes:
> SELECT DISTINCT ON (df.val_9, df.created, df.flatid) df.docindex, 
> df.flatobj, bi.oid, bi.en
> FROM bi,df
> WHERE bi.rc=130170467
> ...
> ORDER BY df.val_9 ASC, df.created DESC
> LIMIT 1000 OFFSET 0

Just out of curiosity, what is this query supposed to *do* exactly?
It looks to me like it will give indeterminate results.  Practical
uses of DISTINCT ON generally specify more ORDER BY columns than
there are DISTINCT ON columns, because the extra columns determine
which rows have priority to survive the DISTINCT filter.  With the
above query, you have absolutely no idea which row will be output
for a given combination of val_9/created/flatid.
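If you want the result to be deterministic, add a tiebreaker to the
ORDER BY; a sketch (using docindex purely as an example column):

SELECT DISTINCT ON (df.val_9, df.created, df.flatid)
       df.docindex, df.flatobj
  FROM df
 ORDER BY df.val_9 ASC, df.created DESC, df.flatid, df.docindex DESC;

Here the trailing df.docindex DESC determines which row survives each
val_9/created/flatid group.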

regards, tom lane



[PERFORM] Are JOINs allowed with DELETE FROM

2005-02-06 Thread Steven Rosenstein




This is probably a very trivial question and I feel foolish in even posting
it, but I cannot seem to get it to work.

SCENARIO (abstracted):

Two tables, "summary" and "detail".  The schema of summary looks like:

id            int    serial sequential record id
collect_date  date   date the detail events were collected

The schema of detail looks like:

id       int    serial sequential record id
sum_id   int    the id of the parent record in the summary table
details  text   a particular event's details

The relationship is obvious.  If I want to extract all the detail records
for a particular date (2/5/05), I construct a query as follows:

SELECT * FROM detail JOIN summary ON (summary.id=detail.sum_id) WHERE
collect_date='2005-02-05';

Now... I want to *delete* all the detail records for a particular date,
so I tried:

DELETE FROM detail JOIN summary ON (summary.id=detail.sum_id) WHERE
collect_date='2005-02-05';

But I keep getting a parser error.  Am I not allowed to use JOINs in a
DELETE statement, or am I just fat-fingering the SQL text somewhere?  If
I'm *not* allowed to use a JOIN with a DELETE, what is the best workaround?
I want to delete just the records in the detail table, and not its parent
summary record.

Thanks in advance for your help,
--- Steve
___

Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

"Learn from the mistakes of others because you can't live long enough to
make them all yourself." -- Eleanor Roosevelt




Re: [PERFORM] Are JOINs allowed with DELETE FROM

2005-02-06 Thread Gaetano Mendola
Steven Rosenstein wrote:
DELETE FROM detail JOIN summary ON (summary.id=detail.sum_id) WHERE
collect_date='2005-02-05';
DELETE FROM detail WHERE detail.sum_id in ( select id from summary  )
AND collect_date='2005-02-05';
Regards
Gaetano Mendola





Re: [PERFORM] Are JOINs allowed with DELETE FROM

2005-02-06 Thread Michael Fuhr
On Sun, Feb 06, 2005 at 12:16:13PM -0500, Steven Rosenstein wrote:
>
> DELETE FROM detail JOIN summary ON (summary.id=detail.sum_id) WHERE
> collect_date='2005-02-05';
> 
> But I keep getting a parser error.  Am I not allowed to use JOINs in a
> DELETE statement, or am I just fat-fingering the SQL text somewhere?

See the documentation for DELETE:

http://www.postgresql.org/docs/8.0/static/sql-delete.html

If you intend to delete the date's record from the summary table,
then the detail table could use a foreign key constraint defined
with ON DELETE CASCADE.  Deleting a record from summary would then
automatically delete all associated records in detail.
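A sketch using the abstracted schema from your message:

ALTER TABLE detail ADD CONSTRAINT detail_sum_id_fkey
    FOREIGN KEY (sum_id) REFERENCES summary (id) ON DELETE CASCADE;

-- now this also removes the matching detail rows:
DELETE FROM summary WHERE collect_date = '2005-02-05';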

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/



Re: [PERFORM] Are JOINs allowed with DELETE FROM

2005-02-06 Thread John Arbash Meinel
Gaetano Mendola wrote:
Steven Rosenstein wrote:
DELETE FROM detail JOIN summary ON (summary.id=detail.sum_id) WHERE
collect_date='2005-02-05';

You have to tell it which table you are deleting from; SELECT * FROM a
JOIN b returns rows from both tables. What you want to do is fix the
WHERE clause.
DELETE FROM detail WHERE detail.sum_id in ( select id from summary  )
AND collect_date='2005-02-05';
I'm guessing this should actually be
DELETE FROM detail WHERE detail.sum_id in ( SELECT id FROM summary WHERE
collect_date='2005-02-05' );
Otherwise you wouldn't really need the join.
You have to come up with a plan that yields rows that are in the table
you want to delete. The rows that result from SELECT * FROM detail JOIN
summary contain values from both tables. If you want to delete from both
tables, I think this has to be two DELETEs, probably best wrapped in a
transaction:
BEGIN;
DELETE FROM detail WHERE ...
DELETE FROM summary WHERE collect_date = '2005-02-05';
COMMIT;
John
=:->




Re: [PERFORM] Are JOINs allowed with DELETE FROM

2005-02-06 Thread Steven Rosenstein




Hi Michael,

Thank you for the link to the documentation page.  I forgot to mention that
we're still using version 7.3.  When I checked the 7.3 documentation for
DELETE, there was no mention of being able to use fields from different
tables in a WHERE clause.  This feature must have been added in a
subsequent release of PostgreSQL.

Gaetano & John:  I *did* try your suggestion.  However, there were so many
summary ID's returned (9810 to be exact) that the DELETE seemed to be
taking forever.  Here's an actual SELECT query that I ran as a test:

vsa=# vacuum analyze verbose vsa.tbl_win_patch_scan;   [This is the "summary" table from my abstracted example]
INFO:  --Relation vsa.tbl_win_patch_scan--
INFO:  Pages 374: Changed 0, Empty 0; Tup 10485: Vac 0, Keep 0, UnUsed 0.
Total CPU 0.01s/0.00u sec elapsed 0.00 sec.
INFO:  --Relation pg_toast.pg_toast_39384--
INFO:  Pages 62679: Changed 0, Empty 0; Tup 254116: Vac 0, Keep 0, UnUsed 0.
Total CPU 0.86s/0.21u sec elapsed 13.79 sec.
INFO:  Analyzing vsa.tbl_win_patch_scan
VACUUM
Time: 18451.32 ms

vsa=# vacuum analyze verbose vsa.tbl_win_patch_scan_item;   [This is the "detail" table from my abstracted example]
INFO:  --Relation vsa.tbl_win_patch_scan_item--
INFO:  Pages 110455: Changed 0, Empty 0; Tup 752066: Vac 0, Keep 0, UnUsed 0.
Total CPU 2.23s/0.45u sec elapsed 42.07 sec.
INFO:  --Relation pg_toast.pg_toast_39393--
INFO:  Pages 2464: Changed 0, Empty 0; Tup 14780: Vac 0, Keep 0, UnUsed 0.
Total CPU 0.02s/0.02u sec elapsed 2.31 sec.
INFO:  Analyzing vsa.tbl_win_patch_scan_item
VACUUM
Time: 62075.52 ms

vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE
win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE
scan_datetime < '2004-09-01 00:00:00');
                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..379976970.68 rows=376033 width=1150) (actual time=11.50..27373.29 rows=62 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..505.11 rows=4 width=4) (actual time=0.00..0.00 rows=2 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=4 width=4) (actual time=0.03..11.16 rows=2 loops=1)
                 Filter: (scan_datetime < '2004-09-01 00:00:00'::timestamp without time zone)
 Total runtime: 27373.65 msec
(7 rows)

Time: 27384.12 ms

I put in a very early date (2004-09-01) because I knew there would be very
few rows to look at (2 rows in vsa.tbl_win_patch_scan meet the date
criteria, and a total of 62 rows in vsa.tbl_win_patch_scan_item match
either of the two tbl_win_patch_scan ID's returned in the WHERE subquery).
Can anyone see a way of optimizing this so that it runs faster?  The real
date I should be using is 2004-12-06 (~60 days retention), and when I do
use it, the query seems to take forever.  I ran a number of EXPLAIN ANALYZEs
with different scan_datetimes, and it seems that the execution time
increases exponentially with the number of rows (ID's) returned by the
subquery.  Running top shows that the EXPLAIN is entirely CPU-bound.  There
is no disk I/O during any query execution:

DATE=2004-09-01; SUMMARY ROWS=2; DETAIL ROWS=62; TIME=27.37 sec (Included
initial query cache loading effect)
vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE
win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE
scan_datetime < '2004-09-01 00:00:00');
                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..379976970.68 rows=376033 width=1150) (actual time=11.50..27373.29 rows=62 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..505.11 rows=4 width=4) (actual time=0.00..0.00 rows=2 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=4 width=4) (actual time=0.03..11.16 rows=2 loops=1)
                 Filter: (scan_datetime < '2004-09-01 00:00:00'::timestamp without time zone)
 Total runtime: 27373.65 msec
(7 rows)

Time: 27384.12 ms

Time: 27384.12 ms

DATE=2004-09-02; SUMMARY ROWS=2; DETAIL ROWS=62; TIME=8.26 sec
vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE
win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE
scan_datetime < '2004-09-02 00:00:00');
                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..380115740.80 rows=376033 width=1142) (actual time=10.42..8259.79 rows=62 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..505.48 rows=

Re: [PERFORM] Are JOINs allowed with DELETE FROM

2005-02-06 Thread Tom Lane
Steven Rosenstein <[EMAIL PROTECTED]> writes:
> Thank you for the link to the documentation page.  I forgot to mention that
> we're still using version 7.3.  When I checked the 7.3 documentation for
> DELETE, there was no mention of being able to use fields from different
> tables in a WHERE clause.  This feature must have been added in a
> subsequent release of PostgreSQL.

No, it's been there all along, if perhaps not well documented.
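That is, you can just refer to the second table in the WHERE clause and
the backend will add it to the query for you (a sketch; on 7.3 expect a
NOTICE about an added FROM-clause entry):

DELETE FROM detail
 WHERE detail.sum_id = summary.id
   AND summary.collect_date = '2005-02-05';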

regards, tom lane



[PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?

2005-02-06 Thread Steven Rosenstein




While working on a previous question I posed to this group, I ran a number
of EXPLAIN ANALYZE's to provide as examples.  After sending up my last
email, I ran the same query *without* EXPLAIN ANALYZE.  The runtimes were
vastly different.  In the following example, I ran two identical queries
one right after the other.  The runtimes for both were very close (44.77
sec).  I then immediately ran the exact same query, but without EXPLAIN
ANALYZE.  The same number of rows was returned, but the runtime was only
8.7 sec.  I don't think EXPLAIN ANALYZE puts that much overhead on a query.
Does anyone have any idea what is going on here?

--- Steve

vsa=# explain analyze SELECT id,win_patch_scan_id FROM
vsa.tbl_win_patch_scan_item WHERE win_patch_scan_id IN (SELECT id FROM
vsa.tbl_win_patch_scan WHERE scan_datetime < '2004-09-18 00:00:00');
                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..382335670.62 rows=376033 width=8) (actual time=10.18..44773.22 rows=2045 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..511.38 rows=632 width=4) (actual time=0.00..0.02 rows=43 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=632 width=4) (actual time=0.02..10.09 rows=43 loops=1)
                 Filter: (scan_datetime < '2004-09-18 00:00:00'::timestamp without time zone)
 Total runtime: 44774.49 msec
(7 rows)

Time: 44775.62 ms


vsa=# explain analyze SELECT id,win_patch_scan_id FROM
vsa.tbl_win_patch_scan_item WHERE win_patch_scan_id IN (SELECT id FROM
vsa.tbl_win_patch_scan WHERE scan_datetime < '2004-09-18 00:00:00');
                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..382335670.62 rows=376033 width=8) (actual time=10.18..44765.36 rows=2045 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..511.38 rows=632 width=4) (actual time=0.00..0.02 rows=43 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=632 width=4) (actual time=0.02..10.10 rows=43 loops=1)
                 Filter: (scan_datetime < '2004-09-18 00:00:00'::timestamp without time zone)
 Total runtime: 44766.62 msec
(7 rows)

Time: 44767.71 ms


vsa=# SELECT id,win_patch_scan_id FROM vsa.tbl_win_patch_scan_item WHERE
win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE
scan_datetime < '2004-09-18 00:00:00');
   id   | win_patch_scan_id
--------+-------------------
  1 | 1
  2 | 1
  3 | 1
  4 | 1
  5 | 1
--8< SNIP --
211 | 7
212 | 7
213 | 7
214 | 7
215 | 7
216 | 7
217 | 7
 692344 |  9276
 692345 |  9276
 692346 |  9276
 692347 |  9276
 692348 |  9276
--8< SNIP --
 694167 |  9311
 694168 |  9311
 694169 |  9311
 694170 |  9311
 694171 |  9311
(2045 rows)

Time: 8703.56 ms
vsa=#


Re: [PERFORM] Are JOINs allowed with DELETE FROM

2005-02-06 Thread Steven Rosenstein




Many thanks to Gaetano Mendola and Tom Lane for the hints about using
fields from other tables in a DELETE's WHERE clause.  That was the magic
bullet I needed, and my application is working as expected.

--- Steve


Re: [PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?

2005-02-06 Thread Leeuw van der, Tim


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Steven Rosenstein
Sent: Sunday, February 06, 2005 8:51 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?






While working on a previous question I posed to this group, I ran a number
of EXPLAIN ANALYZE's to provide as examples.  After sending up my last
email, I ran the same query *without* EXPLAIN ANALYZE.  The runtimes were
vastly different.  In the following example, I ran two identical queries
one right after the other.  The runtimes for both was very close (44.77
sec).  I then immediately ran the exact same query, but without EXPLAIN
ANALYZE.  The same number of rows was returned, but the runtime was only
8.7 sec.  I don't think EXPLAIN ANALYZE puts that much overhead on a query.
Does anyone have any idea what is going on here?

--- Steve


Caching by the OS?

(Did you try to *first* run the query w/o EXPLAIN ANALYZE, and then with? 
What's the timing if you do that?)

--Tim



Re: [PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?

2005-02-06 Thread Greg Stark

"Leeuw van der, Tim" <[EMAIL PROTECTED]> writes:

> I don't think EXPLAIN ANALYZE puts that much overhead on a query.

EXPLAIN ANALYZE does indeed impose a significant overhead. What percentage of
the time is overhead depends heavily on how much i/o the query is doing.

For queries that are primarily cpu bound because they're processing data from
the cache it can be substantial. If all the data is in the shared buffers then
the gettimeofday calls for explain analyze can be just about the only syscalls
being executed and they're executed a lot.

It would be interesting to try to subtract out the profiling overhead from the
data like most profilers do. But it's not an easy thing to do since the times
are nested.

-- 
greg




Re: [PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?

2005-02-06 Thread Tom Lane
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Steven Rosenstein
> >> I don't think EXPLAIN ANALYZE puts that much overhead on a query.

I think you're being overly optimistic.  The explain shows that the
Materialize subnode is being entered upwards of 32 million times:

   ->  Materialize  (cost=505.06..511.38 rows=632 width=4) (actual time=0.00..0.02 rows=43 loops=752066)

43 * 752066 = 32338838.  The instrumentation overhead is basically two
gettimeofday() kernel calls per node entry, or roughly 65 million calls
here.  Dividing the ~36 seconds of extra runtime (44.77 sec with EXPLAIN
ANALYZE versus 8.7 sec without) by that count shows that your machine is
able to do gettimeofday() in about half a microsecond, which isn't
stellar but it's not all that slow for a kernel call.  (What's the
platform here, anyway?)  Nonetheless it's a couple of times larger than
the actual time needed to pull a row from a materialized array ...

The real answer to your question is "IN (subselect) sucks before PG 7.4;
get a newer release".
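If you can't upgrade right away, the usual workaround is to rewrite the
IN as a correlated EXISTS; a sketch against your tables (untested here,
and it assumes an index on tbl_win_patch_scan.id):

SELECT i.id, i.win_patch_scan_id
  FROM vsa.tbl_win_patch_scan_item i
 WHERE EXISTS (SELECT 1
                 FROM vsa.tbl_win_patch_scan s
                WHERE s.id = i.win_patch_scan_id
                  AND s.scan_datetime < '2004-09-18 00:00:00');

Each probe can then stop at its first match instead of scanning a
materialized list for every row.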

regards, tom lane



Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-02-06 Thread Jan Wieck
On 1/28/2005 2:49 PM, Christopher Browne wrote:
But there's nothing wrong with the idea of using "pg_dump --data-only"
against a subscriber node to get you the data without putting a load
on the origin.  And then pulling the schema from the origin, which
oughtn't be terribly expensive there.
And there is a script in the current CVS head that extracts the schema 
from the origin in a clean, slony-traces-removed state.

Jan


Re: [PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?

2005-02-06 Thread Steven Rosenstein




You're probably right about my being overly optimistic about the load
imposed by EXPLAIN ANALYZE.  It was just that in my previous experience
with it, I'd never seen such a large runtime discrepancy before.  I even
allowed for a "caching effect" by making sure the server was all but
quiescent, and then running the three queries as quickly after one another
as I could.

The server itself is an IBM x345 with dual Xeon 3GHz CPUs (hyperthreading
turned off) and 2.5GB of RAM.  O/S is RHEL3 Update 4.  Disks are a
ServeRAID of some flavor, I'm not sure which.

Thanks for the heads-up about the performance of IN in 7.3.  We're looking
to migrate to 8.0 or 8.0.1 when they become GA, but some of our databases
are in excess of 200GB-300GB, and we need to make sure we have a good
migration plan in place (space to store the dump out of the 7.3 db) before
we start.