[PERFORM] FK triggers misused?

2007-04-14 Thread cluster

I have performance problem with the following simple update query:

  UPDATE posts
  SET num_views = num_views + 1
  WHERE post_id IN (2526,5254,2572,4671,25);

The table "posts" is a large table with a number of foreign keys (FK).

It seems that the FK triggers for the table are evaluated even though
none of the FK columns are altered. In fact, these FK triggers seems to
constitute a considerable part of the total execution time. See the
below EXPLAIN ANALYZE.

Why are these FK triggers evaluated at all and why do they take so much
time?

--
=> EXPLAIN ANALYZE update posts set num_views = num_views + 1 where
post_id in (2526,5254,2572,4671,25);
   QUERY PLAN
-
 Bitmap Heap Scan on posts  (cost=10.02..29.81 rows=5 width=1230)
(actual time=0.146..0.253 rows=5 loops=1)
   Recheck Cond: ((post_id = 2526) OR (post_id = 5254) OR (post_id =
2572) OR (post_id = 4671) OR (post_id = 25))
   ->  BitmapOr  (cost=10.02..10.02 rows=5 width=0) (actual
time=0.105..0.105 rows=0 loops=1)
 ->  Bitmap Index Scan on posts_pkey  (cost=0.00..2.00 rows=1
width=0) (actual time=0.053..0.053 rows=2 loops=1)
   Index Cond: (post_id = 2526)
 ->  Bitmap Index Scan on posts_pkey  (cost=0.00..2.00 rows=1
width=0) (actual time=0.012..0.012 rows=2 loops=1)
   Index Cond: (post_id = 5254)
 ->  Bitmap Index Scan on posts_pkey  (cost=0.00..2.00 rows=1
width=0) (actual time=0.008..0.008 rows=2 loops=1)
   Index Cond: (post_id = 2572)
 ->  Bitmap Index Scan on posts_pkey  (cost=0.00..2.00 rows=1
width=0) (actual time=0.010..0.010 rows=2 loops=1)
   Index Cond: (post_id = 4671)
 ->  Bitmap Index Scan on posts_pkey  (cost=0.00..2.00 rows=1
width=0) (actual time=0.011..0.011 rows=2 loops=1)
   Index Cond: (post_id = 25)
 Trigger for constraint posts_question_id_fkey: time=50.031 calls=5
 Trigger for constraint posts_author_id_fkey: time=22.330 calls=5
 Trigger for constraint posts_language_id_fkey: time=1.282 calls=5
 Trigger posts_tsvectorupdate: time=61.659 calls=5
 Total runtime: 174.230 ms
(18 rows)

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] FK triggers misused?

2007-04-16 Thread cluster

So the next question is, what pg version is the original poster using?
because 8.1.x doesn't report trigger execution times, and 8.2.x would use
a single bitmap index scan with an = ANY condition, not a BitmapOr.


I have tried 8.1.0 and 8.1.3 for this query.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] FK triggers misused?

2007-04-17 Thread cluster

> Do the rows being updated contain

NULLs in the foreign-key columns?


No, all FK columns are non-NULL. It is very strange.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] FK triggers misused?

2007-04-21 Thread cluster

I have investigated a bit now and found the following:

When I perform the update the *first* time, the triggers are actually 
not evaluated. But from the second update they are.


Also notice that the number of rows changes. Shouldn't that number of 
rows always be 2 as question_id is primary key?


Example:

=> explain analyze update questions set cancelled_time = now() where 
question_id in (10,11);

 QUERY PLAN
-
 Bitmap Heap Scan on questions  (cost=4.01..12.04 rows=2 width=112) 
(actual time=0.193..0.205 rows=2 loops=1)

   Recheck Cond: ((question_id = 10) OR (question_id = 11))
   ->  BitmapOr  (cost=4.01..4.01 rows=2 width=0) (actual 
time=0.046..0.046 rows=0 loops=1)
 ->  Bitmap Index Scan on questions_pkey  (cost=0.00..2.00 
rows=1 width=0) (actual time=0.037..0.037 rows=1 loops=1)

   Index Cond: (question_id = 10)
 ->  Bitmap Index Scan on questions_pkey  (cost=0.00..2.00 
rows=1 width=0) (actual time=0.005..0.005 rows=1 loops=1)

   Index Cond: (question_id = 11)
 Trigger for constraint questions_repost_of_fkey: time=0.023 calls=2
 Total runtime: 0.734 ms
(9 rows)



=> explain analyze update questions set cancelled_time = now() where 
question_id in (10,11);

 QUERY PLAN
-
 Bitmap Heap Scan on questions  (cost=4.01..12.04 rows=2 width=112) 
(actual time=0.085..0.097 rows=2 loops=1)

   Recheck Cond: ((question_id = 10) OR (question_id = 11))
   ->  BitmapOr  (cost=4.01..4.01 rows=2 width=0) (actual 
time=0.047..0.047 rows=0 loops=1)
 ->  Bitmap Index Scan on questions_pkey  (cost=0.00..2.00 
rows=1 width=0) (actual time=0.036..0.036 rows=2 loops=1)

   Index Cond: (question_id = 10)
 ->  Bitmap Index Scan on questions_pkey  (cost=0.00..2.00 
rows=1 width=0) (actual time=0.007..0.007 rows=2 loops=1)

   Index Cond: (question_id = 11)
 Trigger for constraint questions_repost_of_fkey: time=0.025 calls=2
 Trigger for constraint questions_author_id_fkey: time=0.167 calls=2
 Trigger for constraint questions_category_id_fkey: time=0.196 calls=2
 Trigger for constraint questions_lock_user_id_fkey: time=0.116 calls=2
 Total runtime: 1.023 ms
(12 rows)



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] Two fast queries get slow when combined

2007-10-30 Thread cluster
I have two small queries which are both very fast to evaluate 
separately. The first query, "Query 1", calculates some statistics and 
the the second query, "Query 2", finds a subset of relevant keys.
When combined into a single query which calculates statistics from only 
the subset of relevant keys the evaluation plan explodes and uses both 
seq scans and bitmap heap scans.

How can I improve the performance of the combined query?

Queries and output from EXPLAIN ANALYZE can be seen here with some 
syntax highlighting:

   http://rafb.net/p/BJIW4p69.html

I will also paste it here:

=
QUERY 1 (very *fast*):
=
SELECT keyId, count(1) as num_matches
FROM stats
GROUP BY keyId
LIMIT 50

 Limit  (cost=0.00..23.65 rows=50 width=8) (actual time=0.090..2.312 
rows=50 loops=1)
   ->  GroupAggregate  (cost=0.00..4687.46 rows=9912 width=8) (actual 
time=0.085..2.145 rows=50 loops=1)
 ->  Index Scan using stats_keyId on stats  (cost=0.00..3820.19 
rows=99116 width=8) (actual time=0.031..1.016 rows=481 loops=1)

 Total runtime: 2.451 ms
(4 rows)


=
QUERY 2 (very *fast*):
=
SELECT keyId, sortNum
FROM items i
WHERE sortNum > 123
ORDER BY sortNum
LIMIT 50

 Limit  (cost=0.01..9.87 rows=50 width=8) (actual time=0.068..0.610 
rows=50 loops=1)

   InitPlan
 ->  Limit  (cost=0.00..0.01 rows=1 width=0) (actual 
time=0.009..0.025 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual 
time=0.006..0.007 rows=1 loops=1)
   ->  Index Scan using items_sortNum on items i  (cost=0.00..1053.67 
rows=5344 width=8) (actual time=0.063..0.455 rows=50 loops=1)

 Index Cond: (sortNum >= $0)
 Total runtime: 0.749 ms
(7 rows)



=
COMBINED QUERY (very *slow*):
=
  SELECT keyId, sortNum, count(1)
  FROM stats s, items i
  WHERE s.keyId = i.keyId AND i.sortNum > 123
  GROUP BY i.keyId, i.sortNum
  ORDER BY i.sortNum
  LIMIT 50

Limit  (cost=3281.72..3281.84 rows=50 width=16) (actual 
time=435.838..436.043 rows=50 loops=1)

   InitPlan
 ->  Limit  (cost=0.00..0.01 rows=1 width=0) (actual 
time=0.016..0.021 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual 
time=0.012..0.013 rows=1 loops=1)
   ->  Sort  (cost=3281.71..3289.97 rows=3304 width=16) (actual 
time=435.833..435.897 rows=50 loops=1)

 Sort Key: i.sortNum
 ->  Hash Join  (cost=2745.80..3088.59 rows=3304 width=16) 
(actual time=364.247..413.164 rows=8490 loops=1)

   Hash Cond: (s.keyId = i.keyId)
   ->  HashAggregate  (cost=2270.53..2394.43 rows=9912 
width=8) (actual time=337.869..356.533 rows=9911 loops=1)
 ->  Seq Scan on items  (cost=0.00..1527.16 
rows=99116 width=8) (actual time=0.016..148.118 rows=99116 loops=1)
   ->  Hash  (cost=408.47..408.47 rows=5344 width=12) 
(actual time=26.342..26.342 rows=4491 loops=1)
 ->  Bitmap Heap Scan on items i 
(cost=121.67..408.47 rows=5344 width=12) (actual time=5.007..16.898 
rows=4491 loops=1)

   Recheck Cond: (sortNum >= $0)
   ->  Bitmap Index Scan on items_sortNum 
(cost=0.00..120.33 rows=5344 width=0) (actual time=4.273..4.273 
rows=13375 loops=1)

 Index Cond: (sortNum >= $0)
Total runtime: 436.421 ms
(16 rows)

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Two fast queries get slow when combined

2007-10-31 Thread cluster
There's something odd about that plan. It's doing both a seq scan and a 
bitmap scan on "items", but I can't see stats table being mentioned 
anywhere.


Huh? Aaah, sorry. I made a major search/replace-refactoring (that 
obviously went wrong) on all open files in the editor before posting to 
this newsgroup, and one of these files was my preparation of the queries 
and planner output. Damn.

Sorry for wasting your time! :-(

However, your suggestion worked perfectly. Thanks a lot!

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Two fast queries get slow when combined

2007-10-31 Thread cluster

You are lying to us about how those queries were posed to Postgres
(and no I don't feel a need to explain how I know).


Sorry. The "lying" was not intended as explained in my reply to Heikku.

Thanks for the tips anyways.

---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] Query only slow on first run

2007-11-27 Thread cluster
I have a query that takes about 7000 ms in average to complete the first 
time it runs. Subsequent runs complete in only 50 ms. That is more than 
a factor 100 faster! How can I make the query perform good in the first 
run too?


Query and output from both first and second run of Explain Analyze is 
pasted here:


http://rafb.net/p/yrKyoA17.html

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Query only slow on first run

2007-11-27 Thread cluster

Probably by buying much faster disk hardware.

Or buy more RAM, so that the data can stay cached.


So the only problem here is lack of RAM and/or disk speed?

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Query only slow on first run

2007-11-27 Thread cluster

As for optimizing the query, I noticed that all three joins are done by
nested loops.  I wonder if another join method would be faster.  Have you
analyzed all the tables?


Yes. I did a VACUUM FULL ANALYZE before running the test queries. Also I 
have just performed an ANALYZE just to be sure everything was really 
analyzed.



You aren't disabling hash joins or merge joins are
you?


Nope.


 If you aren't, then as a test I would try disabling nested loops by
doing "set enable_nestloop=false" and see if the query is any faster for
you.


If I disable the nested loops, the query becomes *much* slower.

A thing that strikes me is the following. As you can see I have the 
constraint: q.status = 1. Only a small subset of the data set has this 
status. I have an index on q.status but for some reason this is not 
used. Instead the constraint are ensured with a "Filter: (q.status = 1)" 
in an index scan for the primary key in the "q" table. If the small 
subset having q.status = 1 could be isolated quickly using an index, I 
would expect the query to perform better. I just don't know why the 
planner doesn't use the index on q.status.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Query only slow on first run

2007-11-28 Thread cluster

I'm wondering why --- doesn't seem like it should take 6400msec to fetch
646 rows, unless perhaps the data is just horribly misordered relative
to the index.   Which may in fact be the case ...


Hmm, actually I still don't understand why it takes 6400 ms to fetch the 
rows. As far as I can see the index used is "covering" so that real row 
lookups shouldn't be necessary. Also, only the the random_numbers 
induces by questions with status = 1 should be considered - and this 
part is a relatively small subset.


In general, I don't understand why the query is so I/O dependant as it 
apparently is.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Query only slow on first run

2007-11-28 Thread cluster

The indexes don't contain visibility information, so Postgres has to look up
the row on disk to verify it isn't dead.


I guess this fact drastically decreases the performance. :-(
The number of rows with a random_number will just grow over time while 
the number of questions with status = 1 will always be somewhat constant 
at about 10.000 or most likely much less.


I could really use any kind of suggestion on how to improve the query in 
order to make it scale better for large data sets The 6-7000 ms for a 
clean run is really a showstopper. Need to get it below 70 ms somehow.


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Query only slow on first run

2007-11-29 Thread cluster

You're essentially asking for a random sample of data that is not
currently in memory.  You're not going to get that without some I/O.


No, that sounds reasonable enough. But do you agree with the statement 
that my query will just get slower and slower over time as the number of 
posts increases while the part having status = 1 is constant?
(Therefore, as the relevant fraction becomes smaller over time, the 
"Filter: status = 1" operation becomes slower.)


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Appending "LIMIT" to query drastically decreases performance

2007-11-30 Thread cluster

Please post EXPLAIN ANALYZE output for the two queries.


As I wrote in my first post, I pasted this together with the two queries 
at pastebin.com:

  http://pastebin.com/m3c0d1896

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[PERFORM] Appending "LIMIT" to query drastically decreases performance

2007-11-30 Thread cluster

Can anyone explain the following odd behavior?
I have a query that completes in about 90 ms. If I append LIMIT to the 
very end, eg. "LIMIT 500" the evaluation time increases to about 800 ms.
How can performance get *worse* by giving the database the option to 
stop the evaluation earlier (when it reaches the output 500 rows)?


I have pasted both queries together with output from explain analyze here:
   http://pastebin.com/m3c0d1896

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] Best hardware/cost tradoff?

2008-08-28 Thread cluster
I'm about to buy a combined web- and database server. When (if) the site 
gets sufficiently popular, we will split the database out to a separate 
server.


Our budget is limited, so how should we prioritize?

* We think about buying some HP Proliant server with at least 4GB ram 
and at least a duo core processor. Possibly quad core. The OS will be 
debian/Linux.


* Much of the database will fit in RAM so it is not *that* necessary to 
prefer the more expensive SAS 1 RPM drives to the cheaper 7500 RPM 
SATA drives, is it? There will both be many read- and write queries and 
a *lot* (!) of random reads.


* I think we will go for hardware-based RAID 1 with a good 
battery-backed-up controller. I have read that software RAID perform 
surprisingly good, but for a production site where hotplug replacement 
of dead disks is required, is software RAID still worth it?


Anything else we should be aware of?

Thanks!

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Best hardware/cost tradoff?

2008-08-28 Thread cluster
Thanks for all your replies! They are enlightening. I have some 
additional questions:


1) Would you prefer
   a) 5.4k 2" SATA RAID10 on four disks or
   b) 10k 2" SAS RAID1 on two disks?
(Remember the lots (!) of random reads)

2) Should I just make one large partition of my RAID? Does it matter at all?

3) Will I gain much by putting the OS on a saparate disk, not included 
in the RAID? (The webserver and database would still share the RAID - 
but I guess the OS will cache my (small) web content in RAM anyway).


Thanks again!

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Best hardware/cost tradoff?

2008-08-30 Thread cluster

We are now leaning towards just buying 4 SAS disks.

So should I just make one large RAID-10 partition or make two RAID-1's 
having the log on one RAID and everything else on the second RAID?

How can I get the best read/write performance out of these four disks?
(Remember, that it is a combined web-/database server).

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance