Re: [PERFORM] Using Gprof with Postgresql

2009-09-08 Thread Reydan Cankur
I just compiled it with gcc, and it produces a gmon.out file for every
process. By the way, I am running the script below to produce
readable .out files:


 gprof .../pgsql/bin/postgres gmon.out > createtable2.out

Is postgres the right executable?

regards
reydan

On Sep 7, 2009, at 8:24 PM, Tom Lane wrote:


Well, you could use gcc ... icc claims to support the -pg switch but
the above sounds like it just ignores it.

regards, tom lane
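
For context, a minimal sketch of the workflow under discussion, assuming a
source build with gcc (the install prefix is illustrative). gprof must be
pointed at the same binary that produced the gmon.out, and for a backend
that binary is indeed postgres:

    # build with profiling instrumentation; recent PostgreSQL releases
    # also offer ./configure --enable-profiling for the same effect
    ./configure --prefix=/usr/local/pgsql CFLAGS="-pg"
    make && make install

    # run the workload, then analyze a backend's gmon.out against
    # the binary that produced it
    gprof /usr/local/pgsql/bin/postgres gmon.out > createtable2.out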



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Using Gprof with Postgresql

2009-09-08 Thread Pierre Frédéric Caillaud
I just compiled it with gcc, and it produces a gmon.out file for every
process. By the way, I am running the script below to produce
readable .out files:


  gprof .../pgsql/bin/postgres gmon.out > createtable2.out

Is postgres the right executable?

regards
reydan


Off topic, but have you tried oprofile? It's excellent...
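
For reference, a rough sketch of an oprofile session using the opcontrol
interface current in 2009 (run as root; the binary path is illustrative).
Unlike gprof, it samples the unmodified binary, so no recompile is needed:

    # start system-wide sampling; --no-vmlinux skips kernel symbols
    opcontrol --no-vmlinux
    opcontrol --start

    # ...run the workload against postgres...

    # flush the sample buffers and report per-symbol costs
    opcontrol --dump
    opreport --symbols /usr/local/pgsql/bin/postgres > profile.txt
    opcontrol --shutdown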

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Using Gprof with Postgresql

2009-09-08 Thread Craig James

Pierre Frédéric Caillaud wrote:
I just compiled it with gcc, and it produces a gmon.out file for every
process. By the way, I am running the script below to produce
readable .out files:


  gprof .../pgsql/bin/postgres gmon.out > createtable2.out

Is postgres the right executable?

regards
reydan


Off topic, but have you tried oprofile? It's excellent...


I find valgrind to be an excellent profiling tool.  It has the advantage that it runs on 
an unmodified executable (using a virtual machine).  You can compile postgres the regular 
way, start the system up, and then replace the original executable with a short shell 
script called "postgres" that invokes valgrind on the real binary.  Then when Postgres 
starts up your backend, you have just one valgrind process running, rather than the 
whole Postgres system.

Valgrind does 100% tracing of the program rather than statistical sampling, and 
since it runs in a pure virtual machine, it can detect almost all memory 
corruption and leaks.

The big disadvantage of valgrind is that it slows the process WAY down, like by 
a factor of 5-10 on CPU.  For a purely CPU-bound process it doesn't screw up your 
stats, but if a process mixes CPU and I/O, the CPU will appear to dominate.

Craig
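
A sketch of the wrapper Craig describes, assuming the real binary has been
renamed to postgres.orig (the path and the choice of tool are illustrative;
valgrind's default memcheck tool is the one that catches corruption and
leaks, while callgrind does call-graph profiling):

    #!/bin/sh
    # installed as .../pgsql/bin/postgres in place of the real binary;
    # "$@" passes the backend's original arguments through untouched
    exec valgrind --tool=callgrind /usr/local/pgsql/bin/postgres.orig "$@"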


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Forcing postgresql to use an index

2009-09-08 Thread Eugene Morozov
Hello,

I have the following query (autogenerated by Django):

SELECT activity_activityevent.id, activity_activityevent.user_id, 
activity_activityevent.added_on
FROM activity_activityevent 
WHERE activity_activityevent.user_id IN (
   SELECT U0.user_id 
   FROM profile U0 
   INNER JOIN profile_friends U1 
   ON U0.user_id = U1.to_profile_id
   WHERE U1.from_profile_id = 5
) 
ORDER BY activity_activityevent.added_on DESC LIMIT 10


When I run EXPLAIN ANALYZE with my default settings (seq scan is on,
random_page_cost = 4) I get the following result:

Limit  (cost=4815.62..4815.65 rows=10 width=202) (actual time=332.938..332.977 rows=10 loops=1)
  ->  Sort  (cost=4815.62..4816.35 rows=292 width=202) (actual time=332.931..332.945 rows=10 loops=1)
        Sort Key: activity_activityevent.added_on
        Sort Method:  top-N heapsort  Memory: 19kB
        ->  Hash IN Join  (cost=2204.80..4809.31 rows=292 width=202) (actual time=12.856..283.916 rows=15702 loops=1)
              Hash Cond: (activity_activityevent.user_id = u0.user_id)
              ->  Seq Scan on activity_activityevent  (cost=0.00..2370.43 rows=61643 width=202) (actual time=0.020..126.129 rows=61643 loops=1)
              ->  Hash  (cost=2200.05..2200.05 rows=380 width=8) (actual time=12.777..12.777 rows=424 loops=1)
                    ->  Nested Loop  (cost=11.20..2200.05 rows=380 width=8) (actual time=0.260..11.594 rows=424 loops=1)
                          ->  Bitmap Heap Scan on profile_friends u1  (cost=11.20..62.95 rows=380 width=4) (actual time=0.228..1.202 rows=424 loops=1)
                                Recheck Cond: (from_profile_id = 5)
                                ->  Bitmap Index Scan on profile_friends_from_profile_id_key  (cost=0.00..11.10 rows=380 width=0) (actual time=0.208..0.208 rows=424 loops=1)
                                      Index Cond: (from_profile_id = 5)
                          ->  Index Scan using profile_pkey on profile u0  (cost=0.00..5.61 rows=1 width=4) (actual time=0.012..0.015 rows=1 loops=424)
                                Index Cond: (u0.user_id = u1.to_profile_id)
Total runtime: 333.190 ms

But when I disable seq scan or set random_page_cost to 1.2 (higher
values don't change the plan), postgres starts using the index and the query
runs twice as fast:

Limit  (cost=9528.36..9528.38 rows=10 width=202) (actual time=165.047..165.090 rows=10 loops=1)
  ->  Sort  (cost=9528.36..9529.09 rows=292 width=202) (actual time=165.042..165.058 rows=10 loops=1)
        Sort Key: activity_activityevent.added_on
        Sort Method:  top-N heapsort  Memory: 19kB
        ->  Nested Loop  (cost=2201.00..9522.05 rows=292 width=202) (actual time=13.074..126.209 rows=15702 loops=1)
              ->  HashAggregate  (cost=2201.00..2204.80 rows=380 width=8) (actual time=12.996..14.131 rows=424 loops=1)
                    ->  Nested Loop  (cost=11.20..2200.05 rows=380 width=8) (actual time=0.263..11.665 rows=424 loops=1)
                          ->  Bitmap Heap Scan on profile_friends u1  (cost=11.20..62.95 rows=380 width=4) (actual time=0.232..1.181 rows=424 loops=1)
                                Recheck Cond: (from_profile_id = 5)
                                ->  Bitmap Index Scan on profile_friends_from_profile_id_key  (cost=0.00..11.10 rows=380 width=0) (actual time=0.210..0.210 rows=424 loops=1)
                                      Index Cond: (from_profile_id = 5)
                          ->  Index Scan using profile_pkey on profile u0  (cost=0.00..5.61 rows=1 width=4) (actual time=0.013..0.016 rows=1 loops=424)
                                Index Cond: (u0.user_id = u1.to_profile_id)
              ->  Index Scan using activity_activityevent_user_id on activity_activityevent  (cost=0.00..18.82 rows=35 width=202) (actual time=0.014..0.130 rows=37 loops=424)
                    Index Cond: (activity_activityevent.user_id = u0.user_id)
Total runtime: 165.323 ms


Can anyone enlighten me? Should I set random_page_cost to 1.2
permanently (I feel this is not a really good idea in my case)?

Eugene


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Forcing postgresql to use an index

2009-09-08 Thread Grzegorz Jaśkiewicz
Teach it not to generate "WHERE ... IN (subquery)", as this can be quite
slow on PostgreSQL. Use joins instead; a sketch follows.
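
A sketch of the equivalent join form, reconstructed from the original query
(the DISTINCT is an assumption, guarding against duplicate friend rows
producing duplicate events):

    SELECT DISTINCT e.id, e.user_id, e.added_on
    FROM activity_activityevent e
    JOIN profile u0 ON e.user_id = u0.user_id
    JOIN profile_friends u1 ON u0.user_id = u1.to_profile_id
    WHERE u1.from_profile_id = 5
    ORDER BY e.added_on DESC
    LIMIT 10;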

It looks like the planner was wrong about the rowcount in one place: Hash IN Join
(cost=2204.80..4809.31 rows=292 width=202) (actual
time=12.856..283.916 rows=15702 loops=1)

I have no idea why; probably folks more knowledgeable will know. But
overall, all the other stats seem okay.
What is default_statistics_target set to in your postgresql.conf?

One more thing: what version and platform are you running?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Forcing postgresql to use an index

2009-09-08 Thread Kevin Grittner
Eugene Morozov  wrote: 
 
> Can anyone enlighten me? Should I set random_page_cost to 1.2
> permanently (I feel this is not a really good idea in my case)?
 
For it to pass as many rows as it did in the time that it did, most or
all of the "reads" were cached.  If this is typically the case, at
least for the queries for which performance is most critical, your
change makes sense as a permanent setting.  In fact, you might want to
go even further -- there have been many reports of people getting good
performance on fully-cached systems by dropping both random_page_cost
and seq_page_cost to 0.1, so that the optimizer better estimates the
relative cost of "disk access" versus CPU-based operations.
 
-Kevin
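
A sketch of how one might test Kevin's suggestion before persisting anything
to postgresql.conf (session-level settings only affect the current connection):

    -- experiment interactively first; persist only what proves out
    SET random_page_cost = 0.1;
    SET seq_page_cost = 0.1;
    -- then re-run EXPLAIN ANALYZE on the problem query and compare plans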

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Forcing postgresql to use an index

2009-09-08 Thread Scott Marlowe
On Tue, Sep 8, 2009 at 8:12 AM, Eugene Morozov wrote:
> Hello,
>
> I have the following query (autogenerated by Django):
>
> SELECT activity_activityevent.id, activity_activityevent.user_id, 
> activity_activityevent.added_on
> FROM activity_activityevent
> WHERE activity_activityevent.user_id IN (
>   SELECT U0.user_id
>   FROM profile U0
>   INNER JOIN profile_friends U1
>   ON U0.user_id = U1.to_profile_id
>   WHERE U1.from_profile_id = 5
> )
> ORDER BY activity_activityevent.added_on DESC LIMIT 10
>
>
> When I run EXPLAIN ANALYZE with my default settings (seq scan is on,
> random_page_cost = 4) I get the following result:
>
> Limit  (cost=4815.62..4815.65 rows=10 width=202) (actual 
> time=332.938..332.977 rows=10 loops=1)
>  ->  Sort  (cost=4815.62..4816.35 rows=292 width=202) (actual 
> time=332.931..332.945 rows=10 loops=1)
>        Sort Key: activity_activityevent.added_on
>        Sort Method:  top-N heapsort  Memory: 19kB
>        ->  Hash IN Join  (cost=2204.80..4809.31 rows=292 width=202) (actual 
> time=12.856..283.916 rows=15702 loops=1)
>              Hash Cond: (activity_activityevent.user_id = u0.user_id)
>              ->  Seq Scan on activity_activityevent  (cost=0.00..2370.43 
> rows=61643 width=202) (actual time=0.020..126.129 rows=61643 loops=1)
>              ->  Hash  (cost=2200.05..2200.05 rows=380 width=8) (actual 
> time=12.777..12.777 rows=424 loops=1)
>                    ->  Nested Loop  (cost=11.20..2200.05 rows=380 width=8) 
> (actual time=0.260..11.594 rows=424 loops=1)
>                          ->  Bitmap Heap Scan on profile_friends u1  
> (cost=11.20..62.95 rows=380 width=4) (actual time=0.228..1.202 rows=424 
> loops=1)
>                                Recheck Cond: (from_profile_id = 5)
>                                ->  Bitmap Index Scan on 
> profile_friends_from_profile_id_key  (cost=0.00..11.10 rows=380 width=0) 
> (actual time=0.208..0.208 rows=424 loops=1)
>                                      Index Cond: (from_profile_id = 5)
>                          ->  Index Scan using profile_pkey on profile u0  
> (cost=0.00..5.61 rows=1 width=4) (actual time=0.012..0.015 rows=1 loops=424)
>                                Index Cond: (u0.user_id = u1.to_profile_id)
> Total runtime: 333.190 ms
>
> But when I disable seq scan or set random_page_cost to 1.2 (higher
> values don't change the plan), postgres starts using the index and the query
> runs twice as fast:
>
> Limit  (cost=9528.36..9528.38 rows=10 width=202) (actual 
> time=165.047..165.090 rows=10 loops=1)
>  ->  Sort  (cost=9528.36..9529.09 rows=292 width=202) (actual 
> time=165.042..165.058 rows=10 loops=1)
>        Sort Key: activity_activityevent.added_on
>        Sort Method:  top-N heapsort  Memory: 19kB
>        ->  Nested Loop  (cost=2201.00..9522.05 rows=292 width=202) (actual 
> time=13.074..126.209 rows=15702 loops=1)
>              ->  HashAggregate  (cost=2201.00..2204.80 rows=380 width=8) 
> (actual time=12.996..14.131 rows=424 loops=1)
>                    ->  Nested Loop  (cost=11.20..2200.05 rows=380 width=8) 
> (actual time=0.263..11.665 rows=424 loops=1)
>                          ->  Bitmap Heap Scan on profile_friends u1  
> (cost=11.20..62.95 rows=380 width=4) (actual time=0.232..1.181 rows=424 
> loops=1)
>                                Recheck Cond: (from_profile_id = 5)
>                                ->  Bitmap Index Scan on 
> profile_friends_from_profile_id_key  (cost=0.00..11.10 rows=380 width=0) 
> (actual time=0.210..0.210 rows=424 loops=1)
>                                      Index Cond: (from_profile_id = 5)
>                          ->  Index Scan using profile_pkey on profile u0  
> (cost=0.00..5.61 rows=1 width=4) (actual time=0.013..0.016 rows=1 loops=424)
>                                Index Cond: (u0.user_id = u1.to_profile_id)
>              ->  Index Scan using activity_activityevent_user_id on 
> activity_activityevent  (cost=0.00..18.82 rows=35 width=202) (actual 
> time=0.014..0.130 rows=37 loops=424)
>                    Index Cond: (activity_activityevent.user_id = u0.user_id)
> Total runtime: 165.323 ms
>
>
> Can anyone enlighten me? Should I set random_page_cost to 1.2
> permanently (I feel this is not a really good idea in my case)?

OK, you need to look a little deeper at what's happening here.  The
pgsql query planner looks at a lot of things to decide whether to use a
seq scan or an index.  If you look at your row estimates versus actual
rows returned, you'll see they're off, sometimes by quite a bit,
particularly the ones near the top of your query plans.  There are a
few things you can do to help out here: increase default_statistics_target
and re-analyze, increase effective_cache_size to reflect the actual
size of data being cached by your OS / filesystem / pgsql, and then
lower random_page_cost.  A sketch of those knobs follows.
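
A sketch with illustrative values; the right numbers depend on your RAM and
workload:

    -- session-level experiment; move to postgresql.conf once validated
    SET default_statistics_target = 100;  -- default was 10 through 8.3
    ANALYZE activity_activityevent;       -- re-gather stats at the new target
    SET effective_cache_size = '2GB';     -- roughly OS cache + shared_buffers
    SET random_page_cost = 2;             -- lower when reads are mostly cached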

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] [sfpug] Statistics and PostgreSQL: Streaming Webcast tonight

2009-09-08 Thread David Fetter
On Tue, Sep 08, 2009 at 10:30:21AM -0700, David Fetter wrote:
> Folks,
> 
> For those of you who can't attend in person, we'll be streaming audio
> and video and having a chat for tonight's SFPUG meeting on how the
> planner uses statistics.
> 
> Video:
> 
> http://media.postgresql.org/sfpug/streaming
> 
> Chat:
> 
> irc://irc.freenode.net/sfpug

And the important part is, the meeting starts at 7pm Pacific time.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Statistics and PostgreSQL: Streaming Webcast tonight

2009-09-08 Thread David Fetter
Folks,

For those of you who can't attend in person, we'll be streaming audio
and video and having a chat for tonight's SFPUG meeting on how the
planner uses statistics.

Video:

http://media.postgresql.org/sfpug/streaming

Chat:

irc://irc.freenode.net/sfpug

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Re: [GENERAL] [sfpug] Statistics and PostgreSQL: Streaming Webcast tonight

2009-09-08 Thread David Fetter
On Tue, Sep 08, 2009 at 10:32:53AM -0700, David Fetter wrote:
> On Tue, Sep 08, 2009 at 10:30:21AM -0700, David Fetter wrote:
> > Folks,
> > 
> > For those of you who can't attend in person, we'll be streaming audio
> > and video and having a chat for tonight's SFPUG meeting on how the
> > planner uses statistics.
> > 
> > Video:
> > 
> > http://media.postgresql.org/sfpug/streaming
> > 
> > Chat:
> > 
> > irc://irc.freenode.net/sfpug
> 
> And the important part is, the meeting starts at 7pm Pacific time.

Sorry about the confusion, folks.  It's 7:30pm Pacific time.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] partitioning max() sql not using index

2009-09-08 Thread Kevin Kempter
Hi all, I have a large table (>2 billion rows) that's partitioned by date, based 
on an epoch int value.  We're running a select max(id), where id is the PK.  I 
have a PK index on each of the partitions, and no indexes at all on the base 
table.

If I hit a partition table directly I get an index scan as expected:

explain select max(id) from pwreport.bigtab_2009_09;
                                    QUERY PLAN
 ------------------------------------------------------------------------------
 Result  (cost=0.06..0.07 rows=1 width=0)
   InitPlan
     ->  Limit  (cost=0.00..0.06 rows=1 width=8)
           ->  Index Scan Backward using bigtab_2009_09_pk on bigtab_2009_09  (cost=0.00..12403809.95 rows=205659919 width=8)
                 Filter: (id IS NOT NULL)
(5 rows)


However, if I hit the base table, I get a sequential scan on every partition, as 
opposed to index scans:
explain select max(id) from pwreport.bigtab;

                                    QUERY PLAN
 ------------------------------------------------------------------------------
 Aggregate  (cost=27214318.67..27214318.68 rows=1 width=8)
   ->  Append  (cost=0.00..24477298.53 rows=1094808053 width=8)
         ->  Seq Scan on bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_12 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_11 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_10 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_09 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_08 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_07 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_06 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_05 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_04 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_03 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_02 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_01 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_12 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_11 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_10 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_09 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_08 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_07 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_06 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_05 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_04 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_03 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_02 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_01 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_12 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_11 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_10 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_09 bigtab  (cost=0.00..4599227.19 rows=205659919 width=8)
         ->  Seq Scan on bigtab_2009_07 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_06 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_05 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_04 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_03 bigtab
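
A workaround sketch for the problem described, assuming a planner that (as
the plan above shows) cannot push max() down through the inheritance tree:
aggregate each partition separately so every arm can use its own PK index,
then take the max of the per-partition results. Only two partition names
from the plan are shown; repeat the UNION ALL arm for each remaining
partition.

    SELECT max(id) FROM (
        SELECT max(id) AS id FROM pwreport.bigtab_2009_09
        UNION ALL
        SELECT max(id) FROM pwreport.bigtab_2009_07
        -- ...one UNION ALL arm per remaining partition...
    ) AS per_partition;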

Re: [PERFORM] Forcing postgresql to use an index

2009-09-08 Thread Eugene Morozov
Scott Marlowe  writes:

> On Tue, Sep 8, 2009 at 8:12 AM, Eugene Morozov wrote:
> OK, you need to look a little deeper at what's happening here.  The
> pgsql query planner looks at a lot of things to decide if to use seq
> scan or and index.  If you look at your row estimates versus actual
> rows returned, you'll see they're off, sometimes by quite a bit.
> Particularly the ones near the top of your query plans. There are a
> few things you can do to help out here.  Increase default stats target
> and re-analyse, increase effective_cache_size to reflect the actual
> size of data being cached by your OS / filesystem / pgsql, and then
> lowering random_page_cost.

Thanks to all who answered. Your answers were really helpful; I've
split the query in two (I couldn't make Django use a JOIN here) and was
able to speed it up by a factor of 10!
Eugene
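
What "split the query in two" might look like in SQL, as a hedged sketch:
Eugene's actual queries aren't shown in the thread, so this is reconstructed
from the original query, and the ids in step two are hypothetical.

    -- step 1: fetch the small set of friend ids
    SELECT u0.user_id
    FROM profile u0
    JOIN profile_friends u1 ON u0.user_id = u1.to_profile_id
    WHERE u1.from_profile_id = 5;

    -- step 2: feed those ids back as a literal list, letting the
    -- planner use the index on user_id directly
    SELECT id, user_id, added_on
    FROM activity_activityevent
    WHERE user_id IN (101, 102, 103)  -- hypothetical ids from step 1
    ORDER BY added_on DESC
    LIMIT 10;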


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance