Re: [PERFORM] NOT IN >2hrs vs EXCEPT < 2 sec.

2009-01-29 Thread Kevin Traster
On Wed, Jan 28, 2009 at 11:37 PM, Scott Marlowe wrote:

> On Thu, Jan 29, 2009 at 12:01 AM, Kevin Traster  wrote:
> > 2 questions:
> >
> > 1) Different costs for same actions. Doing an explain on 2 nearly
> identical
> > queries both involving the same Index scan on same table has 2 widely
> > different costs for same Index scan  303375872.86 vs. 12576.70
>
> Pretty sure this is a FAQ by now.
>
> NOT IN and EXCEPT treat NULLs differently.  If your table has nullable
> fields and NULLs would break your query, then NOT IN () is a bad
> choice.  Therefore, effort to optimize has been placed into EXCEPT,
> which is distinctly, semantically different from NOT IN ().
>
> It seems like some shift in the pg community has happened where we're
> suddenly getting a lot of folks who came from a database where not in
> and except are treated the same, even though they most definitely do
> not mean the same thing.
>


Umm... no. The top of the post you quoted concerns the difference between
the query "get ciknum from cik" and the same query with a "where ciknum
not in ..." qualification added. It does not involve the difference
between NOT IN and EXCEPT.

Both queries do an Index Scan using cik_ciknum_idx, and those numbers show
the different costs of doing the same task.

In this case, neither table allows NULLs in the columns, and both tables
have single indexes on the columns used.

Regarding the previous posts about the same PERFORMANCE issue of NOT IN
versus EXCEPT: there has not been any answer to explain it, just talk
about the differences between the two results.

Yes, I can still get the results using EXCEPT, but it would be nice to
know why I can't get NOT IN to complete this simple query.


Re: [PERFORM] NOT IN >2hrs vs EXCEPT < 2 sec.

2009-01-29 Thread Gregory Stark
Kevin Traster  writes:

> Regarding the previous posts about the same PERFORMANCE issue of NOT IN
> versus EXCEPT: there has not been any answer to explain it, just talk
> about the differences between the two results.
>
> Yes, I can still get the results using EXCEPT, but it would be nice to
> know why I can't get NOT IN to complete this simple query.


There are two answers here. One you've already been given: NOT IN has to
handle NULLs specially, and that makes these plans not equivalent. The
NOT IN is decidedly harder to solve.

The other answer is that EXCEPT is a set operation which in Postgres uses
a completely different set of logic. Even if you used NOT EXISTS, which
really is equivalent to EXCEPT, the resulting plans would be different.
Which one would be better would depend on the circumstances. In an ideal
world every equivalent query would generate identical plans. We don't
live in an ideal world, and Postgres isn't perfect.
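
For concreteness, here is a sketch of the three forms using the tables
from this thread (the owner_cik_master column name "cik" is an
assumption; it is never spelled out in the thread):

  -- NOT IN: returns no rows at all if the subquery produces any NULL.
  SELECT ciknum FROM cik
  WHERE ciknum NOT IN (SELECT cik FROM owner_cik_master);

  -- NOT EXISTS: NULL-safe, and typically plans much better.
  SELECT ciknum FROM cik c
  WHERE NOT EXISTS (SELECT 1 FROM owner_cik_master o
                    WHERE o.cik = c.ciknum);

  -- EXCEPT: a set operation; also de-duplicates the ciknum values.
  SELECT ciknum FROM cik
  EXCEPT
  SELECT cik FROM owner_cik_master;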


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] NOT IN >2hrs vs EXCEPT < 2 sec.

2009-01-29 Thread Tom Lane
Kevin Traster  writes:
>  Unique  (cost=3506.21..303375872.86 rows=71946 width=8)
>    ->  Index Scan using cik_ciknum_idx on cik  (cost=3506.21..303375616.75 rows=102444 width=8)
>          Filter: (NOT (subplan))
>          SubPlan
>            ->  Materialize  (cost=3506.21..6002.40 rows=186019 width=4)
>                  ->  Seq Scan on owner_cik_master  (cost=0.00..2684.19 rows=186019 width=4)

It will help some if you raise work_mem enough so you get a "hashed
subplan" there, assuming the NOT IN is on a hashable datatype.

But as was already noted, more work has been put into optimizing
EXCEPT and NOT EXISTS than NOT IN, because the latter is substantially
less useful due to its unintuitive but spec-mandated handling of NULLs.
(And this disparity will be even larger in 8.4.)  We're not going to
apologize for that, and we're not going to regard it as a bug.
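
A minimal illustration of that suggestion (the work_mem value here is
arbitrary; pick one appropriate to your RAM):

  SET work_mem = '32MB';  -- session-local; enough to hash ~186k values
  EXPLAIN SELECT ciknum FROM cik
  WHERE ciknum NOT IN (SELECT cik FROM owner_cik_master);
  -- success looks like "Filter: (NOT (hashed subplan))" in place of the
  -- Materialize node shown above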

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Max on union

2009-01-29 Thread anders.blaagaard
Hi,
 
If I have a view like:
 
create view X as (
select x from A
union all
select x from B)
 
and do
 
select max(x) from X
 
I get a plan like:
 
Aggregate
  Append
Seq Scan on A
Seq Scan on B
 
If A and B are indexed on x, I can get the result much faster as:
 
select max(x) from (
select max(x) from A
union all
select max(x) from B) X
 
with the plan:
 
Aggregate
  Append
Result
  Limit
Index Scan Backward using .. on A
Result
  Limit
Index Scan Backward using .. on B
 
My question is basically: why doesn't the optimizer do this? Is it hard,
or is it just something that hasn't been done yet?
My guess is that the second plan would always be as fast as or faster
than the first one - even if A and B weren't indexed?
 
Anders
 


Re: [PERFORM] LIKE Query performance

2009-01-29 Thread Hari, Balaji
Is there a patch to make Wildspeed work with postgresql version 8.3.1?

P.S. My bad, the version number was incorrect in my previous mail.
-Original Message-
From: Oleg Bartunov [mailto:o...@sai.msu.su] 
Sent: Wednesday, January 28, 2009 1:27 AM
To: Hari, Balaji
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] LIKE Query performance

Only wildspeed (http://www.sai.msu.su/~megera/wiki/wildspeed) has index
support for %text%. But it has limitations.

Oleg
On Tue, 27 Jan 2009, Hari, Balaji wrote:

> Hi,
>
> I am relatively new to PostgreSQL(8.1) and facing the following problem.
>
> We have indexes defined on timestamp and description (create index 
> description_idx on event using btree (description varchar_pattern_ops))
>
> EXPLAIN ANALYZE SELECT event_id, category, current_session_number, 
> description, event_type_id, realm_name, root_session_number, severity, 
> source_name, target_key, target_name, timestamp, jdo_version FROM event WHERE 
> description like '%mismatch%' ORDER BY timestamp desc;
>  QUERY PLAN
> --
> Sort  (cost=36267.09..36272.73 rows=2256 width=314) (actual 
> time=19255.075..20345.774 rows=647537 loops=1)
>   Sort Key: "timestamp"
>   Sort Method:  external merge  Disk: 194080kB
>   ->  Seq Scan on event  (cost=0.00..36141.44 rows=2256 width=314) (actual 
> time=0.080..1475.041 rows=647537 loops=1)
> Filter: ((description)::text ~~ '%mismatch%'::text)
> Total runtime: 22547.292 ms
> (6 rows)
>
> But the startsWith query uses indexes.
>
> EXPLAIN ANALYZE SELECT event_id, category, current_session_number, 
> description, event_type_id, realm_name, root_session_number, severity, 
> source_name, target_key, target_name, timestamp, jdo_version FROM event WHERE 
> description like 'mismatch%' ORDER BY timestamp desc;
>  QUERY PLAN
> ---
> Sort  (cost=9.26..9.27 rows=1 width=314) (actual time=0.766..0.766 rows=0 
> loops=1)
>   Sort Key: "timestamp"
>   Sort Method:  quicksort  Memory: 17kB
>   ->  Index Scan using description_idx on event  (cost=0.00..9.25 rows=1 
> width=314) (actual time=0.741..0.741 rows=0 loops=1)
> Index Cond: (((description)::text ~>=~ 'mismatch'::text) AND 
> ((description)::text ~<~ 'mismatci'::text))
> Filter: ((description)::text ~~ 'mismatch%'::text)
> Total runtime: 0.919 ms
> (7 rows)
>
> Are there any tweaks to force pgsql to use the index on description?
>
> Balaji
>
> P.S. The event table has 700k records.
>

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] LIKE Query performance

2009-01-29 Thread Hari, Balaji

We won't need full-text search capabilities as for documents, since the
data type is varchar.

Wildspeed will fit our needs exactly.

-Original Message-
From: Oleg Bartunov [mailto:o...@sai.msu.su] 
Sent: Wednesday, January 28, 2009 1:27 AM
To: Hari, Balaji
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] LIKE Query performance

Only wildspeed (http://www.sai.msu.su/~megera/wiki/wildspeed) has index
support for %text%. But it has limitations.

Oleg
On Tue, 27 Jan 2009, Hari, Balaji wrote:

> Hi,
>
> I am relatively new to PostgreSQL(8.1) and facing the following problem.
>
> We have indexes defined on timestamp and description (create index 
> description_idx on event using btree (description varchar_pattern_ops))
>
> EXPLAIN ANALYZE SELECT event_id, category, current_session_number, 
> description, event_type_id, realm_name, root_session_number, severity, 
> source_name, target_key, target_name, timestamp, jdo_version FROM event WHERE 
> description like '%mismatch%' ORDER BY timestamp desc;
>  QUERY PLAN
> --
> Sort  (cost=36267.09..36272.73 rows=2256 width=314) (actual 
> time=19255.075..20345.774 rows=647537 loops=1)
>   Sort Key: "timestamp"
>   Sort Method:  external merge  Disk: 194080kB
>   ->  Seq Scan on event  (cost=0.00..36141.44 rows=2256 width=314) (actual 
> time=0.080..1475.041 rows=647537 loops=1)
> Filter: ((description)::text ~~ '%mismatch%'::text)
> Total runtime: 22547.292 ms
> (6 rows)
>
> But the startsWith query uses indexes.
>
> EXPLAIN ANALYZE SELECT event_id, category, current_session_number, 
> description, event_type_id, realm_name, root_session_number, severity, 
> source_name, target_key, target_name, timestamp, jdo_version FROM event WHERE 
> description like 'mismatch%' ORDER BY timestamp desc;
>  QUERY PLAN
> ---
> Sort  (cost=9.26..9.27 rows=1 width=314) (actual time=0.766..0.766 rows=0 
> loops=1)
>   Sort Key: "timestamp"
>   Sort Method:  quicksort  Memory: 17kB
>   ->  Index Scan using description_idx on event  (cost=0.00..9.25 rows=1 
> width=314) (actual time=0.741..0.741 rows=0 loops=1)
> Index Cond: (((description)::text ~>=~ 'mismatch'::text) AND 
> ((description)::text ~<~ 'mismatci'::text))
> Filter: ((description)::text ~~ 'mismatch%'::text)
> Total runtime: 0.919 ms
> (7 rows)
>
> Are there any tweaks to force pgsql to use the index on description?
>
> Balaji
>
> P.S. The event table has 700k records.
>

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] LIKE Query performance

2009-01-29 Thread Oleg Bartunov

On Thu, 29 Jan 2009, Hari, Balaji wrote:


> Is there a patch to make Wildspeed work with postgresql version 8.3.1?

Unfortunately, no.

> P.S. My bad, the version number was incorrect in my previous mail.
-Original Message-
From: Oleg Bartunov [mailto:o...@sai.msu.su]
Sent: Wednesday, January 28, 2009 1:27 AM
To: Hari, Balaji
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] LIKE Query performance

Only wildspeed (http://www.sai.msu.su/~megera/wiki/wildspeed) has index
support for %text%. But it has limitations.

Oleg
On Tue, 27 Jan 2009, Hari, Balaji wrote:


Hi,

I am relatively new to PostgreSQL(8.1) and facing the following problem.

We have indexes defined on timestamp and description (create index 
description_idx on event using btree (description varchar_pattern_ops))

EXPLAIN ANALYZE SELECT event_id, category, current_session_number, description, 
event_type_id, realm_name, root_session_number, severity, source_name, 
target_key, target_name, timestamp, jdo_version FROM event WHERE description 
like '%mismatch%' ORDER BY timestamp desc;
 QUERY PLAN
--
Sort  (cost=36267.09..36272.73 rows=2256 width=314) (actual 
time=19255.075..20345.774 rows=647537 loops=1)
  Sort Key: "timestamp"
  Sort Method:  external merge  Disk: 194080kB
  ->  Seq Scan on event  (cost=0.00..36141.44 rows=2256 width=314) (actual 
time=0.080..1475.041 rows=647537 loops=1)
Filter: ((description)::text ~~ '%mismatch%'::text)
Total runtime: 22547.292 ms
(6 rows)

But the startsWith query uses indexes.

EXPLAIN ANALYZE SELECT event_id, category, current_session_number, description, 
event_type_id, realm_name, root_session_number, severity, source_name, 
target_key, target_name, timestamp, jdo_version FROM event WHERE description 
like 'mismatch%' ORDER BY timestamp desc;
 QUERY PLAN
---
Sort  (cost=9.26..9.27 rows=1 width=314) (actual time=0.766..0.766 rows=0 
loops=1)
  Sort Key: "timestamp"
  Sort Method:  quicksort  Memory: 17kB
  ->  Index Scan using description_idx on event  (cost=0.00..9.25 rows=1 
width=314) (actual time=0.741..0.741 rows=0 loops=1)
Index Cond: (((description)::text ~>=~ 'mismatch'::text) AND 
((description)::text ~<~ 'mismatci'::text))
Filter: ((description)::text ~~ 'mismatch%'::text)
Total runtime: 0.919 ms
(7 rows)

Are there any tweaks to force pgsql to use the index on description?

Balaji

P.S. The event table has 700k records.



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83





Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Max on union

2009-01-29 Thread Robert Haas
On Thu, Jan 29, 2009 at 10:58 AM,   wrote:
> Hi,
>
> If I have a view like:
>
> create view X as (
> select x from A
> union all
> select x from B)
>
> and do
>
> select max(x) from X
>
> I get a plan like:
>
> Aggregate
>   Append
> Seq Scan on A
> Seq Scan on B
>
> If A and B are indexed on x, I can get the result much faster as:
>
> select max(x) from (
> select max(x) from A
> union all
> select max(x) from B) X
>
> with the plan:
>
> Aggregate
>   Append
> Result
>   Limit
> Index Scan Backward using .. on A
> Result
>   Limit
> Index Scan Backward using .. on B
>
> My question is basically: why doesn't the optimizer do this? Is it hard,
> or is it just something that hasn't been done yet?
> My guess is that the second plan would always be as fast as or faster
> than the first one - even if A and B weren't indexed?

Well, it's certainly not going to be faster without the index. You
can't very well do an index scan backward without an index.

As for why it doesn't do that, I don't think a huge amount of effort
has been put into optimizing the handling of appendrels.  Patches are
welcome.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Using multiple cores for index creation?

2009-01-29 Thread henk de wit

Hi,

When I try to restore a database dump on PostgreSQL 8.3 that's
approximately 130GB in size and takes about 1 hour, I noticed that index
creation makes up the bulk of that time. I'm using a very fast I/O
subsystem (16 Mtron Pro 7535 SSDs on a dual 1.2GHz IOP/4GB cache RAID
controller), fast CPUs (2 quad-core C2Qs at 2.6GHz) and 32GB RAM. From
monitoring the restore process, I learned that only 10 minutes is spent
doing I/O, while the rest of the time is spent creating the indexes.
Index creation seems to be completely CPU bound.

The problem is that only 1 CPU core is used. My other 7 cores are just
sitting there doing nothing. It seems to me that creating each index,
especially for different tables, is something that can be done
independently.

Is there some way I can let PostgreSQL use multiple cores for creating
the indexes?

Thanks in advance

[PERFORM] Sort performance

2009-01-29 Thread Subbiah Stalin-XCGF84
Hi All,
 
I'm in the process of tuning a query that does a sort on a huge dataset.
With work_mem set to 2MB, I see the sort operation spilling to disk,
writing up to 430MB, and then returning the first 500 rows. Our query is
of the sort

select col1, col2... from table where col1 like 'aa%' order by col1 limit
500; It took 561 secs to complete. Looking at the execution plan, 95% of
the time is spent on the sort vs. the seq scan on the table.

Now if I set work_mem to 500MB (I did this in a psql session without
making it global) and run the same query, one would think the sort
operations would happen in memory and not spill to disk, but I still see
430MB written to disk; however, the query completion time dropped to
351 secs. So work_mem did have an impact, but I'm wondering why it's
still writing to disk when it can do it all in memory.

I'd appreciate it if anyone can shed some light on this.
 
Thanks,
Stalin
 
Env: Sol 10, Pg 827 64bit.
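
For reference, a sketch of the session-local override described above (it
affects only the current connection, not the server default):

  SET work_mem = '500MB';
  -- run the sorted query here...
  RESET work_mem;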


Re: [PERFORM] PG performance in high volume environment (many INSERTs and lots of aggregation reporting)

2009-01-29 Thread Chris Browne
phoenix.ki...@gmail.com (Phoenix Kiula) writes:
> [Posted a similar note to PG General but I suppose it's more appropriate
> in this list. Apologies for cross-posting.]
>
> Hi. Further to my bafflement with the "count(*)" queries as described
> in this thread:
>
> http://archives.postgresql.org/pgsql-general/2009-01/msg00804.php
>
> It seems that whenever this question has come up, Postgresql comes up
> very short in terms of "count(*)" functions.
>
> The performance is always slow, because of the planner's need to guess
> and such. I don't fully understand how the statistics work (and the
> explanation on the PG website is way too geeky) but the columns I work
> with already have a stat level of 100. Not helping at all.

That's definitely *NOT* due to "planner's need to guess"; it's due to
there being some *specific* work that PostgreSQL needs to do that some
other databases can avoid due to different storage strategies.

The matter is quite succinctly described here:
http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL:_Comparing_Reliability_and_Speed_in_2007#Counting_rows_in_a_table

I'll just take one excerpt:
---
It is worth observing that it is only this precise form of aggregate
that must be so pessimistic; if augmented with a "WHERE" clause like

SELECT COUNT(*) FROM table WHERE status = 'something'

PostgreSQL, MySQL, and most other database implementations will take
advantage of available indexes against the restricted field(s) to
limit how many records must be counted, which can greatly accelerate
such queries.
---

It is common for systems where it is necessary for aggregation
reporting to be fast to do pre-computation of the aggregates, and that
is in no way specific to PostgreSQL.

If you need *really* fast aggregates, then it will be worthwhile to
put together triggers or procedures or something of the sort to help
pre-compute the aggregates.
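
A minimal sketch of that trigger approach (all object names here are
illustrative, not from the thread):

  CREATE TABLE row_counts (table_name text PRIMARY KEY, n bigint NOT NULL);
  INSERT INTO row_counts SELECT 'mytable', count(*) FROM mytable;

  CREATE FUNCTION track_count() RETURNS trigger AS $$
  BEGIN
    IF TG_OP = 'INSERT' THEN
      UPDATE row_counts SET n = n + 1 WHERE table_name = TG_TABLE_NAME;
    ELSE  -- DELETE
      UPDATE row_counts SET n = n - 1 WHERE table_name = TG_TABLE_NAME;
    END IF;
    RETURN NULL;  -- AFTER trigger, so the return value is ignored
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER mytable_count AFTER INSERT OR DELETE ON mytable
  FOR EACH ROW EXECUTE PROCEDURE track_count();

Reading the count back is then a single-row lookup. Note that one counter
row serializes concurrent writers; busy tables usually insert +1/-1 delta
rows instead and sum them up periodically.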
-- 
(reverse (concatenate 'string "ofni.sesabatadxunil" "@" "enworbbc"))
http://linuxfinances.info/info/wp.html
"When you have eliminated the impossible, whatever remains, however
improbable, must be the truth." -- Sir Arthur Conan Doyle (1859-1930),
English author. Sherlock Holmes, in The Sign of Four, ch. 6 (1889).
[...but see the Holmesian Fallacy, due to Bob Frankston...
]

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Using multiple cores for index creation?

2009-01-29 Thread Robert Haas
On Thu, Jan 29, 2009 at 3:21 PM, henk de wit  wrote:
> Hi,
> When I try to restore a database dump on PostgreSQL 8.3
> that's approximately 130GB in size and takes about 1 hour, I noticed index
> creation makes up the bulk of that time. I'm using a very fast I/O subsystem
> (16 Mtron Pro 7535 SSDs using a dual 1.2Ghz IOP/4GB cache RAID controller),
> fast CPUs (2 quad core C2Q's at 2.6Ghz) and 32GB RAM. From monitoring the
> restore process, I learned that only 10 minutes is spend doing IO, while the
> rest of the time is spend on creating the indexes. Index creation seems to
> be completely CPU bound.
> The problem is that only 1 CPU core is used. My other 7 cores are just
> sitting there doing nothing. It seems to me that creating each index,
> especially for different tables, is something that can be done
> independently.
> Is there some way I can let PostgreSQL use multiple cores for creating the
> indexes?

Andrew Dunstan has been working on this problem.  His latest parallel
restore patch can be found here:

http://archives.postgresql.org/message-id/4977e070.6070...@dunslane.net

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Using multiple cores for index creation?

2009-01-29 Thread Joshua D. Drake
On Thu, 2009-01-29 at 18:09 -0500, Robert Haas wrote:
> On Thu, Jan 29, 2009 at 3:21 PM, henk de wit  wrote:
> > Hi,
> > When I try to restore a database dump on PostgreSQL 8.3
> > that's approximately 130GB in size and takes about 1 hour, I noticed index
> > creation makes up the bulk of that time. I'm using a very fast I/O subsystem
> > (16 Mtron Pro 7535 SSDs using a dual 1.2Ghz IOP/4GB cache RAID controller),
> > fast CPUs (2 quad core C2Q's at 2.6Ghz) and 32GB RAM. From monitoring the
> > restore process, I learned that only 10 minutes is spend doing IO, while the
> > rest of the time is spend on creating the indexes. Index creation seems to
> > be completely CPU bound.
> > The problem is that only 1 CPU core is used. My other 7 cores are just
> > sitting there doing nothing. It seems to me that creating each index,
> > especially for different tables, is something that can be done
> > independently.
> > Is there some way I can let PostgreSQL use multiple cores for creating the
> > indexes?
> 
> Andrew Dunstan has been working on this problem.  His latest parallel
> restore patch can be found here:
> 
> http://archives.postgresql.org/message-id/4977e070.6070...@dunslane.net


Yeah, but that isn't useful for 8.3. What can be done in this specific
situation is to make sure you dump with the -Fc option. You can then
pull a TOC out with pg_restore and break it apart. Reading the TOC is
pretty self-evident. Once you get down to index creation, you can create
multiple files, each with a group of indexes to create. Then call
pg_restore multiple times in a script against the individual TOCs and
you will use all cores.
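
A sketch of that procedure (file and database names are illustrative):

  pg_dump -Fc mydb > mydb.dump          # custom-format dump
  pg_restore -l mydb.dump > full.toc    # extract the table of contents

  # Copy full.toc into e.g. indexes1.toc and indexes2.toc, each keeping a
  # disjoint subset of the INDEX entries, then run them concurrently:
  pg_restore -d mydb -L indexes1.toc mydb.dump &
  pg_restore -d mydb -L indexes2.toc mydb.dump &
  wait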

Joshua D. Drake

P.S. Increasing maintenance_work_mem can help too.


> 
> ...Robert
> 
-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sort performance

2009-01-29 Thread Robert Haas
On Thu, Jan 29, 2009 at 3:15 PM, Subbiah Stalin-XCGF84
 wrote:
> I'm in the process of tuning a query that does a sort on a huge dataset.
> With work_mem set to 2MB, I see the sort operation spilling to disk,
> writing up to 430MB, and then returning the first 500 rows. Our query is
> of the sort
>
> select col1, col2... from table where col1 like 'aa%' order by col1 limit
> 500; It took 561 secs to complete. Looking at the execution plan, 95% of
> the time is spent on the sort vs. the seq scan on the table.
>
> Now if I set work_mem to 500MB (I did this in a psql session without
> making it global) and run the same query, one would think the sort
> operations would happen in memory and not spill to disk, but I still see
> 430MB written to disk; however, the query completion time dropped to
> 351 secs. So work_mem did have an impact, but I'm wondering why it's
> still writing to disk when it can do it all in memory.
>
> I'd appreciate it if anyone can shed some light on this.

Can you send the EXPLAIN ANALYZE output?

What happens if you set work_mem to something REALLY big, like 5GB?

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sort performance

2009-01-29 Thread Gregory Stark
Robert Haas  writes:

> On Thu, Jan 29, 2009 at 3:15 PM, Subbiah Stalin-XCGF84
>  wrote:
>>
>> I see the sort operation spilling to disk, writing up to 430MB, and then
>> returning the first 500 rows.
>>
>> Now if I set work_mem to 500MB (I did this in a psql session without
>> making it global) and run the same query, one would think the sort
>> operations would happen in memory and not spill to disk, but I still see
>> 430MB written to disk; however, the query completion time dropped to
>> 351 secs. So work_mem did have an impact, but I'm wondering why it's
>> still writing to disk when it can do it all in memory.

The on-disk storage is more compact than the in-memory storage, so you
actually need a larger value than the space reported for on-disk storage
to avoid the disk sort entirely. The accounting also isn't perfect; the
on-disk sort still uses some RAM, for example.

> What happens if you set work_mem to something REALLY big, like 5GB?

Don't set it larger than the available RAM, though -- or you'll quite
possibly get an out-of-memory error.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sort performance

2009-01-29 Thread Subbiah Stalin-XCGF84
Here you go.

 Limit  (cost=502843.44..502844.69 rows=501 width=618) (actual time=561397.940..561429.242 rows=501 loops=1)
   ->  Sort  (cost=502843.44..503923.48 rows=432014 width=618) (actual time=561397.934..561429.062 rows=501 loops=1)
         Sort Key: name
         ->  Seq Scan on objects  (cost=0.00..99157.88 rows=432014 width=618) (actual time=0.172..22267.727 rows=64 loops=1)
               Filter: (((domainid)::text = ANY (('{111,SmWCGiRp}'::character varying[])::text[])) AND (("type")::text = 'cpe'::text) AND (upper((name)::text) ~~ 'CPE1%'::text) AND (upper((name)::text) >= 'CPE1'::text) AND (upper((name)::text) < 'CPE2'::text))
 Total runtime: 561429.915 ms
(6 rows)

I haven't tried setting that high a number. I came up with 500MB by
monitoring pgsql_tmp while sort operations were performed. It never went
beyond 450MB. Once it reaches 450MB, it spends some cycles before I see
the output. I guess some sort of merge operation happens to get the first
500 records out.
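
For anyone repeating this measurement, a sketch of one way to watch the
spill as it happens (the temp-file location can vary by version and
tablespace setup; this assumes the default layout):

  watch -n 1 'du -sh $PGDATA/base/*/pgsql_tmp'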

Thanks,
Stalin 

-Original Message-
From: Robert Haas [mailto:robertmh...@gmail.com] 
Sent: Thursday, January 29, 2009 3:21 PM
To: Subbiah Stalin-XCGF84
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Sort performance

On Thu, Jan 29, 2009 at 3:15 PM, Subbiah Stalin-XCGF84
 wrote:
> I'm in the process of tuning a query that does a sort on a huge dataset.
> With work_mem set to 2MB, I see the sort operation spilling to disk,
> writing up to 430MB, and then returning the first 500 rows. Our query is
> of the sort
>
> select col1, col2... from table where col1 like 'aa%' order by col1 limit
> 500; It took 561 secs to complete. Looking at the execution plan, 95% of
> the time is spent on the sort vs. the seq scan on the table.
>
> Now if I set work_mem to 500MB (I did this in a psql session without
> making it global) and run the same query, one would think the sort
> operations would happen in memory and not spill to disk, but I still see
> 430MB written to disk; however, the query completion time dropped to
> 351 secs. So work_mem did have an impact, but I'm wondering why it's
> still writing to disk when it can do it all in memory.
>
> I'd appreciate it if anyone can shed some light on this.

Can you send the EXPLAIN ANALYZE output?

What happens if you set work_mem to something REALLY big, like 5GB?

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sort performance

2009-01-29 Thread Subbiah Stalin-XCGF84
Thanks Greg. You were right. If I set my sort_mem to 1GB (yes, I have
loads of memory, but only for testing purposes), then I don't see
anything written to disk. So the in-memory sort requires more memory than
the reported on-disk storage.

Stalin  

-Original Message-
From: Greg Stark [mailto:greg.st...@enterprisedb.com] On Behalf Of
Gregory Stark
Sent: Thursday, January 29, 2009 3:36 PM
To: Robert Haas
Cc: Subbiah Stalin-XCGF84; pgsql-performance@postgresql.org
Subject: Re: Sort performance

Robert Haas  writes:

> On Thu, Jan 29, 2009 at 3:15 PM, Subbiah Stalin-XCGF84 
>  wrote:
>>
>> I see the sort operation spilling to disk, writing up to 430MB, and then
>> returning the first 500 rows.
>>
>> Now if I set work_mem to 500MB (I did this in a psql session without
>> making it global) and run the same query, one would think the sort
>> operations would happen in memory and not spill to disk, but I still see
>> 430MB written to disk; however, the query completion time dropped to
>> 351 secs. So work_mem did have an impact, but I'm wondering why it's
>> still writing to disk when it can do it all in memory.

The on-disk storage is more compact than the in-memory storage, so you
actually need a larger value than the space reported for on-disk storage
to avoid the disk sort entirely. The accounting also isn't perfect; the
on-disk sort still uses some RAM, for example.

> What happens if you set work_mem to something REALLY big, like 5GB?

Don't set it larger than the available RAM, though -- or you'll quite
possibly get an out-of-memory error.


--
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Using multiple cores for index creation?

2009-01-29 Thread henk de wit

Hi,

> You can then pull a TOC out with pg_restore and break it apart. Reading
> the TOC is pretty self-evident. Once you get down to index creation, you
> can create multiple files, each with a group of indexes to create. Then
> call pg_restore multiple times in a script against the individual TOCs
> and you will use all cores.

I figured something like this would be possible. Thanks for the
explanation. Ultimately I'm looking for something more automatic, though.
Not that I personally mind doing the above, but when an emergency restore
is needed for some production server, manually editing a dump is probably
not the safest course of action ;) It sounds like something a tool could
do automatically.

The mentioned patch sounds interesting too; is there anything known about
whether it will make it into the mainstream Postgres source? I guess it's
too late for inclusion in PostgreSQL 8.4, but 8.5 perhaps?

Kind regards

Re: [PERFORM] Using multiple cores for index creation?

2009-01-29 Thread Tom Lane
"Joshua D. Drake"  writes:
> On Thu, 2009-01-29 at 18:09 -0500, Robert Haas wrote:
>> Andrew Dunstan has been working on this problem.  His latest parallel
>> restore patch can be found here:
>> 
>> http://archives.postgresql.org/message-id/4977e070.6070...@dunslane.net

> Yeah but that isn't useful for 8.3.

Sure it is.  Andrew has made a point of making sure that the improved
version of pg_restore can work against older servers (not sure how far
back, but it's definitely supposed to work with 8.3).

> What can be done in this specific
> situation is to make sure you dump with the -Fc option...

You're essentially proposing a manual reimplementation of Andrew's
patch ...

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PG performance in high volume environment (many INSERTs and lots of aggregation reporting)

2009-01-29 Thread Scott Marlowe
On Thu, Jan 29, 2009 at 1:56 PM, Chris Browne  wrote:
>
> It is common for systems where it is necessary for aggregation
> reporting to be fast to do pre-computation of the aggregates, and that
> is in no way specific to PostgreSQL.
>
> If you need *really* fast aggregates, then it will be worthwhile to
> put together triggers or procedures or something of the sort to help
> pre-compute the aggregates.

Just to add to this: at my last employer in Chicago, we had a database
from a very large database company (one whose CEO makes more than all the
people on this mailing list combined) that shall not be named, because I
don't want to be sued.  This database had a large statistical dataset we
replicated over to pgsql on a by-the-minute basis so we could run big
ugly queries anytime we felt like it without blowing out the production
database.

At night, or by hand, I would run queries such as select count(*) from
reallyreallyreallybigstatstable on it and compare the timing to
postgresql.  PostgreSQL would take about 4 or 5 minutes to run this on a
local server (a single-core P4 Dell workstation with a 4-disc software
RAID-10), while the really really big server in production took about
15 to 20 seconds.

Our local test server, which ran the same really big database that cannot
be named and had a 16-disk RAID-6 array, gigs of memory and 4 CPU cores,
took about 45 seconds to a minute to run the same select count(*) query.

All of the machines showed high CPU and moderate I/O usage while
running said query.

So there's probably some room for improvement in pgsql's way of doing
things, but it's not like the other database software was providing
instantaneous answers.  Basically, the second a database server becomes
fast at running lots of update / select queries in a mixed environment,
things like fast select count(*) get slower.

To the OP: try running 100 transactional clients against mysql
(updates/inserts/deletes/selects) while running a select count(*) and
see how it behaves.  Single-threaded use cases are kind of uninteresting
compared to lots of users.  But if single-threaded use cases are your
bread and butter, then pgsql is possibly a poor choice of db.
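
A sketch of that kind of mixed-load test on the Postgres side, using
pgbench (the scale and counts are arbitrary; "accounts" is the table
pgbench of this era creates):

  pgbench -i -s 10 testdb         # create and populate the test tables
  pgbench -c 100 -t 1000 testdb   # 100 concurrent transactional clients
  # ...meanwhile, in another session:
  # psql testdb -c "SELECT count(*) FROM accounts;"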

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance