Re: [PERFORM] Partitioned Tables and ORDER BY

2009-10-19 Thread Grzegorz Jaśkiewicz
On Sun, Oct 11, 2009 at 3:30 PM, Michal Szymanski wrote:

> We have a similar problem and are now trying to find a solution. When you
> execute a query on a partition there is no sorting - the DB uses an index to
> retrieve the data, and if you need, let's say, 50 rows it reads 50 rows using
> the index. But when you execute on the parent table the query optimizer does this:
>
>  ->  Sort  (cost=726844.88..748207.02 rows=8544855 width=37739)
> (actual time=149864.868..149864.876 rows=50 loops=1)
>
> which means 8544855 rows have to be sorted and it takes long minutes.

The figures in the first parentheses are estimates, not the actual row count.
If you think the estimate is too low, increase the statistics target for that column.
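On Michal's schema that would be something along these lines (which column,
and what target value, is of course just an illustration):

ALTER TABLE accounting.cdr_full_partitioned_y2009_m09
  ALTER COLUMN start_time_invite SET STATISTICS 1000;  -- default is 10 in 8.3, 100 in 8.4
ANALYZE accounting.cdr_full_partitioned_y2009_m09;     -- rebuild the stats with the new target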

> We have a simpler situation than you and I will try to find a solution
> tomorrow :)
>
> Michal Szymanski
> http://blog.szymanskich.net
> http://techblog.freeconet.pl/
>



-- 
GJ


Re: [PERFORM] Issues with \copy from file

2009-10-19 Thread Matthew Wakeling

On Sun, 18 Oct 2009, Scott Marlowe wrote:

You can only write data then commit it so fast to one drive, and that
speed is usually somewhere in the megabyte per second range.  450+150
in 5 minutes is 120 Megs per second, that's pretty fast, but is likely
the max speed of a modern super fast 15k rpm drive.  If it's taking 20
minutes then it's 30 Megs per second which is still really good if
you're in the middle of a busy afternoon and the db has other things
to do.


You're out by a factor of 60. That's minutes, not seconds.

More relevant is the fact that Postgres will normally log changes in the 
WAL, effectively writing the data twice. As Euler said, the trick is to 
tell Postgres that no one else will need to see the data, so it can skip 
the WAL step:



BEGIN;
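-- COPY into a table truncated in the same transaction can skip WAL
-- (only while WAL archiving is off); if the load fails, the TRUNCATE
-- rolls back too, so nobody ever sees a half-loaded table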
TRUNCATE TABLE foo;
COPY foo FROM ...;
COMMIT;


I see upward of 100MB/s over here when I do this.

Matthew

--
Patron: "I am looking for a globe of the earth."
Librarian: "We have a table-top model over here."
Patron: "No, that's not good enough. Don't you have a life-size?"
Librarian: (pause) "Yes, but it's in use right now."



Re: [PERFORM] Performance with sorting and LIMIT on partitioned table

2009-10-19 Thread Joe Uhl



On Mon, Oct 12, 2009 at 10:14 AM, Michal Szymanski  wrote:
   

We have a performance problem with a query on a partitioned table when the
query uses ORDER BY and we want only the first/last rows from the result set.
A more detailed description:
We have a big table where each row is one telephone call (CDR).
The definition of this table looks like this:
CREATE TABLE accounting.cdr_full_partitioned   -- this is the parent table
(
  cdr_id bigint NOT NULL,
  id_crx_group_from bigint,                    -- identifier of the user
  start_time_invite timestamp with time zone,  -- call start time
  call_status VARCHAR                          -- FINF - call finished, FINC - call unfinished
  ..some extra data..
)

We create 12 partitions on the 'start_time_invite' column, simply one
partition for each month, with constraints like this:

ALTER TABLE accounting.cdr_y2009_m09
  ADD CONSTRAINT y2009m09 CHECK (
    start_time_invite >= '2009-09-01 00:00:00+02'::timestamp with time zone AND
    start_time_invite <  '2009-10-01 00:00:00+02'::timestamp with time zone);

and of course we define the necessary indexes:

CREATE INDEX cdr_full_partitioned_y2009_m09_id_crx_group_to_key1
  ON accounting.cdr_full_partitioned_y2009_m09
  USING btree
  (id_crx_group_from, start_time_invite, call_status);


The problem appears when we want to select calls for a specified user
with a specified call_status, e.g.:

  SELECT * FROM accounting.cdr_full_partitioned
   WHERE
   id_crx_group_from='522921' AND
   call_status='FINS' AND
   start_time_invite>='2009-09-28 00:00:00+02' AND
   start_time_invite<'2009-10-12 23:59:59+02'
  ORDER BY start_time_invite  LIMIT '100' OFFSET 0

You can see the execution plan at http://szymanskich.net/pub/postgres/full.jpg
As you can see, 2 rows were selected and then sorted, which takes very long
(about 30-40s), and only after sorting is the result limited to 100 rows.

Using the table without partitioning:

  SELECT * FROM accounting.cdr_full
   WHERE (id_crx_group_from='522921') AND
   (call_status='FINS') AND
   (start_time_invite>='2009-01-28 00:00:00+02') AND
   (start_time_invite<'2009-10-12 23:59:59+02')
  ORDER BY start_time_invite  LIMIT '100' OFFSET 0

execution plan is very simple
"Limit  (cost=0.00..406.40 rows=100 width=456)"
"  ->Index Scan using
cdr_full_crx_group_from_start_time_invite_status_ind on cdr_full
(cost=0.00..18275.76 rows=4497 width=456)"
"Index Cond: ((id_crx_group_from = 522921::bigint) AND
(start_time_invite>= '2009-01-27 23:00:00+01'::timestamp with time
zone) AND (start_time_invite<  '2009-10-12 23:59:59+02'::timestamp
with time zone) AND ((call_status)::text = 'FINS'::text))"

It uses the index to fetch the first 100 rows, is super fast, and takes
less than 0.5s. There is no row sorting!
I've tried to execute the same query on one partition:
  SELECT * FROM accounting.cdr_full_partitioned_y2009_m09
   WHERE (id_crx_group_from='509498') AND
   (call_status='FINS') AND
   (start_time_invite>='2009-09-01 00:00:00+02') AND
   (start_time_invite<'2009-10-12 23:59:59+02')

You can see the execution plan at http://szymanskich.net/pub/postgres/ononeprtition.jpg
and the query is super fast because there is no sorting. The question is
how to speed up the query when we use partitioning? So far I have not
found a solution. I wonder how you solve the problem when the result from
a partition must be sorted and we want to display only the first/last 100
rows? We could use our own partitioning mechanism, partition the data by
id_crx_group_from and create the query dynamically (depending on
id_crx_group_from we could execute the query against one partition), but
that is not the most beautiful solution.
 


Yeah - unfortunately the query planner is not real smart about
partitioned tables yet.  I can't make anything of the JPG link you
posted.  Can you post the EXPLAIN ANALYZE output for the case that is
slow?  What PG version is this?

...Robert

   
I have a similar, recent thread titled Partitioned Tables and ORDER BY 
with a decent breakdown.  I think I am hitting the same issue Michal is.


Essentially doing a SELECT against the parent with appropriate 
constraint columns in the WHERE clause is very fast (uses index scans 
against correct child table only) but the moment you add an ORDER BY it 
seems to be merging the parent (an empty table) and the child, sorting 
the results, and sequential scanning.  So it does still scan only the 
appropriate child table in the end but indexes are useless.


Unfortunately the only workaround I can come up with is to query the 
partitioned child tables directly.  In my case the partitions are rather 
large so the timing difference is 522ms versus 149865ms.
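Spelled out, that workaround is just the parent-table query aimed at one
child (names follow Michal's naming scheme, the values are illustrative):

SELECT * FROM accounting.cdr_full_partitioned_y2009_m09
 WHERE id_crx_group_from = '522921'
   AND call_status = 'FINS'
   AND start_time_invite >= '2009-09-28 00:00:00+02'
   AND start_time_invite <  '2009-10-01 00:00:00+02'
 ORDER BY start_time_invite
 LIMIT 100;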




[PERFORM] Known Bottlenecks

2009-10-19 Thread Vikul Khosla
Jeff, Robert, I am still working on the "low cardinality" info you requested. 
Please bear with me.

In the meantime, I have the following question:

Are there known "scenarios" where certain types of SQL queries perform worse
in PG than they do in Oracle?

For example, I have observed some discussion where MAX (in Oracle) was
replaced with ORDER BY ... DESC ... LIMIT in PG.
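In case it helps, the rewrite I mean looks roughly like this (table and
column names borrowed from the partitioning thread purely for illustration;
since 8.1 the planner can make plain MIN/MAX use an index by itself):

-- aggregate form
SELECT MAX(start_time_invite) FROM accounting.cdr_full;

-- rewritten to walk the index from the top
SELECT start_time_invite FROM accounting.cdr_full
 WHERE start_time_invite IS NOT NULL          -- keep NULLs from sorting first
 ORDER BY start_time_invite DESC LIMIT 1;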

I realize this is a loaded question, but it would be great if any of you would 
share some observed
generalities in this context. 

Thanks
VK


Re: [PERFORM] Known Bottlenecks

2009-10-19 Thread Grzegorz Jaśkiewicz
On Mon, Oct 19, 2009 at 2:43 PM, Vikul Khosla  wrote:

> Jeff, Robert, I am still working on the "low cardinality" info you
> requested. Please bear with me.
>
> In the meantime, I have the following question:
>
> Are there known "scenarios" where certain types of SQL queries perform
> worse in PG than they do in Oracle?
>
> For example, I have observed some discussion where MAX (in Oracle) was
> replaced with ORDER BY ... DESC ... LIMIT in PG.
>
> I realize this is a loaded question, but it would be great if any of you
> would share some observed
> generalities in this context.
>

Another one would be SELECT .. WHERE foo IN (SELECT ...); use a join instead,
and in the case of NOT IN, use a left join.
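Roughly like this (the orders/customers tables here are made up just to show
the shape of the rewrite):

-- IN form the planner sometimes handles poorly
SELECT * FROM orders
 WHERE customer_id IN (SELECT id FROM customers WHERE region = 'EU');

-- join form (assuming customers.id is unique, otherwise use EXISTS)
SELECT o.*
  FROM orders o
  JOIN customers c ON c.id = o.customer_id
 WHERE c.region = 'EU';

-- NOT IN rewritten as an anti-join via LEFT JOIN / IS NULL
-- (this also sidesteps NOT IN's surprising behaviour with NULLs)
SELECT o.*
  FROM orders o
  LEFT JOIN customers c ON c.id = o.customer_id AND c.region = 'EU'
 WHERE c.id IS NULL;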



-- 
GJ


Re: [PERFORM] Indexes on low cardinality columns

2009-10-19 Thread Ron Mayer
If the table can be clustered on that column, I suspect
it'd be a nice case for the grouped index tuples patch
http://community.enterprisedb.com/git/

Actually, simply clustering on that column might give
major speedups anyway.
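Something like the following, with placeholder names (note that CLUSTER
takes an exclusive lock and is a one-time reordering, so it has to be
repeated as the table churns):

CLUSTER big_table USING big_table_low_card_idx;
ANALYZE big_table;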

Vikul Khosla wrote:
> Folks,
> 
> We have just migrated from Oracle to PG.
> 
> We have a database that has approx 3 mil rows and one of the columns has
> a cardinality
> of only 0.1% (3000 unique values).
> 
> We have to issue several queries that use this low cardinality column in
> a WHERE clause
> as well as see this column participating in JOINS (ouch!).
> 
> A regular B-Tree index has been created on these columns.
> 
> In Oracle, we replaced the B-Tree Indexes with Bitmap indexes and saw
> performance go
> through the roof. I know Postgres does not have Bitmap indexes,
> but is there a reasonable alternative to boost performance in situations
> where low cardinality
> columns are involved ?
> 
> I don't have the option of changing schemas - so please don't go there :)
> 
> TIA,
> VK




Re: [PERFORM] Calculation of unused columns

2009-10-19 Thread Tom Lane
I wrote:
> Just for fun, I hacked together a first cut at this.

Oh, just for the archives: I forgot about not suppressing volatile
expressions --- checking that would increase the cost of this
significantly, though code-wise it's only another line or two.

regards, tom lane



Re: [PERFORM] Partitioned Tables and ORDER BY

2009-10-19 Thread Robert Haas
2009/10/19 Grzegorz Jaśkiewicz :
>
>
> On Sun, Oct 11, 2009 at 3:30 PM, Michal Szymanski 
> wrote:
>>
>> We have a similar problem and are now trying to find a solution. When you
>> execute a query on a partition there is no sorting - the DB uses an index to
>> retrieve the data, and if you need, let's say, 50 rows it reads 50 rows using
>> the index. But when you execute on the parent table the query optimizer does this:
>>
>>  ->  Sort  (cost=726844.88..748207.02 rows=8544855 width=37739)
>> (actual time=149864.868..149864.876 rows=50 loops=1)
>>
>> which means 8544855 rows have to be sorted and it takes long minutes.
>
> The figures in the first parentheses are estimates, not the actual row count.
> If you think the estimate is too low, increase the statistics target for that column.

It's true that the figures in parentheses are estimates, but it's usually
bad when the estimated and actual row counts are different by 5 orders
of magnitude, and that large of a difference is not usually fixed by
increasing the statistics target.

...Robert



Re: [PERFORM] Partitioned Tables and ORDER BY

2009-10-19 Thread Grzegorz Jaśkiewicz
2009/10/19 Robert Haas 

> 2009/10/19 Grzegorz Jaśkiewicz :
> >
> >
> > On Sun, Oct 11, 2009 at 3:30 PM, Michal Szymanski 
> > wrote:
> >>
> >> We have a similar problem and are now trying to find a solution. When you
> >> execute a query on a partition there is no sorting - the DB uses an index to
> >> retrieve the data, and if you need, let's say, 50 rows it reads 50 rows using
> >> the index. But when you execute on the parent table the query optimizer does this:
> >>
> >>  ->  Sort  (cost=726844.88..748207.02 rows=8544855 width=37739)
> >> (actual time=149864.868..149864.876 rows=50 loops=1)
> >>
> >> which means 8544855 rows have to be sorted and it takes long minutes.
> >
> > The figures in the first parentheses are estimates, not the actual row count.
> > If you think the estimate is too low, increase the statistics target for that column.
>
> It's true that the figures in parentheses are estimates, but it's usually
> bad when the estimated and actual row counts are different by 5 orders
> of magnitude, and that large of a difference is not usually fixed by
> increasing the statistics target.
>
I thought that this means that either analyze was run quite a long time
ago, or that the value didn't make it into the histogram. In the latter case,
that's mostly the case when your statistics target is low, or the value is
really 'rare'.



-- 
GJ


Re: [PERFORM] Calculation of unused columns

2009-10-19 Thread Simon Riggs
On Sat, 2009-10-17 at 21:41 -0400, Tom Lane wrote:

> one thing we'd have to consider
> is whether it is okay to suppress calculation of columns containing
> volatile functions.

I think we should have a 4th class of functions,
volatile-without-side-effects (better name needed, obviously).

That would allow us to optimize such calls away, if appropriate.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [PERFORM] Calculation of unused columns

2009-10-19 Thread Tom Lane
Simon Riggs  writes:
> On Sat, 2009-10-17 at 21:41 -0400, Tom Lane wrote:
>> one thing we'd have to consider
>> is whether it is okay to suppress calculation of columns containing
>> volatile functions.

> I think we should have a 4th class of functions,
> volatile-without-side-effects (better name needed, obviously).

What for?  There wouldn't be that many, I think.  random() and
clock_timestamp(), yeah, but most volatile user-defined functions
are either volatile-with-side-effects or misdeclared.

regards, tom lane



Re: [PERFORM] Partitioned Tables and ORDER BY

2009-10-19 Thread Craig James

Joe Uhl wrote:
This seems like a pretty major weakness in PostgreSQL partitioning.  I 
have essentially settled on not being able to do queries against the 
parent table when I want to order the results.  Going to have to use a 
Hibernate interceptor or something similar to rewrite the statements so 
they hit specific partitions, will be working on this in the coming week.


This weakness is a bummer though as it makes partitions a lot less 
useful.  Having to hit specific child tables by name isn't much 
different than just creating separate tables and not using partitions at 
all.


I wonder if the "offset 0" trick would work here?  I was told (for a different 
question) that the planner can't merge levels if there's an offset or limit on a 
subquery.  So you might be able to do something like this:

 select ... from (select ...  offset 0) as foo order by ...

In other words, put your primary query as a sub-select without the sort criterion, with 
the "offset 0" as a sort of roadblock that the planner can't get past.  Then 
the outer select does the sorting, without affecting the plan for the inner select.
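Against the earlier partitioned-table query that would look something like
this -- untested, so whether the fence really helps the partition case is an
open question:

SELECT *
  FROM (SELECT *
          FROM accounting.cdr_full_partitioned
         WHERE id_crx_group_from = '522921'
           AND call_status = 'FINS'
           AND start_time_invite >= '2009-09-28 00:00:00+02'
           AND start_time_invite <  '2009-10-12 23:59:59+02'
        OFFSET 0) AS foo     -- the "fence": stops the subquery being flattened
 ORDER BY start_time_invite
 LIMIT 100;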

Craig



Re: [PERFORM] Calculation of unused columns

2009-10-19 Thread Kevin Grittner
Simon Riggs  wrote:
 
> I think we should have a 4th class of functions,
> volatile-without-side-effects 
 
Sounds reasonable to me.
 
> (better name needed, obviously).
 
Well, from this list (which is where volatile points), mutable seems
closest to OK, but I'm not sure I like any of them.
 
http://www.merriam-webster.com/thesaurus/fickle
 
Anyone else have an idea?
 
-Kevin



Re: [PERFORM] Calculation of unused columns

2009-10-19 Thread Simon Riggs
On Mon, 2009-10-19 at 13:43 -0400, Tom Lane wrote:
> Simon Riggs  writes:
> > On Sat, 2009-10-17 at 21:41 -0400, Tom Lane wrote:
> >> one thing we'd have to consider
> >> is whether it is okay to suppress calculation of columns containing
> >> volatile functions.
> 
> > I think we should have a 4th class of functions,
> > volatile-without-side-effects (better name needed, obviously).
> 
> What for?  There wouldn't be that many, I think.  random() and
> clock_timestamp(), yeah, but most volatile user-defined functions
> are either volatile-with-side-effects or misdeclared.

Read only vs. read write?

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [PERFORM] Calculation of unused columns

2009-10-19 Thread Gerhard Wiesinger

On Sun, 18 Oct 2009, Tom Lane wrote:


> Robert Haas  writes:
>> On Sun, Oct 18, 2009 at 1:59 PM, Jeff Janes  wrote:
>>> Even if country.id is a primary or unique key?
>
>> Well, we currently don't have any logic for making inferences based on
>> unique constraints.
>
> Huh?
> http://archives.postgresql.org/pgsql-committers/2009-09/msg00159.php
>
> Admittedly it's just one case and there's lots more to be done, but it's
> more than nothing.  So this is a *potential* argument for trying to trim
> subquery outputs.  What I'm not sure about is whether there are common
> cases where this would be applicable below a non-flattenable subquery.



Wow. That's IMHO a major improvement in the optimizer. Is this also valid
for views?

In the area of views this might even be a killer feature, since one can
define a view with many columns and the query is still optimal when only,
e.g., one of them is used. I had such a situation today where I created a
new view that was ~24 times faster (with a lot of left outer joins).
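As a made-up illustration of what I am hoping for (customers.id being
unique is what would let the join be removed):

CREATE VIEW order_summary AS
SELECT o.id, o.total, c.name AS customer_name
  FROM orders o
  LEFT JOIN customers c ON c.id = o.customer_id;

-- a query that never touches customer_name could then be planned
-- without scanning customers at all
SELECT id, total FROM order_summary WHERE id = 42;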


Is the patch only for 8.5, or will it also be backported to 8.4 and 8.3?

Thanks.

Ciao,
Gerhard

--
http://www.wiesinger.com/



Re: [PERFORM] Calculation of unused columns

2009-10-19 Thread Tom Lane
Simon Riggs  writes:
> On Mon, 2009-10-19 at 13:43 -0400, Tom Lane wrote:
>> Simon Riggs  writes:
>>> I think we should have a 4th class of functions,
>>> volatile-without-side-effects (better name needed, obviously).
>> 
>> What for?  There wouldn't be that many, I think.  random() and
>> clock_timestamp(), yeah, but most volatile user-defined functions
>> are either volatile-with-side-effects or misdeclared.

> Read only vs. read write?

Most read-only functions are stable or even immutable.  I don't say
that there's zero usefulness in a fourth class, but I do say it's
unlikely to be worth the trouble.  (The only reason it even came
up in this connection is that the default for user-defined functions
is "volatile" which would defeat this optimization ... but we could
hardly make the default anything else.)

regards, tom lane



Re: [PERFORM] Calculation of unused columns

2009-10-19 Thread Tom Lane
Gerhard Wiesinger  writes:
> Is the patch only for 8.5, or will it also be backported to 8.4 and 8.3?

That patch will *not* be backported.  It hasn't even got through beta yet.

regards, tom lane



Re: [PERFORM] Calculation of unused columns

2009-10-19 Thread Simon Riggs
On Mon, 2009-10-19 at 13:58 -0400, Tom Lane wrote:
> 
> Most read-only functions are stable or even immutable.

Huh? I mean a function that only contains SELECTs. (How would those ever
be Stable or Immutable??)

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [PERFORM] Calculation of unused columns

2009-10-19 Thread Tom Lane
Simon Riggs  writes:
> On Mon, 2009-10-19 at 13:58 -0400, Tom Lane wrote:
>> Most read-only functions are stable or even immutable.

> Huh? I mean a function that only contains SELECTs. (How would those ever
> be Stable or Immutable??)

Uh, a function containing SELECTs is exactly the use-case for STABLE.
Maybe you need to go re-read the definitions?
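For instance, a made-up lookup function over the country table mentioned
upthread is exactly this case:

CREATE FUNCTION country_name(integer) RETURNS text AS $$
    SELECT name FROM country WHERE id = $1
$$ LANGUAGE sql STABLE;  -- read-only SELECT, so STABLE is the right marking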

regards, tom lane



Re: [PERFORM] Partitioned Tables and ORDER BY

2009-10-19 Thread Robert Haas
2009/10/19 Grzegorz Jaśkiewicz :
>
>
> 2009/10/19 Robert Haas 
>>
>> 2009/10/19 Grzegorz Jaśkiewicz :
>> >
>> >
>> > On Sun, Oct 11, 2009 at 3:30 PM, Michal Szymanski 
>> > wrote:
>> >>
>> >> We have a similar problem and are now trying to find a solution. When you
>> >> execute a query on a partition there is no sorting - the DB uses an index to
>> >> retrieve the data, and if you need, let's say, 50 rows it reads 50 rows using
>> >> the index. But when you execute on the parent table the query optimizer does this:
>> >>
>> >>  ->  Sort  (cost=726844.88..748207.02 rows=8544855 width=37739)
>> >> (actual time=149864.868..149864.876 rows=50 loops=1)
>> >>
>> >> which means 8544855 rows have to be sorted and it takes long minutes.
>> >
>> > The figures in the first parentheses are estimates, not the actual row
>> > count.
>> > If you think the estimate is too low, increase the statistics target for that column.
>>
>> It's true that the figures in parentheses are estimates, but it's usually
>> bad when the estimated and actual row counts are different by 5 orders
>> of magnitude, and that large of a difference is not usually fixed by
>> increasing the statistics target.
>>
> I thought that this means that either analyze was run quite a long time
> ago, or that the value didn't make it into the histogram. In the latter case,
> that's mostly the case when your statistics target is low, or the value is
> really 'rare'.

It's possible, but (1) most people are running autovacuum these days,
in which case this isn't likely to occur and (2) most people do not
manage to expand the size of a table by five orders of magnitude
without analyzing it.  Generally these kinds of problems come from bad
selectivity estimates.

In this case, though, I think that the actual number is less than the
estimate because of the limit node immediately above.  The problem is
just that a top-N heapsort requires scanning the entire set of rows,
and scanning 8 million rows is slow.

...Robert



[PERFORM] maintain_cluster_order_v5.patch

2009-10-19 Thread ph...@apra.asso.fr
Hi all,

The current discussion about "Indexes on low cardinality columns" led me to
discover this "grouped index tuples" patch
(http://community.enterprisedb.com/git/) and its associated
"maintain cluster order" patch
(http://community.enterprisedb.com/git/maintain_cluster_order_v5.patch).

This last patch seems to cover the TODO item named "Automatically maintain
clustering on a table".
As this patch is not so new (2007), I would like to know why it has not yet
been integrated into a standard version of PG (not well finalized? not totally
sure? not corresponding to the way the core team would like to address this
item?) and whether there is a good chance of seeing it committed in the near
future.

I currently work for a large customer who is migrating a lot of databases used
by an application that currently benefits greatly from well-clustered tables,
especially for batch processing. The migration brings a lot of benefits. In
fact, the only regression compared to the old RDBMS is that the tables'
physical ordering degrades more quickly, generating more frequent heavy
CLUSTER operations.

So this "maintain cluster order" patch (and maybe "GIT" as well) should fill
the gap. But leaving the path of standard PG is not something very
attractive...

Regards. 
Philippe Beaudoin.







Re: [PERFORM] maintain_cluster_order_v5.patch

2009-10-19 Thread Jeff Davis
On Mon, 2009-10-19 at 21:32 +0200, ph...@apra.asso.fr wrote:
> Hi all,
> 
> The current discussion about "Indexes on low cardinality columns" led
> me to discover this "grouped index tuples" patch
> (http://community.enterprisedb.com/git/) and its associated
> "maintain cluster order" patch
> (http://community.enterprisedb.com/git/maintain_cluster_order_v5.patch)
> 
> This last patch seems to cover the TODO item named "Automatically
> maintain clustering on a table".

The TODO item isn't clear about whether the order should be strictly
maintained, or whether it should just make an effort to keep the table
mostly clustered. The patch mentioned above makes an effort, but does
not guarantee cluster order.

> As this patch is not so new (2007), I would like to know why it has
> not yet been integrated into a standard version of PG (not well
> finalized? not totally sure? not corresponding to the way the core
> team would like to address this item?) and whether there is a good
> chance of seeing it committed in the near future.

Search the archives on -hackers for discussion. I don't think either of
these features was rejected, but some of the work and benchmarking has
not been completed.

If you can help (either benchmark work or C coding), try reviving the
features by testing them and merging them with the current tree. I
recommend reading the discussion first, to see if there are any major
problems.

Personally, I'd like to see the GIT feature finished as well. When I
have time, I was planning to take a look into it.

Regards,
Jeff Davis




Re: [PERFORM] Performance with sorting and LIMIT on partitioned table

2009-10-19 Thread Merlin Moncure
On Mon, Oct 19, 2009 at 6:58 AM, Joe Uhl  wrote:
> I have a similar, recent thread titled Partitioned Tables and ORDER BY with
> a decent break down.  I think I am hitting the same issue Michal is.
>
> Essentially doing a SELECT against the parent with appropriate constraint
> columns in the WHERE clause is very fast (uses index scans against correct
> child table only) but the moment you add an ORDER BY it seems to be merging
> the parent (an empty table) and the child, sorting the results, and
> sequential scanning.  So it does still scan only the appropriate child table
> in the end but indexes are useless.
>
> Unfortunately the only workaround I can come up with is to query the
> partitioned child tables directly.  In my case the partitions are rather
> large so the timing difference is 522ms versus 149865ms.

These questions are all solvable depending on what you define
'solution' as.  I would at this point be thinking in terms of wrapping
the query in a function using dynamic sql in plpgsql...using some ad
hoc method of determining which children to hit and awkwardly looping
them and enforcing limit, ordering, etc at that level.  Yes, it sucks,
but it only has to be done for classes of queries constraint exclusion
can't handle and you will only handle a couple of cases most likely.
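A rough sketch of what I mean, using Michal's partition naming scheme (8.4
plpgsql; it only covers a range that stays inside one month -- spanning
months is where the awkward looping comes in):

CREATE OR REPLACE FUNCTION accounting.get_cdrs(p_group bigint,
                                               p_from  timestamptz,
                                               p_to    timestamptz,
                                               p_limit integer)
RETURNS SETOF accounting.cdr_full_partitioned AS $$
DECLARE
    part text;
BEGIN
    -- pick the single child partition covering the start of the range
    part := 'accounting.cdr_full_partitioned_y' || to_char(p_from, 'YYYY')
         || '_m' || to_char(p_from, 'MM');

    RETURN QUERY EXECUTE
        'SELECT * FROM ' || part
     || ' WHERE id_crx_group_from = $1'
     || '   AND call_status = ''FINS'''
     || '   AND start_time_invite >= $2'
     || '   AND start_time_invite <  $3'
     || ' ORDER BY start_time_invite LIMIT $4'
    USING p_group, p_from, p_to, p_limit;
END;
$$ LANGUAGE plpgsql;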

For this reason, when I set up my partitioning strategies, I always
try to divide the data such that you rarely if ever, have to fire
queries that have to touch multiple partitions simultaneously.

merlin
