[PERFORM] Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

2007-12-05 Thread Robert Bernabe
Hi All,
I've been tasked to evaluate PG as a possible replacement of our MS SQL 
2000 solution. Our solution is 100% stored procedure/function centric. It's a 
report generation system whose sole task is to produce text files filled with 
processed data that is post-processed by a secondary system. Basically, options 
are selected via a web interface and all these parameters are passed to the 
stored procedure, which then runs and in the process calls other stored 
procedures until eventually a single formatted text file is produced. 
I decided on Fedora Core 7 and the 8.3 Beta release of EnterpriseDB 
PostgreSQL. I decided to port 1 stored procedure plus its several support 
stored procedures into pl/pgsql from T-SQL and compare the performance by 
measuring how long each system takes to produce the text file. For this test, 
the output to the text file was discarded and the stored procedure/function 
would end once the final temporary table is filled with the information that is 
eventually dumped into the text file. 

Windows 2000 Professional + MSDE (MS SQL) Box   vs.   FC7 + EnterpriseDB PG Box

Note that both boxes have EXACTLY the same hardware (not VMWARE or anything) 
AMD X2 3800
2 G RAM DDR 400
80 G Barracuda Sata

The data was copied to the Linux box and checked lightly for consistency versus 
the windows box (number of tables / columns and records) and they all match. 
After data transfer to the Linux Box, I ran REINDEX and ANALYZE. 
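
For reference, commands of this form would produce the output below (the
database name here is only a placeholder; ANALYZE VERBOSE is what prints the
per-table INFO lines):

REINDEX DATABASE reportdb;   -- "reportdb" is a placeholder database name
ANALYZE VERBOSE;             -- prints the per-table INFO lines shown below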

For the actual run the following tables were used and I'm displaying the 
results of analyze.

INFO:  analyzing "public.AreaDefn"
INFO:  "AreaDefn": scanned 15 of 15 pages, containing 2293 live rows and 0 dead 
rows; 2293 rows in sample, 2293 estimated total rows
INFO:  analyzing "public.AreaDefn2"
INFO:  "AreaDefn2": scanned 30 of 30 pages, containing 3439 live rows and 0 
dead rows; 3000 rows in sample, 3439 estimated total rows
INFO:  analyzing "public.Areas"
INFO:  "Areas": scanned 2 of 2 pages, containing 164 live rows and 0 dead rows; 
164 rows in sample, 164 estimated total rows
INFO:  analyzing "public.Brands"
INFO:  "Brands": scanned 1 of 1 pages, containing 11 live rows and 0 dead rows; 
11 rows in sample, 11 estimated total rows
INFO:  analyzing "public.Categories"
INFO:  "Categories": scanned 1 of 1 pages, containing 26 live rows and 0 dead 
rows; 26 rows in sample, 26 estimated total rows
INFO:  analyzing "public.CategoryDefn"
INFO:  "CategoryDefn": scanned 1 of 1 pages, containing 133 live rows and 0 
dead rows; 133 rows in sample, 133 estimated total rows
INFO:  analyzing "public.CategoryDefn2"
INFO:  "CategoryDefn2": scanned 2 of 2 pages, containing 211 live rows and 0 
dead rows; 211 rows in sample, 211 estimated total rows
INFO:  analyzing "public.CategorySets"
INFO:  "CategorySets": scanned 1 of 1 pages, containing 3 live rows and 0 dead 
rows; 3 rows in sample, 3 estimated total rows
INFO:  analyzing "public.CATemplateGroup"
INFO:  analyzing "public.Channels"
INFO:  "Channels": scanned 1 of 1 pages, containing 7 live rows and 0 dead 
rows; 7 rows in sample, 7 estimated total rows
INFO:  analyzing "public.ClientCodes"
INFO:  analyzing "public.Clients"
INFO:  "Clients": scanned 7 of 7 pages, containing 366 live rows and 0 dead 
rows; 366 rows in sample, 366 estimated total rows
INFO:  analyzing "public.Customers"
INFO:  "Customers": scanned 2 of 2 pages, containing 129 live rows and 0 dead 
rows; 129 rows in sample, 129 estimated total rows
INFO:  analyzing "public.Databases"
INFO:  "Databases": scanned 1 of 1 pages, containing 1 live rows and 0 dead 
rows; 1 rows in sample, 1 estimated total rows
INFO:  analyzing "public.DataSources"
INFO:  "DataSources": scanned 1 of 1 pages, containing 8 live rows and 0 dead 
rows; 8 rows in sample, 8 estimated total rows
INFO:  analyzing "public.DateToWeekConversion"
INFO:  "DateToWeekConversion": scanned 4 of 4 pages, containing 371 live rows 
and 0 dead rows; 371 rows in sample, 371 estimated total rows
INFO:  analyzing "public.Divisions"
INFO:  "Divisions": scanned 1 of 1 pages, containing 1 live rows and 0 dead 
rows; 1 rows in sample, 1 estimated total rows
INFO:  analyzing "public.MetricTable"
INFO:  "MetricTable": scanned 1 of 1 pages, containing 48 live rows and 0 dead 
rows; 48 rows in sample, 48 estimated total rows
INFO:  analyzing "public.Offtake"
INFO:  "Offtake": scanned 3000 of 13824 pages, containing 141000 live rows and 
0 dead rows; 3000 rows in sample, 649728 estimated total rows
INFO:  analyzing "public.SKUs"
INFO:  "SKUs": scanned 3 of 3 pages, containing 73 live rows and 0 dead rows; 
73 rows in sample, 73 estimated total rows
INFO:  analyzing "public.SMSDefaults"
INFO:  "SMSDefaults": scanned 1 of 1 pages, containing 43 live rows and 0 dead 
rows; 43 rows in sample, 43 estimated total rows
INFO:  analyzing "public.StandardPeriods"
INFO:  "StandardPeriods": scanned 1 of 1 pages, containing 8 live rows and 0 
dead rows; 8 rows in sample, 8 estimated total rows

Re: [PERFORM] Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

2007-12-05 Thread Usama Dar
On Dec 5, 2007 1:13 PM, Robert Bernabe <[EMAIL PROTECTED]> wrote:

> Anyways I'd like to break up this request/begging for help into two parts.
>
> 1) First would be settings of postgresql.conf. Did I do it correctly? The
> sample data is so small... I based my settings on the recommendations
> researched for data centers.
>

I think this would mainly depend on what your stored procedures do: are
they writing stuff to the tables, mostly reading, or something else? I would
imagine that with a small dataset the postgresql.conf settings should be OK,
but more can be said after looking at the code.


>
> 2) Code optimization which I plan to start in another email thread since
> the discussions there would be more detailed.
>

I think that might be a better starting point. What are you trying to do,
and how?


>
>
> Would it also make sense to optimize (as far as possible) everything
> (including the code) for windows first? The target here would be a linux OS
> but since the discrepancy is so big...the unified Windows OS might be a good
> place to start for now.
>

Sure, but I don't see how the pl/pgsql could contain code that can be tuned
OS-wise; I would think that any optimization you do in the stored code would
apply to all platforms.




>
>
> Many Thanks in advance.
>
> Regards
>
>
>
>


-- 
Usama Munir Dar http://linkedin.com/in/usamadar
Consultant Architect
Cell:+92 321 5020666
Skype: usamadar


Re: [PERFORM] TB-sized databases

2007-12-05 Thread Robert Treat
On Thursday 29 November 2007 11:14, Simon Riggs wrote:
> On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote:
> > Gregory Stark <[EMAIL PROTECTED]> writes:
> > > "Simon Riggs" <[EMAIL PROTECTED]> writes:
> > >> Tom's previous concerns were along the lines of "How would know what
> > >> to set it to?", given that the planner costs are mostly arbitrary
> > >> numbers.
> > >
> > > Hm, that's only kind of true.
> >
> > The units are not the problem.  The problem is that you are staking
> > non-failure of your application on the planner's estimates being
> > pretty well in line with reality.  Not merely in line enough that
> > it picks a reasonably cheap plan, but in line enough that if it
> > thinks plan A is 10x more expensive than plan B, then the actual
> > ratio is indeed somewhere near 10.
> >
> > Given that this list spends all day every day discussing cases where the
> > planner is wrong, I'd have to think that that's a bet I wouldn't take.
>
> I think you have a point, but the alternative is often much worse.
>
> If an SQL statement fails because of too high cost, we can investigate
> the problem and re-submit. If a website slows down because somebody
> allowed a very large query to execute then everybody is affected, not
> just the person who ran the bad query. Either way the guy that ran the
> query loses, but without constraints in place one guy can kill everybody
> else also.
>
> > You could probably avoid this risk by setting the cutoff at something
> > like 100 or 1000 times what you really want to tolerate, but how
> > useful is it then?
>
> Still fairly useful, as long as we understand it's a blunt instrument.
>
> If the whole performance of your system depends upon indexed access then
> rogue queries can have disastrous, unpredictable consequences. Many
> sites construct their SQL dynamically, so a mistake in a seldom used
> code path can allow killer queries through. Even the best DBAs have been
> known to make mistakes.
>

If the whole performance of your system depends upon indexed access, then 
maybe you need a database that gives you a way to force index access at the 
query level? 

> e.g. An 80GB table has 8 million blocks in it.
> - So putting a statement_cost limit = 1 million would allow some fairly
> large queries but prevent anything that did a SeqScan (or worse).
> - Setting it 10 million is going to prevent things like sorting the
> whole table without a LIMIT
> - Setting it at 100 million is going to prevent unconstrained product
> joins etc..

I think you're completely overlooking the effect disk latency has on query 
times.  We run queries all the time that can vary from 4 hours to 12 hours 
based solely on the amount of concurrent load on the system, even though 
they always plan with the same cost.  

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL



Re: [PERFORM] Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

2007-12-05 Thread Bill Moran
In response to Robert Bernabe <[EMAIL PROTECTED]>:

> Hi All,
> I've been tasked to evaluate PG as a possible replacement of our
> MS SQL 2000 solution. Our solution is 100% stored procedure/function
> centric.

I've trimmed 99% of your email out, because it's not relevant to my
answer.

Fact is, it's pretty much impossible for anyone to give specific help
because you've obviously got a large, complex operation going on here,
and have not provided any real details.  The reality is that we'd
probably have to see your code to give any specific help.

However, I can help you with an approach to fixing it.  Based on your
description of the problem, I would guess that there are some differences
in best practices between MSSQL and PG that are what's hurting your
application once it's ported to PG.  Basically, you just need to isolate
them and adjust.

I recommend enabling full query logging with timing on the PG server.
In the postgresql.conf file, set the following:
log_min_duration_statement = 0

Note that this will result in a LOT of log information being written,
which will invariably make the application run even slower on PG, but
for tuning purposes it's invaluable as it will log every SQL statement
issued with the time it took to run.

From there, look for the low-hanging fruit.  I recommend running your
tests a few times, then running the logs through pgFouine:
http://pgfouine.projects.postgresql.org/
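
As a sketch, the postgresql.conf lines for that kind of logging run would look
roughly like this (the log_line_prefix is only a suggestion; check the pgFouine
docs for the exact prefix it expects):

log_min_duration_statement = 0    # log every statement together with its run time
log_line_prefix = '%t [%p]: '     # timestamp and PID make the log easier to analyze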

Once you've identified the queries that are taking the most time, start
adjusting the queries and/or the DB schema to improve the timing.  In
my experience, you'll usually find 2 or 3 queries that are slowing the
thing down, and the performance will come up to spec once they're
rewritten (or appropriate indexes added, or whatever).  EXPLAIN can
be your friend once you've found the problematic queries.

Another piece of broadly useful advice is to install the pg_buffercache
contrib module and monitor shared_buffers usage to see if you've got enough.
Also useful is monitoring the various statistics in the pg_stat_database
view.
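
For example, something along the lines of the sample queries in the contrib
documentation (column names below are from the standard pg_buffercache and
pg_stat_database views):

-- top consumers of shared_buffers, via the pg_buffercache contrib module
SELECT c.relname, count(*) AS buffers
  FROM pg_class c
  JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
  JOIN pg_database d ON b.reldatabase = d.oid
                    AND d.datname = current_database()
 GROUP BY c.relname
 ORDER BY 2 DESC
 LIMIT 10;

-- per-database activity and cache hit counters
SELECT datname, numbackends, xact_commit, blks_read, blks_hit
  FROM pg_stat_database;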

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023



Re: [PERFORM] Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

2007-12-05 Thread Merlin Moncure
On Dec 5, 2007 3:13 AM, Robert Bernabe <[EMAIL PROTECTED]> wrote:
> Would it also make sense to optimize (as far as possible) everything
> (including the code) for windows first? The target here would be a linux OS
> but since the discrepancy is so big...the unified Windows OS might be a good
> place to start for now.

Speaking in very general terms, postgresql should be competitive with
MS SQL in this type of application.  There are a few things here and
there you have to watch out for...for example, select count(*) from a
table is slower on pg.  Certain query forms can also be problematic...
but these issues are often addressed with small adjustments.
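
As one concrete example of such an adjustment: when an exact count is not
needed, the planner's estimate can stand in for count(*). The table name below
is just taken from the earlier ANALYZE output, for illustration only:

-- approximate row count from pg_class statistics instead of a full table scan
SELECT reltuples::bigint AS approx_rows
  FROM pg_class
 WHERE relname = 'Offtake';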

The key here is to isolate the specific things in your procedure that are
underperforming and to determine why.  To get the most benefit from this
list, try to post some particulars along with some 'explain analyze'
results.

merlin



Re: [PERFORM] Optimizer Not using the Right plan

2007-12-05 Thread galy lee
I think you could increase the number of rows you limit to, e.g. LIMIT
50.

LIMIT can change the cost of a plan dramatically. Looking at your SQL:

   where this_.fkaddressid= 6664161
   order by this_.addressvaluationid desc limit 1;

The planner may use either index1 (on this_.fkaddressid) or index2 (on
this_.addressvaluationid) to scan the table. Although it is obvious that
using index2 on its own is very expensive, because you are trying to limit
to one row out of an estimated 1304 matching rows, the cost of using index2
becomes

   883328/1304 = 677.69

The cost of using index1 should be larger than 1304, so the planner chooses
index2.

The planner tends to choose a plan with a small startup cost when you are
trying to LIMIT a small portion of a large data set. It seems that the
following issue also comes from the same root:

  http://archives.postgresql.org/pgsql-performance/2007-11/msg00395.php
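
A rewrite that is sometimes suggested for this kind of situation (not from
this thread, and worth checking with EXPLAIN ANALYZE) is to keep the LIMIT
but make the ORDER BY expression unusable for the backward index scan, so the
planner falls back to the fkaddressid index:

   select this_.*
     from listing.addressvaluation this_
    where this_.fkaddressid = 6664161
    order by this_.addressvaluationid + 0 desc  -- "+ 0" defeats the index on addressvaluationid
    limit 1;
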
Best Regards
Galy Lee


> Tom Lane wrote:
>
> Pallav Kalva  writes:
>
>why does it have different plans for different values
>
> Because the values occur different numbers of times (or so it thinks
> anyway).  If the rowcount estimates are far from reality, perhaps
> increasing the statistics target would help.  However, since you
> only showed EXPLAIN and not EXPLAIN ANALYZE output, no one can
> really tell whether the optimizer did anything wrong here.
>
>   regards, tom lane
>
> Hi Tom,
>
>
> Thanks for your reply! Here is another example of the same query with a
> different addressid. This time I got the explain analyze on the query.
> This query also uses the Index Scan Backward; it says it took 28 seconds,
> but I can say that after looking at the postgres logs it took more than 2
> min when the query first ran. I ran this one again to get the explain analyze.
>
>
>   The statistics are set to "default_statistics_target = 100"
>
>
> I am sure that if it used the index on addressid it would be quicker, but
> for some reason it is using a backward index scan on addressvaluationid and
> that is taking too long.
>
> Not only this one; there are some other queries that use backward index
> scans and take too long. Index Scan Backward is not doing well for me most
> of the time; is there any way to avoid it?
>
> explain analyze
>
> select this_.addressvaluationid as addressv1_150_1_,
>this_.sourcereference as sourcere2_150_1_,
>this_.createdate as createdate150_1_,
>this_.valuationdate as valuatio4_150_1_,
>this_.valuationamount as valuatio5_150_1_,
>this_.valuationhigh as valuatio6_150_1_,
>this_.valuationlow as valuatio7_150_1_,
>this_.valuationconfidence as valuatio8_150_1_,
>this_.valuationchange as valuatio9_150_1_,
>this_.historycharturl as history10_150_1_,
>this_.regionhistorycharturl as regionh11_150_1_,
>this_.fkaddressid as fkaddre12_150_1_,
>this_.fkaddressvaluationsourceid as fkaddre13_150_1_,
>this_.fkvaluationchangeperiodid as fkvalua14_150_1_,
>valuationc2_.valuationchangeperiodid as valuatio1_197_0_,
>valuationc2_.name as name197_0_
> from listing.addressvaluation this_
> left outer join listing.valuationchangeperiod valuationc2_
>on this_.fkvaluationchangeperiodid = valuationc2_.valuationchangeperiodid
> where this_.fkaddressid = 6664161
> order by this_.addressvaluationid desc limit 1;
>
> QUERY PLAN
> ------------------------------------------------------------------------------
>  Limit  (cost=0.00..677.69 rows=1 width=494) (actual time=28454.708..28454.712 rows=1 loops=1)
>    ->  Nested Loop Left Join  (cost=0.00..883705.44 rows=1304 width=494) (actual time=28454.700..28454.700 rows=1 loops=1)
>          ->  Index Scan Backward using pk_addressvaluation_addressvaluationid on addressvaluation this_  (cost=0.00..883328.22 rows=1304 width=482) (actual time=28441.236..28441.236 rows=1 loops=1)
>                Filter: (fkaddressid = 6664161)
>          ->  Index Scan using pk_valuationchangeperiod_valuationchangeperiodid on valuationchangeperiod valuationc2_  (cost=0.00..0.28 rows=1 width=12) (actual time=13.447..13.447 rows=1 loops=1)
>                Index Cond: (this_.fkvaluationchangeperiodid = valuationc2_.valuationchangeperiodid)
>  Total runtime: 28454.789 ms
> (7 rows)
>
>
>


Re: [PERFORM] Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

2007-12-05 Thread Scott Marlowe
On Dec 5, 2007 2:13 AM, Robert Bernabe <[EMAIL PROTECTED]> wrote:
>
> Hi All,
> I've been tasked to evaluate PG as a possible replacement of our MS SQL
> 2000 solution. Our solution is 100% stored procedure/function centric. It's
> a report generation system whose sole task is to produce text files filled
> with processed data that is post-processed by a secondary system. Basically
> options are selected via a web interface and all these parameters are passed
> unto the stored procedure and then the stored procedure would run and in the
> process call other stored procedures until eventually a single formatted
> text file is produced.
> I decided on Fedora Core 7 and the 8.3 Beta release of Enterprise DB
> PostgreSQL. I decided to port 1 stored procedure plus it's several support
> stored procedures into pl/pgsql from T-SQL and compare the performance by

Noble, but if you're a postgresql beginner, you might want to take a
pass on running beta code.  You might be hitting a corner case,
performance-wise, and never know it.

A few pointers.
1: Up your shared_buffers to 512M or so.
2: Up work_mem to 16M
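
In postgresql.conf terms those two pointers would be roughly (the exact
numbers are only starting points, not measured recommendations):

shared_buffers = 512MB
work_mem = 16MB        # per sort/hash operation, so watch concurrent usage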

Now, use the poor man's debugging tool for your stored procs: RAISE NOTICE.


create or replace function testfunc() returns int as $$
DECLARE
    tm  text;
    cnt int;
BEGIN
    -- timestamp before the step being measured
    select timeofday() into tm;
    RAISE NOTICE 'Time is now %', tm;

    -- the step being timed (an example query here)
    select count(*) into cnt from accounts;

    -- timestamp after it; the gap between the two notices is the elapsed time
    select timeofday() into tm;
    RAISE NOTICE 'Time is now %', tm;

    RETURN 0;
END;
$$ language plpgsql;
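
Called from psql, the elapsed time of the counted step shows up as the gap
between the two NOTICE lines:

SELECT testfunc();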

Once you've found what's running slow, narrow it down to a specific part.



Re: [PERFORM] Bad query plans for queries on partitioned table

2007-12-05 Thread Julian Mehnle
Gregory, thanks for all the insight!  It is much appreciated.

Julian.




Re: [PERFORM] autovacuum: recommended?

2007-12-05 Thread Decibel!

On Nov 19, 2007, at 9:23 AM, Tom Lane wrote:

Decibel! <[EMAIL PROTECTED]> writes:

FWIW, 20k rows isn't all that big, so I'm assuming that the
descriptions make the table very wide. Unless those descriptions are
what's being updated frequently, I suggest you put those in a
separate table (vertical partitioning). That will make the main table
much easier to vacuum, as well as reducing the impact of the high
churn rate.


Uh, you do realize that the TOAST mechanism does that pretty much
automatically?



Only if the row exceeds 2k, which for a lot of applications is huge.
This is exactly why I wish toast limits were configurable on a
per-table basis (I know there were changes here for 8.3, but IIRC it
was only for toast chunk size).
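
A minimal sketch of that vertical split, with made-up table and column names:

CREATE TABLE item (
    item_id serial PRIMARY KEY,
    price   numeric,       -- small, frequently updated columns stay here
    updated timestamptz
);

CREATE TABLE item_description (
    item_id     integer PRIMARY KEY REFERENCES item(item_id),
    description text       -- wide, rarely touched column moved out of the hot table
);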

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

2007-12-05 Thread ismo . tuononen

I don't know if this is true in this case, but the transaction isolation
level can be different: in MSSQL it is normally something like
TRANSACTION_READ_UNCOMMITTED, while in postgres it is
TRANSACTION_READ_COMMITTED, and that makes a huge difference in performance.
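
For what it's worth, the isolation level can be requested explicitly per
transaction; note that PostgreSQL treats READ UNCOMMITTED as READ COMMITTED:

BEGIN;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- ... the procedure's statements ...
COMMIT;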

Another thing can be the queries in the procedures: if you use the same
queries unchanged, performance can be very bad. Databases handle queries
differently, and changing a query can easily drop execution times to 1/100th.

Ismo

On Wed, 5 Dec 2007, Robert Bernabe wrote:

> Hi All,
> I've been tasked to evaluate PG as a possible replacement of our MS SQL 
> 2000 solution. Our solution is 100% stored procedure/function centric.
> [...]

Re: [PERFORM] Bad query plans for queries on partitioned table

2007-12-05 Thread Gregory Stark
"Julian Mehnle" <[EMAIL PROTECTED]> writes:

> Gregory Stark wrote:
>> "Julian Mehnle" <[EMAIL PROTECTED]> writes:
>> > I actually do have constraints on all the partitions, e.g. for week
>> > 34: [...]
>> >
>> > Shouldn't this be enough to give the query planner a clue that it
>> > only has to join the "email" and "email_extras" tables' partitions
>> > pair-wise, as opposed to cross-joining them?
>>
>> Ah, well, this falls under "The optimizer is a bit dumb about
>> partitioned tables". It only looks at the constraints to compare
>> against your WHERE clause. It doesn't compare them against the
>> constraints for other tables to see if they're partitioned on the same
>> key and therefore can be joined table-by-table.
>>
>> I want 8.4 to be cleverer in this area but there's a ton of things it
>> has to learn.
>
> That would be great.
>
> So there's nothing that can be done about it right now, apart from 
> manually combining separate SELECTs for each partition using UNION?

Well, in the query you gave, I think that if the partitions weren't completely
empty it would still be using the hash join; it would just be doing an append
of all the nearly-empty partitions first. The reason it's getting confused is
that, in the absence of stats on them, it thinks they contain hundreds of tuples
which will match your where clause and join clause. Look at the expected
number of rows for the merge join compared to the expected number of rows
for the hash join.

But yeah, there will be cases where you really want:

Append
   Merge Join
      Part1 of table1
      Part1 of table2
   Merge Join
      Part2 of table1
      Part2 of table2
   ...


But the planner only knows how to do:

Merge Join
   Append
      Part1 of table1
      Part2 of table1
      ...
   Append
      Part1 of table2
      Part2 of table2
      ...

Which requires two big sorts whereas the first plan could use indexes on
individual partitions. It also has a slower startup time and can't take
advantage of discovering that a partition of table1 is empty to avoid ever
reading from the corresponding partition of table2 the way the first plan can.
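
For completeness, the manual UNION workaround Julian mentions would look
something like this (the partition suffixes and join column are made up for
illustration):

SELECT e.*, x.*
  FROM email_week34 e
  JOIN email_extras_week34 x ON x.email_id = e.email_id
UNION ALL
SELECT e.*, x.*
  FROM email_week35 e
  JOIN email_extras_week35 x ON x.email_id = e.email_id;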

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!



Re: [PERFORM] TB-sized databases

2007-12-05 Thread Decibel!

On Nov 28, 2007, at 7:27 AM, Bill Moran wrote:

Is there something wrong with:
set enable_seqscan = off



Note that in cases of very heavy skew, that won't work. It only adds  
10M to the cost estimate for a seqscan, and it's definitely possible  
to have an index scan that looks even more expensive.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

2007-12-05 Thread Mark Cave-Ayland
On Wed, 2007-12-05 at 00:13 -0800, Robert Bernabe wrote:
> Hi All,
> I've been tasked to evaluate PG as a possible replacement of our
> MS SQL 2000 solution. Our solution is 100% stored procedure/function
> centric.
> [...]