[PERFORM] IN surpasses NOT EXISTS in 7.4RC2 ??

2003-11-13 Thread Rajesh Kumar Mallah

Hi,

NOT EXISTS is taking almost double the time of NOT IN.
I know IN has been optimised in 7.4, but is anything
wrong with NOT EXISTS?

I have vacuumed, analyzed, and run the query many times;
still NOT IN is faster than EXISTS :>


Regds
Mallah.

NOT IN PLAN

tradein_clients=# explain analyze SELECT  count(*) from general.profile_master where
 profile_id not in (select  profile_id from general.account_profiles ) ;
 QUERY PLAN
------------------------------------------------------------------------------------------------
 Aggregate  (cost=32238.19..32238.19 rows=1 width=0) (actual time=5329.206..5329.207 rows=1 loops=1)
   ->  Seq Scan on profile_master  (cost=4458.25..31340.38 rows=359125 width=0) (actual time=1055.496..4637.908 rows=470386 loops=1)
         Filter: (NOT (hashed subplan))
         SubPlan
           ->  Seq Scan on account_profiles  (cost=0.00..3817.80 rows=256180 width=4) (actual time=0.061..507.811 rows=256180 loops=1)
 Total runtime: 5337.591 ms
(6 rows)


tradein_clients=# explain analyze SELECT  count(*) from general.profile_master where 
not exists 
(select  profile_id from general.account_profiles where 
profile_id=general.profile_master.profile_id ) ;

  QUERY PLAN
------------------------------------------------------------------------------------------------
 Aggregate  (cost=1674981.97..1674981.97 rows=1 width=0) (actual time=14600.386..14600.387 rows=1 loops=1)
   ->  Seq Scan on profile_master  (cost=0.00..1674084.16 rows=359125 width=0) (actual time=13.687..13815.798 rows=470386 loops=1)
         Filter: (NOT (subplan))
         SubPlan
           ->  Index Scan using account_profiles_profile_id on account_profiles  (cost=0.00..4.59 rows=2 width=4) (actual time=0.013..0.013 rows=0 loops=718250)
               Index Cond: (profile_id = $0)
 Total runtime: 14600.531 ms



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] performance optimzations

2003-11-13 Thread Rod Taylor
> > Most clients will be interested in, say, the last 7 days' worth of data?
> > Great. Start out with 4GB RAM on a good dual-CPU box -- Opterons seem to
> > work quite well -- and make sure the motherboard can hold double that in
> > memory for an upgrade sometime next year when you've become popular.
> 
> Unfortunately, the hardware available is pretty much fixed in regards to
> the system.  I can play around with the raid configurations and have
> some limited choice in regards to the raid controller and number of
> drives, but that's about all in terms of hardware.

Good luck then. Unless the configuration takes into account incremental
additions in ram and disk, sustained growth could get very expensive. I
guess that depends on the business plan expectations.

This just puts more emphasis on offloading everything you can onto
machines that can multiply.

> The current file system holding the user and email information indicates
> the current data has about 64GB (70K accounts, I'm not sure how many are
> active, but 50% might be a good guess).  This seems to be somewhat of a
> steady state, however.

35k clients checking their mail daily isn't so bad. Around 10 pages per
second peak load?
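Rod's ballpark can be reproduced with a quick back-of-envelope calculation. The session length and peak-concentration factor below are my assumptions for illustration, not figures from the thread:

```python
# Back-of-envelope peak load estimate for ~35k active mail accounts.
# Assumptions (not from the thread): ~8 page views per daily session,
# and peak traffic running ~3x the all-day average rate.
active_clients = 35_000
pages_per_session = 8
peak_factor = 3                      # peak rate vs. 24h average
seconds_per_day = 24 * 60 * 60

avg_pages_per_sec = active_clients * pages_per_session / seconds_per_day
peak_pages_per_sec = avg_pages_per_sec * peak_factor

print(f"average: {avg_pages_per_sec:.1f}/s, peak: {peak_pages_per_sec:.1f}/s")
# -> average: 3.2/s, peak: 9.7/s -- about the "10 pages per second" above
```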


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Value of Quad vs. Dual Processor machine

2003-11-13 Thread Robert Treat
On Wed, 2003-11-12 at 09:28, Jeff wrote:
> On Tue, 11 Nov 2003 21:13:19 -0500
> "Chris Field" <[EMAIL PROTECTED]> wrote:
> 
> > we are looking at Xeon, We are currently running it on a quad sun v880
> > compiled to be 64bit and have been getting dreadful performance.  I
> > don't think we really have much to gain from going 64bit.
> > 
> > 
> By chance, are you running 7.3.4 on that sun?
> If so, try this:
> export CFLAGS=-O2
> ./configure
> 
> and rebuild PG.
> 
> Before 7.4, PG was built with _no_ optimization on Solaris.
> Recompiling gives __HUGE__ (notice the underscores) performance gains.
> 
> And onto the dual vs quad.
> 
> PG will only use 1 cpu / connection / query. 
> 
> So if your machine will have 1-2 queries running at a time, those other 2
> procs will sit around idling.  However if you are going to have a bunch
> going, 4 CPUs will be most useful.  One of the nicest things to do for
> PG is more RAM and fast IO. It really loves those things.
> 
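The recompile advice quoted above amounts to the following recipe (a sketch: the source path, version, and prefix are assumptions for illustration):

```shell
# Rebuild PostgreSQL 7.3.x on Solaris with compiler optimization enabled,
# which pre-7.4 configure scripts did not do by default on that platform.
# Directory and prefix below are placeholders, not from the thread.
cd /usr/local/src/postgresql-7.3.4
export CFLAGS=-O2
./configure --prefix=/usr/local/pgsql
make && make install
```

After installing, restart the postmaster and re-run the slow queries to compare.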

We've just started kicking around the idea of moving one of our boxes to
a quad-proc machine from a dual. Under normal circumstances the 2
processors handle maybe 200 transactions per second with 90% system
idle.  However we have people who occasionally run historical reports on
our data, and those reports are fairly CPU-intensive. Usually this is not
a problem for the main web system, but pg_dump is also CPU-intensive, so
when it runs we end up with two highly CPU-intensive items running on our
machine, and we start to notice issues on the main web system.
  

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] IN surpasses NOT EXISTS in 7.4RC2 ??

2003-11-13 Thread Tom Lane
Rajesh Kumar Mallah <[EMAIL PROTECTED]> writes:
> NOT EXISTS is taking almost double the time of NOT IN.
> I know IN has been optimised in 7.4, but is anything
> wrong with NOT EXISTS?

That's the expected behavior in 7.4.  EXISTS in the style you are using
it effectively forces a nestloop-with-inner-indexscan implementation.
As of 7.4, IN can do that, but it can do several other things too,
including the hash-type plan you have here.  So assuming that the
planner makes the right plan choice (not always a given ;-))
IN should be as fast or faster than EXISTS in all cases.
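For completeness, the same anti-join can also be spelled as an outer join. This is a sketch against the poster's tables; it matches the NOT EXISTS semantics, and was a common manual rewrite before 7.4 when the subquery plans were poor:

```sql
-- Anti-join formulation: count profile_master rows with no matching
-- account_profiles row.  Rows without a match come out of the LEFT JOIN
-- with NULLs on the account_profiles side, which the WHERE clause keeps.
SELECT count(*)
FROM general.profile_master pm
LEFT JOIN general.account_profiles ap ON ap.profile_id = pm.profile_id
WHERE ap.profile_id IS NULL;
```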

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] IN surpasses NOT EXISTS in 7.4RC2 ??

2003-11-13 Thread Bruce Momjian
Rajesh Kumar Mallah wrote:
> 
> Hi,
> 
> NOT EXISTS is taking almost double the time of NOT IN.
> I know IN has been optimised in 7.4, but is anything
> wrong with NOT EXISTS?
> 
> I have vacuumed, analyzed, and run the query many times;
> still NOT IN is faster than EXISTS :>

Seems fine.  In 7.4, NOT IN will often be faster than NOT EXISTS.  NOT
EXISTS didn't change --- there are restrictions on how far we can
optimize NOT EXISTS.  NOT IN has just become much faster in 7.4.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] IN surpasses NOT EXISTS in 7.4RC2 ??

2003-11-13 Thread Rajesh Kumar Mallah




Robert Treat wrote:

> It is believed that the IN optimization can lead to faster IN times than
> EXISTS times on some queries, the extent of which is still a bit of an
> unknown. (Incidentally, is there an FAQ item on this that needs
> updating?)

Thanks, everyone, for clarifying. It is really nice to see IN working
so well, because SQL using IN is easier to read.

It looks like NOT IN is indifferent to indexes, whereas IN uses them;
is that true? And do indexes affect the new manner in which IN works
in 7.4?






  
> Does the not exist query produce worse results in 7.4 than it did in
> 7.3?

Will surely post the observation sometime.

Regards
Mallah.



  


Re: [PERFORM] IN surpasses NOT EXISTS in 7.4RC2 ??

2003-11-13 Thread Robert Treat
It is believed that the IN optimization can lead to faster IN times than
EXISTS times on some queries; the extent of this is still a bit of an
unknown. (Incidentally, is there an FAQ item on this that needs
updating?)

Does the not exist query produce worse results in 7.4 than it did in
7.3?

Robert Treat

On Thu, 2003-11-13 at 02:53, Rajesh Kumar Mallah wrote:
> 
> Hi,
> 
> NOT EXISTS is taking almost double time than NOT IN .
> I know IN has been optimised in 7.4 but is anything 
> wrong with the NOT EXISTS?
> 
> I have vaccumed , analyze and run the query many times
> still not in is faster than exists :>
> 
> 
> Regds
> Mallah.
> 
> NOT IN PLAN
> 
> tradein_clients=# explain analyze SELECT  count(*) from general.profile_master where
>  profile_id not in (select  profile_id from general.account_profiles ) ;
>  QUERY PLAN
> -
> Aggregate  (cost=32238.19..32238.19 rows=1 width=0) (actual time=5329.206..5329.207 
> rows=1 loops=1)
>   ->  Seq Scan on profile_master  (cost=4458.25..31340.38 rows=359125 width=0) 
> (actual time=1055.496..4637.908 rows=470386 loops=1)
> Filter: (NOT (hashed subplan))
> SubPlan
>   ->  Seq Scan on account_profiles  (cost=0.00..3817.80 rows=256180 width=4) 
> (actual time=0.061..507.811 rows=256180 loops=1)
> Total runtime: 5337.591 ms
> (6 rows)
> 
> 
> tradein_clients=# explain analyze SELECT  count(*) from general.profile_master where 
> not exists 
> (select  profile_id from general.account_profiles where 
> profile_id=general.profile_master.profile_id ) ;
> 
>   QUERY PLAN
> ---
> Aggregate  (cost=1674981.97..1674981.97 rows=1 width=0) (actual 
> time=14600.386..14600.387 rows=1 loops=1)
>   ->  Seq Scan on profile_master  (cost=0.00..1674084.16 rows=359125 width=0) 
> (actual time=13.687..13815.798 rows=470386 loops=1)
> Filter: (NOT (subplan))
> SubPlan
>   ->  Index Scan using account_profiles_profile_id on account_profiles  
> (cost=0.00..4.59 rows=2 width=4) (actual time=0.013..0.013 rows=0 loops=718250)
> Index Cond: (profile_id = $0)
> Total runtime: 14600.531 ms
> 
> 
> 
> ---(end of broadcast)---
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] IN surpasses NOT EXISTS in 7.4RC2 ??

2003-11-13 Thread Tom Lane
Robert Treat <[EMAIL PROTECTED]> writes:
> Does the not exist query produce worse results in 7.4 than it did in
> 7.3?

EXISTS should work the same as before.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] IN surpasses NOT EXISTS in 7.4RC2 ??

2003-11-13 Thread Robert Treat
On Thu, 2003-11-13 at 12:00, Tom Lane wrote:
> Robert Treat <[EMAIL PROTECTED]> writes:
> > Does the not exist query produce worse results in 7.4 than it did in
> > 7.3?
> 
> EXISTS should work the same as before.
> 

Right. The original poster is asking if there is "something wrong with
EXISTS" based on the comparison to IN; he needs to compare it against 7.3
EXISTS to determine whether something is wrong.

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[PERFORM] strange estimate for number of rows

2003-11-13 Thread Andrew Sullivan
Hi all,

I've one here that I cannot fathom.  Any suggestions?  

We have a table, call it tablename, where we're selecting by a range
of dates and an identifier.  (This is redacted, obviously):

\d tablename

      Column      |           Type           | Modifiers
------------------+--------------------------+-----------
 id               | integer                  | not null
 transaction_date | timestamp with time zone | not null
 product_id       | integer                  | not null
Indexes:
    "trans_posted_trans_date_idx" btree (transaction_date, product_id)


The statistics on transaction_date and product_id are set to 1000. 
Everything is all analysed nicely.  But I'm getting a poor plan,
because of an estimate that the number of rows to be returned is
about double how many actually are:

explain analyse select * from transactions_posted where
transaction_date >= '2003-9-1' and transaction_date < '2003-10-1' and
product_id = 2;

                                    QUERY PLAN
------------------------------------------------------------------------------------
 Seq Scan on transactions_posted  (cost=0.00..376630.33 rows=700923 width=91) (actual time=8422.253..36176.078 rows=316029 loops=1)
   Filter: ((transaction_date >= '2003-09-01 00:00:00-04'::timestamp with time zone) AND (transaction_date < '2003-10-01 00:00:00-04'::timestamp with time zone) AND (product_id = 2))
 Total runtime: 36357.630 ms
(3 rows)

SET enable_seqscan = off;

explain analyse select * from transactions_posted where
transaction_date >= '2003-9-1' and transaction_date < '2003-10-1' and
product_id = 2;

                                    QUERY PLAN
------------------------------------------------------------------------------------
 Index Scan using trans_posted_trans_date_idx on transactions_posted  (cost=0.00..1088862.56 rows=700923 width=91) (actual time=35.214..14816.257 rows=316029 loops=1)
   Index Cond: ((transaction_date >= '2003-09-01 00:00:00-04'::timestamp with time zone) AND (transaction_date < '2003-10-01 00:00:00-04'::timestamp with time zone) AND (product_id = 2))
 Total runtime: 15009.816 ms
(3 rows)

SELECT attname,null_frac,avg_width,n_distinct,correlation FROM
pg_stats where tablename = 'transactions_posted' AND attname in
('transaction_date','product_id');
     attname      | null_frac | avg_width | n_distinct | correlation
------------------+-----------+-----------+------------+-------------
 product_id       |         0 |         4 |          2 |    0.200956
 transaction_date |         0 |         8 |  -0.200791 |    0.289248

Any ideas?  I'm loath to recommend CLUSTER, since the data will not
stay clustered.

A


-- 

Andrew Sullivan 204-4141 Yonge Street
Afilias CanadaToronto, Ontario Canada
<[EMAIL PROTECTED]>  M2P 2A8
 +1 416 646 3304 x110


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[PERFORM] Union+group by planner estimates way off?

2003-11-13 Thread Arthur Ward
On 7.4 RC2, I'm seeing a case where the query planner estimates are way
out of line after grouping the result of a union. I've tried adjusting the
statistics targets up to 200, and it made no difference in the planner's
estimates. The point of the full query this came from is that it also has
an aggregate function that produces a space-delimited list of commodity &
fak for each id. Does anyone have any suggestions on tweaks to apply or
ways to rewrite this? Is this one of those ugly corners where the query
planner doesn't have a clue how to estimate this (seeing the nice round
200 estimate makes me suspicious)?

EXPLAIN ANALYZE SELECT id FROM
(SELECT id, commodity FROM commodities WHERE commodity IS NOT NULL
   UNION
 SELECT id, fak FROM commodities WHERE fak IS NOT NULL
) all_commodities GROUP BY id;
   QUERY
PLAN

 HashAggregate  (cost=15939.16..15939.16 rows=200 width=4) (actual
time=3537.281..3680.418 rows=83306 loops=1)
   ->  Subquery Scan all_commodities  (cost=14002.00..15697.02 rows=96858
width=4) (actual time=2268.052..3214.996 rows=95715 loops=1)
 ->  Unique  (cost=14002.00..14728.44 rows=96858 width=15) (actual
time=2268.043..2881.688 rows=95715 loops=1)
   ->  Sort  (cost=14002.00..14244.15 rows=96858 width=15)
(actual time=2268.037..2527.083 rows=18 loops=1)
 Sort Key: id, commodity
 ->  Append  (cost=0.00..5034.42 rows=96858 width=15)
(actual time=7.402..1220.320 rows=18 loops=1)
   ->  Subquery Scan "*SELECT* 1" 
(cost=0.00..2401.23 rows=36831 width=15)
(actual time=7.398..590.004 rows=39772 loops=1)
 ->  Seq Scan on commodities 
(cost=0.00..2032.92 rows=36831 width=15)
(actual time=7.388..468.415 rows=39772
loops=1)
   Filter: (commodity IS NOT NULL)
   ->  Subquery Scan "*SELECT* 2" 
(cost=0.00..2633.19 rows=60027 width=14)
(actual time=0.016..408.160 rows=60236 loops=1)
 ->  Seq Scan on commodities 
(cost=0.00..2032.92 rows=60027 width=14)
(actual time=0.010..221.635 rows=60236
loops=1)
   Filter: (fak IS NOT NULL)
 Total runtime: 3783.009 ms
(13 rows)
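One rewrite worth trying for the reduced query shown here (a sketch; the poster's full query also aggregates the commodity values, so this applies only when the ids alone are needed): since UNION without ALL already removes duplicates, the ids can be unioned directly, and the outer GROUP BY disappears:

```sql
-- Sketch: produce each id once without the outer GROUP BY.
-- Selecting only id in both branches lets UNION's own duplicate
-- elimination do the uniqueness work in a single pass.
SELECT id FROM commodities WHERE commodity IS NOT NULL
UNION
SELECT id FROM commodities WHERE fak IS NOT NULL;
```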


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Union+group by planner estimates way off?

2003-11-13 Thread Tom Lane
"Arthur Ward" <[EMAIL PROTECTED]> writes:
> EXPLAIN ANALYZE SELECT id FROM
> (SELECT id, commodity FROM commodities WHERE commodity IS NOT NULL
>UNION
>  SELECT id, fak FROM commodities WHERE fak IS NOT NULL
> ) all_commodities GROUP BY id;
>QUERY
> PLAN
> 
>  HashAggregate  (cost=15939.16..15939.16 rows=200 width=4) (actual
> time=3537.281..3680.418 rows=83306 loops=1)
>->  Subquery Scan all_commodities  (cost=14002.00..15697.02 rows=96858
> width=4) (actual time=2268.052..3214.996 rows=95715 loops=1)

It's falling back to a default estimate because it doesn't know how to
find any statistics for the output of a sub-select.  I have a TODO
somewhere about burrowing down into sub-selects to see if the output maps
directly to a column that we'd have stats for ... but it's not done yet.

In this particular case the inaccurate estimate doesn't matter too much,
I think, although it might be encouraging the system to select hash
aggregation since it thinks the hashtable will be pretty small.  If the
estimate were getting used to plan higher-up plan steps then it could
be a bigger problem.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] strange estimate for number of rows

2003-11-13 Thread Tom Lane
Andrew Sullivan <[EMAIL PROTECTED]> writes:
> The statistics on transaction_date and product_id are set to 1000. 
> Everything is all analysed nicely.  But I'm getting a poor plan,
> because of an estimate that the number of rows to be returned is
> about double how many actually are:

> explain analyse select * from transactions_posted where
> transaction_date >= '2003-9-1' and transaction_date < '2003-10-1' and
> product_id = 2;

Are the estimates accurate for queries on the two columns individually,
ie
... where transaction_date >= '2003-9-1' and transaction_date < '2003-10-1'
... where product_id = 2

If so, the problem is that there's a correlation between
transaction_date and product_id, which the system cannot model because
it has no multi-column statistics.

However, given that the estimate is only off by about a factor of 2,
you'd still be getting the wrong plan even if the estimate were perfect,
because the estimated costs differ by nearly a factor of 3.

Given the actual runtimes, I'm thinking maybe you want to reduce
random_page_cost.  What are you using for that now?
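A per-session experiment along these lines (the value is illustrative, not a recommendation):

```sql
-- Lower random_page_cost for this session only, then see whether the
-- planner switches from the seqscan to the index scan.  7.4's default
-- for this parameter is 4.
SET random_page_cost = 1.5;
EXPLAIN ANALYZE
SELECT * FROM transactions_posted
WHERE transaction_date >= '2003-9-1'
  AND transaction_date < '2003-10-1'
  AND product_id = 2;
RESET random_page_cost;
```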

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Union+group by planner estimates way off?

2003-11-13 Thread Arthur Ward
> In this particular case the inaccurate estimate doesn't matter too much,
> I think, although it might be encouraging the system to select hash
> aggregation since it thinks the hashtable will be pretty small.  If the
> estimate were getting used to plan higher-up plan steps then it could
> be a bigger problem.

That's my problem; this is a subselect feeding into a larger query. That
wrong estimate causes the planner to select a nested loop at the next step
up. At 83,000 rows, the word is "ouch!"

At any rate, I discovered this while dissecting a giant & slow query.
Hence, while disabling nested-loop joins avoids this particular pitfall,
it's not good for the bigger picture. I think I'm going to end up
splitting this larger query into smaller parts and reassemble the pieces
in the application so I can push some smarts past other subselect
boundaries. For my purposes, that should skirt the issue of union+group
estimates not being calculated.

As always, thanks for the fast answers!

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] IN surpasses NOT EXISTS in 7.4RC2 ??

2003-11-13 Thread Rajesh Kumar Mallah




Tom Lane wrote:

> Rajesh Kumar Mallah <[EMAIL PROTECTED]> writes:
>> NOT EXISTS is taking almost double the time of NOT IN.
>> I know IN has been optimised in 7.4, but is anything
>> wrong with NOT EXISTS?
>
> That's the expected behavior in 7.4.  EXISTS in the style you are using
> it effectively forces a nestloop-with-inner-indexscan implementation.
> As of 7.4, IN can do that, but it can do several other things too,
> including the hash-type plan you have here.  So assuming that the
> planner makes the right plan choice (not always a given ;-))
> IN should be as fast or faster than EXISTS in all cases.

Not in this case :) ... did I miss something silly?

tradein_clients=# explain SELECT count(*)
from user_accounts where email is not null and email not in
 (select email from profile_master where email is not null) ;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=9587726326.93..9587726326.93 rows=1 width=0)
   ->  Seq Scan on user_accounts  (cost=0.00..9587725473.40 rows=341412 width=0)
         Filter: ((email IS NOT NULL) AND (NOT (subplan)))
         SubPlan
           ->  Seq Scan on profile_master  (cost=0.00..25132.24 rows=674633 width=25)
                 Filter: (email IS NOT NULL)
(6 rows)

The query above does not return, so only its EXPLAIN (without ANALYZE) is shown.

tradein_clients=# explain analyze SELECT count(*)
from user_accounts where email is not null and
not exists (select email from profile_master where
email=user_accounts.email) ;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=2850847.55..2850847.55 rows=1 width=0) (actual time=34075.100..34075.101 rows=1 loops=1)
   ->  Seq Scan on user_accounts  (cost=0.00..2849994.02 rows=341412 width=0) (actual time=8.066..34066.329 rows=3882 loops=1)
         Filter: ((email IS NOT NULL) AND (NOT (subplan)))
         SubPlan
           ->  Index Scan using profile_master_email on profile_master  (cost=0.00..35.60 rows=9 width=25) (actual time=0.044..0.044 rows=1 loops=686716)
                 Index Cond: ((email)::text = ($0)::text)
 Total runtime: 34075.213 ms
(7 rows)

tradein_clients=#
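One thing worth checking here (my conjecture, not something confirmed in the thread): 7.4 only uses the fast hashed-subplan form of NOT IN when it estimates the subquery result will fit in sort_mem, and with ~675k email strings that may not hold, which would explain the unhashed plan above. A sketch of the experiment:

```sql
-- Conjecture: raise sort_mem (7.4's per-sort/hash memory budget, in KB)
-- so the ~675k-row subselect can be hashed, then check whether the plan
-- switches from "NOT (subplan)" to "NOT (hashed subplan)".
SET sort_mem = 65536;   -- 64 MB, illustrative value
EXPLAIN SELECT count(*)
FROM user_accounts
WHERE email IS NOT NULL
  AND email NOT IN (SELECT email FROM profile_master
                    WHERE email IS NOT NULL);
RESET sort_mem;
```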



  




-- 

Rajesh Kumar Mallah,
Infocom Network Limited, New Delhi
phone: +91(11)6152172 (221) (L) ,9811255597 (M)

Visit http://www.trade-india.com ,
India's Leading B2B eMarketplace.





[PERFORM] Storage space, RAM for NUMERIC

2003-11-13 Thread Josh Berkus
Folks,

How would I calculate the storage space/required RAM for a 50-digit NUMERIC?

And the docs state that NUMERIC is slow.  Is this just slow for calculations
(due to the conversion to float & back) or slow for index lookups as well?
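A rough sizing sketch: PostgreSQL stores NUMERIC as base-10000 digits, i.e. two bytes per group of four decimal digits, plus a small per-value header. The overhead constant below is an assumption for illustration, not an exact figure for any particular version:

```python
import math

def numeric_storage_bytes(decimal_digits: int, overhead: int = 8) -> int:
    """Rough on-disk size of a NUMERIC value.

    PostgreSQL packs four decimal digits into each two-byte base-10000
    "digit"; `overhead` approximates the varlena + numeric headers and
    is an assumed round number here, not a version-exact value.
    """
    groups = math.ceil(decimal_digits / 4)
    return 2 * groups + overhead

print(numeric_storage_bytes(50))  # -> 34, i.e. a few dozen bytes per value
```

So a 50-digit NUMERIC is on the order of 30-40 bytes, versus 8 for a bigint or double.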

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] strange estimate for number of rows

2003-11-13 Thread Daniel Manley
Hi, I'm the lead developer on the project this concerns (forgive my 
newbiness on this list).

We tried a couple of scenarios with effective_cache_size=6, 
cpu_index_tuple_cost=0.0001 and random_page_cost=2 with no change in the 
plan.

explain analyse select * from tablename where transaction_date >= 
'2003-9-1' and transaction_date < '2003-10-1';

Seq Scan on tablename  (cost=0.00..348199.14 rows=1180724 width=91) (actual time=7727.668..36286.898 rows=579238 loops=1)
  Filter: ((transaction_date >= '2003-09-01 00:00:00+00'::timestamp with time zone) AND (transaction_date < '2003-10-01 00:00:00+00'::timestamp with time zone))
Total runtime: 36625.351 ms

explain analyse select * from transactions_posted where product_id = 2;

Seq Scan on transactions_posted  (cost=0.00..319767.95 rows=6785237 width=91) (actual time=0.091..35596.328 rows=5713877 loops=1)
  Filter: (product_id = 2)
Total runtime: 38685.373 ms

The product_id alone gives a difference of a million rows from estimate
to actual, vs. the factor of 2 from the transaction_date.

Dan Manley

Tom Lane wrote:

> Andrew Sullivan <[EMAIL PROTECTED]> writes:
>> The statistics on transaction_date and product_id are set to 1000.
>> Everything is all analysed nicely.  But I'm getting a poor plan,
>> because of an estimate that the number of rows to be returned is
>> about double how many actually are:
>
>> explain analyse select * from transactions_posted where
>> transaction_date >= '2003-9-1' and transaction_date < '2003-10-1' and
>> product_id = 2;
>
> Are the estimates accurate for queries on the two columns individually,
> ie
> ... where transaction_date >= '2003-9-1' and transaction_date < '2003-10-1'
> ... where product_id = 2
>
> If so, the problem is that there's a correlation between
> transaction_date and product_id, which the system cannot model because
> it has no multi-column statistics.
>
> However, given that the estimate is only off by about a factor of 2,
> you'd still be getting the wrong plan even if the estimate were perfect,
> because the estimated costs differ by nearly a factor of 3.
>
> Given the actual runtimes, I'm thinking maybe you want to reduce
> random_page_cost.  What are you using for that now?
>
> regards, tom lane



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] strange estimate for number of rows

2003-11-13 Thread Tom Lane
Daniel Manley <[EMAIL PROTECTED]> writes:
> The product_id alone gives a difference of a million rows from estimate
> to actual, vs. the factor of 2 from the transaction_date.

You should be thinking in terms of ratios, not absolute difference.
The rows estimate for product_id doesn't look too bad to me; the one for
transaction_date is much further off (a factor of 2).  Which is odd,
because the system can usually do all right on range estimates if you've
let it run an ANALYZE with enough histogram bins.  Could we see the
pg_stats row for transaction_date?
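The histogram in question can be inspected directly (a sketch; only the relevant pg_stats columns are selected):

```sql
-- Inspect the range histogram ANALYZE built for transaction_date.
-- With the statistics target at 1000 there can be up to ~1000 bucket
-- boundaries, which is what lets range estimates come out accurate.
SELECT histogram_bounds
FROM pg_stats
WHERE tablename = 'transactions_posted'
  AND attname = 'transaction_date';
```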

> We tried a couple of scenarios with effective_cache_size=6, 
> cpu_index_tuple_cost=0.0001 and random_page_cost=2 with no change in the 
> plan.

Since you need about a factor of 3 change in the cost estimate to get it to
switch plans, changing random_page_cost by a factor of 2 ain't gonna do
it (the other two numbers are second-order adjustments unlikely to have
much effect, I think).  Try 1.5, or even less ... of course, you have to
keep an eye on your other queries and make sure they don't go nuts, but
from what I've heard about your hardware setup a low random_page_cost
isn't out of line with the physical realities.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] strange estimate for number of rows

2003-11-13 Thread Andrew Sullivan
On Thu, Nov 13, 2003 at 03:19:08PM -0500, Tom Lane wrote:
> because the system can usually do all right on range estimates if you've
> let it run an ANALYZE with enough histogram bins.  Could we see the
> pg_stats row for transaction_date?

Do you want the whole thing?  I left out the really verbose bits when
I posted this in the original:

SELECT attname,null_frac,avg_width,n_distinct,correlation FROM
pg_stats where tablename = 'transactions_posted' AND attname in
('transaction_date','product_id');
     attname      | null_frac | avg_width | n_distinct | correlation
------------------+-----------+-----------+------------+-------------
 product_id       |         0 |         4 |          2 |    0.200956
 transaction_date |         0 |         8 |  -0.200791 |    0.289248

> 
> Since you need about a factor of 3 change in the cost estimate to get it to
> switch plans, changing random_page_cost by a factor of 2 ain't gonna do
> it (the other two numbers are second-order adjustments unlikely to have
> much effect, I think).  Try 1.5, or even less ... of course, you have to
> keep an eye on your other queries and make sure they don't go nuts, but
> from what I've heard about your hardware setup a low random_page_cost
> isn't out of line with the physical realities.

Actually, this one's on an internal box, and I think 1.5 is too low
-- it's really just a pair of mirrored SCSI disks on a PCI controller
in this case.  That does the trick, though, so maybe I'm just being
too conservative.

A

-- 

Andrew Sullivan 204-4141 Yonge Street
Afilias CanadaToronto, Ontario Canada
<[EMAIL PROTECTED]>  M2P 2A8
 +1 416 646 3304 x110


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] strange estimate for number of rows

2003-11-13 Thread Andrew Sullivan
On Thu, Nov 13, 2003 at 04:37:03PM -0500, Andrew Sullivan wrote:
> Actually, this one's on an internal box, and I think 1.5 is too low
> -- it's really just a pair of mirrored SCSI disks on a PCI controller
> in this case.  That does the trick, though, so maybe I'm just being
> too conservantive.

I spoke too soon.  I'd left enable_seqscan=off set.  It doesn't
actually prefer an indexscan until I set the random_page_cost to .5. 
I think that's a little unrealistic ;-)

A

-- 

Andrew Sullivan 204-4141 Yonge Street
Afilias CanadaToronto, Ontario Canada
<[EMAIL PROTECTED]>  M2P 2A8
 +1 416 646 3304 x110


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly