Re: [PERFORM] Benchmark

2005-02-13 Thread Ron Mayer
Tom Lane wrote:
Great Bridge did essentially that years ago, but I think we only got
away with it because we didn't say which DBs "Commercial Database A"
and "Commercial Database B" actually were.  Even off the record, we
were only allowed to tell people that the commercial DBs were Oracle
and SQL Server ... but not which was which.
IMHO clues like:
 "What versions of the databases did you use?
 - PostgreSQL - 7.0 release version
 - Proprietary 1 - 8.1.5
 - Proprietary 2 - 7.0
 - MySQL - 3.22.32
 - Interbase - 6.0
 "
and
 "PostgreSQL" and "Proprietary 1" were running Red Hat Linux 6.1, and
 "Proprietary 2" was running Windows NT Server 4, Service Pack 4
in articles like this one: http://www.xperts.com/news/press1.htm
helped some people narrow it down a bit.  :)
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faq


Re: [PERFORM] Benchmark

2005-02-13 Thread Greg Stark
Christopher Browne <[EMAIL PROTECTED]> writes:

> After all, the cost of a computer system to run the transactions is
> likely to be comprised of some combination of software licenses and
> hardware costs.  Even if the software is free, the hardware isn't.

And labour costs.

-- 
greg




Re: [PERFORM] Benchmark

2005-02-13 Thread Patrick Meylemans


Dear,
We have been using PostgreSQL for four years now; one could say it is a
blessing to maintain. Our previous database was "number one" (;-), but it
was much harder to maintain, so labour is a point in PostgreSQL's favour ...
Kind Regards
Patrick Meylemans
IT Manager 
WTCM-CRIF
Celestijnenlaan 300C
3001 Heverlee

At 11:34 13/02/2005 -0500, Greg Stark wrote:
>Christopher Browne <[EMAIL PROTECTED]> writes:
>> After all, the cost of a computer system to run the transactions is
>> likely to be comprised of some combination of software licenses and
>> hardware costs.  Even if the software is free, the hardware isn't.
>
>And labour costs.
>
>-- 
>greg



[PERFORM] estimated rows vs. actual rows

2005-02-13 Thread Jaime Casanova
Hi,

in the #postgresql-es channel someone shows me this:

pgsql-7.4.5 + postgis 

--- begin context ---

CREATE TABLE calles (
  gid int4 NOT NULL DEFAULT nextval('public.callesstgo_gid_seq'::text),
  nombre varchar,
  inicio int4,
  termino int4,
  comuna varchar,
  ciudad varchar,
  region numeric,
  pais varchar,
  the_geom geometry,
  id_comuna numeric,
  CONSTRAINT callesstgo_pkey PRIMARY KEY (gid),
  CONSTRAINT enforce_geotype_the_geom CHECK (geometrytype(the_geom) =
'MULTILINESTRING'::text OR the_geom IS NULL),
  CONSTRAINT enforce_srid_the_geom CHECK (srid(the_geom) = -1)
) 
WITH OIDS;
 
CREATE INDEX idx_region_comunas ON calles USING btree
  (id_comuna, region);

select count(*) from calles;
143902

--- end context ---
 
OK, here is the problem (BTW, the database had been analyzed just
before this query was executed):

explain analyze
select * from calles where id_comuna = 92 and region=13; 

QUERY PLAN
Seq Scan on calles  (cost=0.00..7876.53 rows=2610 width=279)
  (actual time=182.590..454.195 rows=4612 loops=1)
  Filter: ((id_comuna = 92::numeric) AND (region = 13::numeric))
Total runtime: 456.876 ms


Why is this query using a seq scan rather than an index scan? I notice
the difference between the estimated rows and the actual rows (almost 2000).

Can this affect the query plan? I think this is a statistics problem;
am I right? If so, what can be done?

regards,
Jaime Casanova



Re: [PERFORM] estimated rows vs. actual rows

2005-02-13 Thread Josh Berkus
Jaime,

> Why is this query using a seq scan rather than an index scan? 

Because it thinks a seq scan will be faster.

> I notice 
> the difference between the estimated rows and the actual rows (almost 2000).

Yes, ANALYZE, and possibly increasing the column stats, should help that.
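Raising the statistics target on the filtered columns and re-analyzing might look like this (a sketch; 100 is just an illustrative target, 7.4's default being 10):

```sql
-- Collect finer-grained statistics for the two filtered columns.
-- 100 is an illustrative value; pick what works for your data.
ALTER TABLE calles ALTER COLUMN id_comuna SET STATISTICS 100;
ALTER TABLE calles ALTER COLUMN region SET STATISTICS 100;
ANALYZE calles;
```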

> Can this affect the query plan? I think this is a statistics problem;
> am I right? If so, what can be done?

Well, if the estimate was accurate, PG would be even *more* likely to use a 
seq scan (more rows).

I think maybe you should establish whether a seq scan actually *is* faster?   
Perhaps do SET enable_seqscan = false and then re-run the query a few times?
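A quick session-local test along those lines might look like this (a sketch; the setting only affects the current session):

```sql
SET enable_seqscan = false;  -- discourage seq scans for this session only
EXPLAIN ANALYZE
SELECT * FROM calles WHERE id_comuna = 92 AND region = 13;
SET enable_seqscan = true;   -- restore the default afterwards
```

Comparing the two "Total runtime" figures settles the question empirically.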

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] estimated rows vs. actual rows

2005-02-13 Thread Jaime Casanova
On Sun, 13 Feb 2005 13:41:09 -0800, Josh Berkus  wrote:
> Jaime,
> 
> > Why is this query using a seq scan rather than an index scan?
> 
> Because it thinks a seq scan will be faster.
> 
I will suggest that he try it with seq scans disabled.

But, IMHO, if the table has 143902 rows and it thinks it will retrieve
2610 (almost 1.81% of the total), won't it be faster with an index?

I know, I will suggest that he test it to be sure. Just an opinion.

regards,
Jaime Casanova



Re: [PERFORM] estimated rows vs. actual rows

2005-02-13 Thread Tom Lane
Jaime Casanova <[EMAIL PROTECTED]> writes:
> But, IMHO, if the table has 143902 rows and it thinks it will retrieve
> 2610 (almost 1.81% of the total), won't it be faster with an index?

That's almost one row in fifty.  We don't know how wide the table is,
but it's certainly possible that there are order-of-a-hundred rows
on each page; in which case the indexscan is likely to hit every page.
Twice.  Not in sequence.  Only if the selected rows are pretty well
clustered in a small part of the table is this going to be a win
over a seqscan.
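For what it's worth, the reported row width suggests roughly this many rows per page (a back-of-the-envelope sketch, assuming the default 8 KB block size and ignoring per-page and per-tuple overhead):

```sql
-- 8192-byte pages / 279-byte rows ~= 29 rows per page,
-- so 143902 rows occupy on the order of 5000 pages;
-- 2610 scattered index probes could easily touch most of them.
SELECT 8192 / 279            AS approx_rows_per_page,
       143902 / (8192 / 279) AS approx_pages;
```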

regards, tom lane



Re: [PERFORM] estimated rows vs. actual rows

2005-02-13 Thread Mark Kirkwood
Jaime Casanova wrote:
But, IMHO, if the table has 143902 rows and it thinks it will retrieve
2610 (almost 1.81% of the total), won't it be faster with an index?
Depends on how those 2610 rows are distributed amongst the 143902. The 
worst case scenario is each one of them in its own page. In that case 
you have to read 2610 *pages*, which is probably a significant 
percentage of the table.

You can find out this information from the pg_stats view (particularly 
the correlation column).
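Something along these lines should show the relevant statistics (a sketch against the standard pg_stats view):

```sql
-- correlation near +1 or -1 means the matching rows are likely
-- well clustered on disk; near 0 means they are scattered
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'calles'
  AND attname IN ('id_comuna', 'region');
```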

Mark