[GENERAL] usage of indexes for inner joins

2007-09-27 Thread Jan Theodore Galkowski
I fear this has been asked many times about PostgreSQL, and I have read
the docs about how indexes are supposed to be defined and used, but I
don't understand why the engine and optimizer do what they do in even
the simplest of situations.  Is it that the planner's behavior is
heavily data dependent?

My case of interest is more complicated, but I decided to create a toy
case to try to understand.  Here it is:


  -- Table "foo" DDL
  CREATE TABLE "public"."foo" (
    "projectid"   int4 NOT NULL,
    "uid"         int4 NOT NULL,
    "name"        varchar(254) NOT NULL,
    "ver"         varchar(127) NOT NULL,
    "startdate"   date NOT NULL,
    "enddate"     date NOT NULL,
    "status"      varchar(254) NOT NULL,
    "percentdone" numeric(7,2) NOT NULL,
    "championuid" int4 NOT NULL,
    "pmuid"       int4 NOT NULL,
    PRIMARY KEY ("projectid")
  ) WITHOUT OIDS;

  -- Table "bignum" DDL
  CREATE TABLE "public"."bignum" (
    "thing" numeric(100) NOT NULL
  ) WITHOUT OIDS;

  CREATE INDEX "t" ON "public"."bignum" USING btree ("thing");


Running

  EXPLAIN ANALYZE SELECT A.* FROM bignum B, foo A WHERE A.projectid = B.thing;

yields:

  Nested Loop  (cost=0.00..15.51 rows=1 width=407) (actual time=0.041..0.041 rows=0 loops=1)
    Join Filter: ((a.projectid)::numeric = b.thing)
    ->  Seq Scan on bignum b  (cost=0.00..1.01 rows=1 width=16) (actual time=0.024..0.027 rows=1 loops=1)
    ->  Seq Scan on foo a  (cost=0.00..11.80 rows=180 width=407) (actual time=0.003..0.003 rows=0 loops=1)
  Total runtime: 0.169 ms

Like *how* *come*?  There are indexes on both columns of the join.  Is
it the NUMERIC datatype messing things up?  Unlikely, as I've seen the
same with INTEGERs.
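
(If the int4-to-numeric comparison truly can't be avoided, I gather one
option would be an expression index on the cast, so that there is an
index matching the condition the planner actually evaluates; a sketch,
untested, with a made-up index name:)

  -- matches the rewritten join condition ((projectid)::numeric = thing);
  -- the alternative is to give both columns the same type so no cast is
  -- needed and the existing primary key index can be considered
  CREATE INDEX "foo_projectid_numeric" ON "public"."foo" (((projectid)::numeric));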

If it is data dependent (these tables are presently empty), any
suggestions as to how to tune a database for unknown mixes of data?
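
(For the toy case, I take it the way to find out is to load some rows,
run ANALYZE, and look at the plan again; this is roughly what I plan to
try -- the generate_series row counts are arbitrary:)

  -- populate both toy tables, refresh statistics, re-check the plan
  INSERT INTO "public"."foo"
    (projectid, uid, name, ver, startdate, enddate, status,
     percentdone, championuid, pmuid)
  SELECT g, g, 'name ' || g, '1.0', current_date, current_date,
         'open', 0.00, g, g
  FROM generate_series(1, 100000) AS g;

  INSERT INTO "public"."bignum" (thing)
  SELECT g FROM generate_series(1, 50000) AS g;

  ANALYZE "public"."foo";
  ANALYZE "public"."bignum";

  EXPLAIN ANALYZE
    SELECT A.* FROM bignum B, foo A WHERE A.projectid = B.thing;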

This is run on the Windows version of PG, but I'm seeing the same kind
of thing on Linux.

Thanks.


Re: [GENERAL] usage of indexes for inner joins

2007-10-01 Thread Jan Theodore Galkowski
thanks for all your useful comments.  i will study all of them.

a couple of inline comments below, just for clarification to the group,
marked with asterisks.

On Mon, 1 Oct 2007 13:13:23 -0500, "Scott Marlowe"
<[EMAIL PROTECTED]> said:
> On 10/1/07, Jan Theodore Galkowski <[EMAIL PROTECTED]> wrote:
> > Scott,
> >
> > i didn't think this belonged in the general list, but the example i
> > gave for discussion was a toy, for illustration.  i could not very
> > well post the actual example for many reasons, including proprietary
> > ones and, given this is how things work, because the 1.5 million row
> > table in question is its own smallest description.
>
> This is the exact kind of question that belongs on -general.  But it
> does get asked a lot, especially by people coming from other
> databases.
>
> > while indexes are being used on that table, there's a companion
> > table which is much smaller -- a "mere" 75000 rows -- which is
> > suffering a sequential scan, and i was trying to eliminate those.
>
> Well, don't think of sequential scans as plain bad.  Sometimes they're
> the best choice, sometimes they're not.
>
> Now, if an index scan is provably and noticeably faster than the
> sequential scan, then the planner is making the wrong decision.  Have
> you tried running your query with
>
> set enable_seqscan=off;

***actually, yes.  the engine just ignored it.***
>
> to see how it behaves?  I've found many somewhat slow queries got
> really fast or really slow when I did that.
>
> Note that you shouldn't blindly run a query all the time with that
> setting, as there are many instances where seqscan is the right
> answer.  Also, your explain cost estimates will all be way off.
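
*** for the record, the kind of session-level check meant here, using
the toy tables from my first message; the SHOW is just to confirm the
setting actually took effect in the same session: ***

  SET enable_seqscan = off;
  SHOW enable_seqscan;
  EXPLAIN ANALYZE
    SELECT A.* FROM bignum B, foo A WHERE A.projectid = B.thing;
  RESET enable_seqscan;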
>
> > perhaps it is true that ANALYZE isn't being done often enough.
> > perhaps VACUUMs aren't being done often enough either.  we're leery
> > of scheduling repeated VACUUMs, having encountered a case where the
> > VACUUM took over an hour to complete.
>
> Run "analyze verbose" on your db and see what it says about number of
> page slots needed versus used.  that will help you tell if you're
> vacuuming enough.
>
> How long vacuum takes isn't really that important.  What is important
> is how much of an impact it's having on the system.  There are
> several vacuum parameters in the postgresql.conf file that can lower
> the I/O impact vacuum has on your system while increasing its
> run time.
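
*** the postgresql.conf parameters referred to are, I believe, the
cost-based vacuum delay settings; current values can be inspected per
session like this: ***

  -- vacuum sleeps for vacuum_cost_delay milliseconds each time its
  -- accumulated I/O cost reaches vacuum_cost_limit; a delay of 0
  -- disables the throttling
  SHOW vacuum_cost_delay;
  SHOW vacuum_cost_limit;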
>
> Vacuum full is another story.  Think of it as a recovery tool, not a
> periodic maintenance tool.
>
> > it may, too, be because the tables use user-defined types heavily
> > and the original UPDATE involved a SELECT ... IN ... having a GROUP
> > BY with a few references to columns deep within user-defined types.
>
> Hard to say without a query and an explain analyze output.  It's
> common for user-defined functions to produce estimates in the
> planner that are way off; user-defined types, not so much.  But the
> more complex the query, the more likely it is that the query planner
> will make a bad estimate of the number of rows somewhere and choose
> a bad method.
>
> >  that wouldn't have been my choice, but, then, they were given to
> >  me to work with, not my design.  in fact, PG is the first
> >  relational database implementation i've used that offered such
> >  things in a big way.
>
> Extensibility is quite a useful tool.
>
> > i also don't understand some other things, which are surprising,
> > like why some UPDATEs take so much longer when wrapped in a BEGIN
> > TRANSACTION ... COMMIT than when run with the transaction at the
> > statement level.
>
> That is strange.  I'd expect that maybe you've got something happening
> with the transaction waiting on other transactions, so that it's not
> so much running hard as just tapping its toe, waiting for the other
> transaction to commit or roll back.

*** yes, i thought it was odd, too.  there wasn't anything else in that
transaction, and the table was set up for an experiment.  of course, the
experiment was one of those "UPDATE foo SET foo.x = 1 + foo.x WHERE
foo.y < k" things. ***
>
> > I come from an Oracle, DB2, Informix world, and in my experience
> > plans for queries are more stable.   i have loitered in and around
> > MySQL for a while.   i'm not surprised there's a learning curve with
> > PG.  i am surprised it breaks so markedly with mainstay database
> > experience.
>
> Oh, I've seen Oracle get stupid due to lack of proper statistics as
> well.  You like had a goo

[GENERAL] time penalties on triggers?

2007-10-04 Thread Jan Theodore Galkowski
Does anyone know, or can anyone point to information about, how much
triggers penalize inserts in PG tables?  I'm getting a report that the
penalty is substantial, and I'd like a sense of what to expect before I
investigate more.  The triggers in question look like:
  CREATE OR REPLACE FUNCTION touch_lasttouched() RETURNS TRIGGER AS $touch_lasttouched$
    BEGIN
      NEW.lasttouched = NOW();
      RETURN NEW;
    END;
  $touch_lasttouched$ LANGUAGE plpgsql;

  DROP TRIGGER IF EXISTS trigger_touch_ams_faults_data ON ams_faults_and_failures_data;

  CREATE TRIGGER trigger_touch_ams_faults_data
    BEFORE INSERT OR UPDATE
    ON ams_faults_and_failures_data
    FOR EACH ROW
    EXECUTE PROCEDURE touch_lasttouched();

Here the "lasttouched" column of the table is declared like:

  lasttouched TIMESTAMP(6)

and is intentionally left nullable.
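
(The comparison I have in mind is roughly the following, on a throwaway
table so the real one isn't touched -- the table and trigger names here
are made up, only touch_lasttouched() is the function above, and
\timing is just psql's client-side timer:)

  CREATE TABLE t_trigger_test (
    id          int4,
    lasttouched TIMESTAMP(6)
  ) WITHOUT OIDS;

  \timing

  -- batch insert with no trigger attached
  INSERT INTO t_trigger_test (id)
    SELECT g FROM generate_series(1, 100000) AS g;

  CREATE TRIGGER trigger_touch_t_trigger_test
    BEFORE INSERT OR UPDATE
    ON t_trigger_test
    FOR EACH ROW
    EXECUTE PROCEDURE touch_lasttouched();

  -- same batch insert, now paying the per-row trigger cost
  INSERT INTO t_trigger_test (id)
    SELECT g FROM generate_series(1, 100000) AS g;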
