Re: [PERFORM] Query slow as function

2012-02-20 Thread Merlin Moncure
On Sat, Feb 18, 2012 at 8:50 AM, Steve Horn  wrote:
> Hello all!
>
> I have a very simple query that I am trying to wrap into a function:
>
> SELECT gs.geo_shape_id AS gid,
> gs.geocode
> FROM geo_shapes gs
> WHERE gs.geocode = 'xyz'
> AND geo_type = 1
> GROUP BY gs.geography, gs.geo_shape_id, gs.geocode;
>
> This query runs in about 10 milliseconds.
>
> Now my goal is to wrap the query in a function:
>
> I create a return type:
> CREATE TYPE geocode_carrier_route_by_geocode_result AS
>    (gid integer,
>     geocode character varying(9));
> ALTER TYPE geocode_carrier_route_by_geocode_result
>   OWNER TO root;
>
> ..and the function
> CREATE OR REPLACE FUNCTION geocode_carrier_route_by_geocode(geo_code
> character(9))
>   RETURNS SETOF geocode_carrier_route_by_geocode_result AS
> $BODY$
>
> BEGIN
>
> RETURN QUERY EXECUTE
> 'SELECT gs.geo_shape_id AS gid,
> gs.geocode
> FROM geo_shapes gs
> WHERE gs.geocode = $1
> AND geo_type = 1
> GROUP BY gs.geography, gs.geo_shape_id, gs.geocode'
> USING geo_code;
>
> END;
>
> $BODY$
>   LANGUAGE plpgsql STABLE;
> ALTER FUNCTION geocode_carrier_route_by_geocode(character)
>   OWNER TO root;
>
> Execute the function: select * from geocode_carrier_route_by_geocode('xyz');
>
> This query takes 500 milliseconds to run. My question of course is why?
>
> Related: If I create a function and assign LANGUAGE 'sql', my function runs
> in the expected 10 milliseconds. Is there some overhead to using the plpgsql
> language?
>
> Thanks for any help in clarifying my understanding!


not overhead.  it's how the plans are generated.  plpgsql builds out
the query plan on first execution and caches it.  sql language
functions replan the query on every execution.  caching the plan can
help or hurt depending on how sensitive the plan is to the supplied
parameters -- plpgsql can't (and shouldn't) use the actual parameter
value when generating the cached plan.  OTOH, especially for very
complex queries, the time spent in plan generation can really add up,
so plpgsql can often be faster than vanilla sql.

to nudge plpgsql toward smarter plans, you can try 'SET LOCAL
enable_xxx' planner directives.  pretty hacky, but it might help in
your case.  better statistics might also help.
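
For what it's worth, here is a sketch of both suggestions, reusing the
names from Steve's original mail (untested against his schema):

```sql
-- The LANGUAGE sql variant: planned at each call (and possibly inlined),
-- so the planner sees the actual geocode value.
CREATE OR REPLACE FUNCTION geocode_carrier_route_by_geocode_sql(geo_code character(9))
  RETURNS SETOF geocode_carrier_route_by_geocode_result AS
$BODY$
  SELECT gs.geo_shape_id AS gid,
         gs.geocode
  FROM geo_shapes gs
  WHERE gs.geocode = $1
    AND geo_type = 1
  GROUP BY gs.geography, gs.geo_shape_id, gs.geocode;
$BODY$
  LANGUAGE sql STABLE;

-- The 'SET LOCAL enable_xxx' hack: discourage a plan node type for the
-- current transaction only (enable_seqscan reverts at COMMIT/ROLLBACK).
BEGIN;
SET LOCAL enable_seqscan = off;
SELECT * FROM geocode_carrier_route_by_geocode('xyz');
COMMIT;
```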

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Insertions slower than Updates?

2012-02-20 Thread Ofer Israeli
Hi all,

In performance testing we're doing we are currently running two scenarios:

 1.  Starting from an empty db, therefore all operations are INSERTs.
 2.  Starting from an existing db - thereby UPDATing all records.

I should also mention that the tables we are dealing with are heavily indexed.

I would expect that the first scenario would give much better results than the 
second one as:

 1.  INSERT should be cheaper than UPDATE due to only dealing with one record 
instead of two.
 2.  We also have SELECT queries before the operation and in the first 
configuration, the SELECTs will be dealing with much less data for most of the 
run.

To our surprise, we see that the second scenario gives better results, with an 
average per-event processing time of around 70% of that in the first scenario.

Anyone have any ideas on why the empty db is giving worse results??

Many Thanks,
Ofer


Re: [PERFORM] Insertions slower than Updates?

2012-02-20 Thread Kevin Grittner
Ofer Israeli  wrote:
 
> INSERT should be cheaper than UPDATE due to only dealing with one 
> record instead of two.
 
... unless the UPDATE is a HOT update, in which case the indexes
don't need to be touched.
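 
A quick way to check how many of your updates are actually HOT is the
stats view (table name here is hypothetical):
 
```sql
-- pg_stat_user_tables tracks HOT updates separately from total updates.
SELECT n_tup_upd,      -- total rows updated
       n_tup_hot_upd   -- of those, HOT updates (no index maintenance)
FROM pg_stat_user_tables
WHERE relname = 'my_table';
```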
 
> Anyone have any ideas on why the empty db is giving worse
> results??
 
Besides the HOT updates being fast, there is the issue of having
space already allocated and ready for the database to use, rather
than needing to make calls to the OS to create and extend files as
space is needed.
 
-Kevin



Re: [PERFORM] Insertions slower than Updates?

2012-02-20 Thread Ofer Israeli
Kevin Grittner wrote:
> Ofer Israeli  wrote:
>> Anyone have any ideas on why the empty db is giving worse results??
> 
> Besides the HOT updates being fast, there is the issue of having
> space already allocated and ready for the database to use, rather
> than needing to make calls to the OS to create and extend files as
> space is needed.   
> 

I thought about this direction as well, but some of the UPDATEs will need to 
ask the OS for more space anyhow; at least at the beginning of the run, 
additional pages will be needed.  Do you expect the OS-level allocations to be 
so expensive as to show an ~40% increase in average processing time?


Thanks,
Ofer


Re: [PERFORM] Insertions slower than Updates?

2012-02-20 Thread Kevin Grittner
Ofer Israeli  wrote:
> Kevin Grittner wrote:
>> Ofer Israeli  wrote:
>>> Anyone have any ideas on why the empty db is giving worse
>>> results??
>> 
>> Besides the HOT updates being fast, there is the issue of having
>> space already allocated and ready for the database to use, rather
>> than needing to make calls to the OS to create and extend files
>> as space is needed.   
> 
> I thought about this direction as well, but on UPDATES, some of
> them will need to ask the OS for more space anyhow at least at the
> beginning of the run, additional pages will be needed.  Do you
> expect that the OS level allocations are so expensive as to show
> an ~%40 increase of processing time in average?
 
Gut feel, 40% does seem high for just that; but HOT updates could
easily account for that, especially since you said that the tables
are "heavily indexed".  That is, as long as there are enough updates
which don't modify indexed columns.
 
-Kevin



Re: [PERFORM] Insertions slower than Updates?

2012-02-20 Thread Steve Horn
If the updates don't hit indexed columns (so no new index entries are
needed), the update can be done as a HOT update and is very fast.

Inserts always have to add an entry to every index, so each insert
modifies all of them.

If you're doing a lot of INSERTs in a batch operation, you may want to
consider dropping the indexes and recreating at the end.
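
A sketch of that pattern (index and table names are hypothetical):

```sql
-- Drop the index up front so the bulk load doesn't maintain it row by row.
DROP INDEX IF EXISTS my_table_col_idx;

-- ... run the batch of INSERTs (or COPY) here ...

-- Rebuild once at the end; a single bulk index build is much cheaper
-- than incremental maintenance across many inserts.
CREATE INDEX my_table_col_idx ON my_table USING btree (col);
```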

On Mon, Feb 20, 2012 at 2:29 PM, Kevin Grittner  wrote:

> Ofer Israeli  wrote:
> > Kevin Grittner wrote:
> >> Ofer Israeli  wrote:
> >>> Anyone have any ideas on why the empty db is giving worse
> >>> results??
> >>
> >> Besides the HOT updates being fast, there is the issue of having
> >> space already allocated and ready for the database to use, rather
> >> than needing to make calls to the OS to create and extend files
> >> as space is needed.
> >
> > I thought about this direction as well, but on UPDATES, some of
> > them will need to ask the OS for more space anyhow at least at the
> > beginning of the run, additional pages will be needed.  Do you
> > expect that the OS level allocations are so expensive as to show
> > an ~%40 increase of processing time in average?
>
> Gut feel, 40% does seem high for just that; but HOT updates could
> easily account for that, especially since you said that the tables
> are "heavily indexed".  That is, as long as there are enough updates
> which don't modify indexed columns.
>
> -Kevin
>



-- 
Steve Horn
http://www.stevehorn.cc
st...@stevehorn.cc
http://twitter.com/stevehorn
740-503-2300


Re: [PERFORM] Insertions slower than Updates?

2012-02-20 Thread Ofer Israeli
Kevin Grittner wrote:
> Ofer Israeli  wrote:
>> Kevin Grittner wrote:
>>> Ofer Israeli  wrote:
 Anyone have any ideas on why the empty db is giving worse results??
>>> 
>>> Besides the HOT updates being fast, there is the issue of having
>>> space already allocated and ready for the database to use, rather
>>> than needing to make calls to the OS to create and extend files
>>> as space is needed.
>> 
>> I thought about this direction as well, but on UPDATES, some of them
>> will need to ask the OS for more space anyhow at least at the
>> beginning of the run, additional pages will be needed.  Do you expect
>> that the OS level allocations are so expensive as to show an ~%40
>> increase of processing time in average?
> 
> Gut feel, 40% does seem high for just that; but HOT updates could
> easily account for that, especially since you said that the tables
> are "heavily indexed".  That is, as long as there are enough updates
> which don't modify indexed columns.   

Most, if not all, of our UPDATEs involve updating an indexed column, so HOT 
updates are actually not performed at all :(


Re: [PERFORM] Insertions slower than Updates?

2012-02-20 Thread Ofer Israeli
Ofer Israeli  wrote:

>Hi all,

> In performance testing we're doing we are currently running two scenarios:
>  1. Starting from an empty db, therefore all operations are INSERTs.
>  2. Starting from an existing db - thereby UPDATing all records.

> I should also mention that the tables we are dealing with are heavily indexed.

> I would expect that the first scenario would give much better results than 
> the second one as:
> 1. INSERT should be cheaper than UPDATE due to only dealing with one record 
> instead of two.
> 2.  We also have SELECT queries before the operation and in the first 
> configuration, the SELECTs will be dealing with much less data for most of 
> the run.

> To our surprise, we see that the second scenario gives better results with an 
> average processing time of an event at around %70 of the time run in the 
> first scenario.

> Anyone have any ideas on why the empty db is giving worse results??

A little googling led me to this thought; I'll be happy to hear your input 
on it.  If the database is initially empty, the planner will probably decide 
to query the tables by full table scan as opposed to index search.  Now 
supposedly, during our test run, the analyzer does not run frequently enough, 
and so at some point we are using the wrong access method for the SELECTs.

The version we are using is 8.3.7.

My questions are:
1. Does the above seem reasonable to you?
2. How often does the analyzer run?  To my understanding it always runs with 
autovacuum; is that right?  Is it triggered at other times as well?
3. Does the autoanalyzer work only partially on the db, like autovacuum going 
to sleep after a certain amount of work is done, or does it run until it 
finishes?  If the work is partial, maybe it never reaches our relevant tables.  
What dictates the order in which it processes tables?
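
For reference on question 2, autoanalyze fires per table once the number of
changed rows since the last analyze exceeds a threshold derived from two
settings (the defaults shown below are the stock ones; check your
postgresql.conf, and the table name is hypothetical):

```sql
-- Autoanalyze triggers for a table roughly when changed rows exceed:
--   autovacuum_analyze_threshold
--   + autovacuum_analyze_scale_factor * reltuples
SHOW autovacuum_analyze_threshold;     -- default 50
SHOW autovacuum_analyze_scale_factor;  -- default 0.1 (10% of the table)

-- To rule stale statistics out during a test run, refresh them by hand:
ANALYZE my_table;
```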


Many thanks,
Ofer


Re: [PERFORM] Why so slow?

2012-02-20 Thread Alessandro Gagliardi
Ah, that did make a big difference! It went from taking 10x as long to
taking only 1.5x as long (about what I would have expected, if not
better.) Thank you!

On Fri, Feb 17, 2012 at 9:29 PM, Ants Aasma  wrote:

> On Feb 17, 2012 8:35 PM, "Alessandro Gagliardi" 
> wrote:
> > Here is the EXPLAIN: http://explain.depesz.com/s/ley
> >
> > I'm using PostgreSQL 9.0.6 on i486-pc-linux-gnu, compiled by GCC
> gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 32-bit
> >
> > My random_page_cost is 2 and yet it still insists on using Seq Scan on
> blocks.
>
> As could be inferred from the row counts, it's slow because it's joining
> and then aggregating a quarter of the blocks table. The hash join with its
> sequential scan is probably the correct choice for that type of join, it's
> the join itself that should be optimized out. The optimizer doesn't figure
> out that the join can be turned into a semi join if the output is
> aggregated with distinct and is from only one of the tables (in this case,
> because the output is the join key, it can be from either table).
>
> To make the optimizer's job easier you can rewrite it as a semi-join
> explicitly:
> SELECT DISTINCT(user_id) FROM seen_its WHERE EXISTS (SELECT 1 FROM blocks
> WHERE blocks.user_id = seen_its.user_id) AND seen_its.created BETWEEN
> (now()::date - interval '8 days')::timestamp AND now()::date::timestamp
>
> --
> Ants Aasma
>


[PERFORM] Indexes and Primary Keys on Rapidly Growing Tables

2012-02-20 Thread Alessandro Gagliardi
New question regarding this seen_its table: It gets over 100 inserts per
second. Probably many more if you include every time unique_violation occurs.
This flood of data is constant. The commits take too long (upwards of 100
ms, ten times slower than it needs to be!) What I'm wondering is if it
would be better to insert all of these rows into a separate table with no
constraints (call it daily_seen_its) and then batch insert them into a
table with something like: INSERT INTO seen_its SELECT user_id, moment_id,
MIN(created) FROM daily_seen_its WHERE created BETWEEN 'yesterday' AND
'today' GROUP BY user_id, moment_id; the idea being that a table with no
constraints would be able to accept insertions much faster and then the
primary key could be enforced later. Even better would be if this could
happen hourly instead of daily. But first I just want to know if people
think that this might be a viable solution or if I'm barking up the wrong
tree.
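
Something like the following is one way to sketch it (daily_seen_its and the
flush query are hypothetical; the NOT EXISTS guard stands in for the
primary-key check at flush time):

```sql
-- Unconstrained staging twin of seen_its: no PK, so the firehose of
-- inserts never fails on duplicates and never maintains an index.
CREATE TABLE daily_seen_its (
  user_id   character(24) NOT NULL,
  moment_id character(24) NOT NULL,
  created   timestamp without time zone
);

-- Periodic flush: de-duplicate within the batch (GROUP BY) and skip rows
-- already present in the constrained table (NOT EXISTS).
INSERT INTO seen_its (user_id, moment_id, created)
SELECT d.user_id, d.moment_id, MIN(d.created)
FROM daily_seen_its d
WHERE d.created >= now()::date - interval '1 day'
  AND NOT EXISTS (SELECT 1 FROM seen_its s
                  WHERE s.user_id = d.user_id
                    AND s.moment_id = d.moment_id)
GROUP BY d.user_id, d.moment_id;
```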

Thanks!
-Alessandro

On Fri, Feb 17, 2012 at 10:34 AM, Alessandro Gagliardi
wrote:

> CREATE TABLE seen_its (
>   user_id character(24) NOT NULL,
>   moment_id character(24) NOT NULL,
>   created timestamp without time zone,
>   inserted timestamp without time zone DEFAULT now(),
>   CONSTRAINT seen_its_pkey PRIMARY KEY (user_id , moment_id )
> ) WITH ( OIDS=FALSE );
>
> CREATE INDEX seen_its_created_idx  ON seen_its  USING btree  (created );
>
> CREATE INDEX seen_its_user_id_idx  ON seen_its  USING btree  (user_id );
>
>


Re: [PERFORM] Indexes and Primary Keys on Rapidly Growing Tables

2012-02-20 Thread Josh Berkus
On 2/20/12 2:06 PM, Alessandro Gagliardi wrote:
> . But first I just want to know if people
> think that this might be a viable solution or if I'm barking up the wrong
> tree.

Batching is usually helpful for inserts, especially if there's a unique
key on a very large table involved.

I suggest also making the buffer table UNLOGGED, if you can afford to.
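
In SQL that would be something like the sketch below (table name
hypothetical; note UNLOGGED tables require PostgreSQL 9.1 or later):

```sql
-- An unlogged buffer table skips WAL entirely: inserts are cheaper, but
-- its contents are truncated after a crash and it is not replicated.
CREATE UNLOGGED TABLE seen_its_buffer (
  user_id   character(24) NOT NULL,
  moment_id character(24) NOT NULL,
  created   timestamp without time zone
);
```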

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
