Re: [PERFORM] Query slow as function
On Sat, Feb 18, 2012 at 8:50 AM, Steve Horn wrote:
> Hello all!
>
> I have a very simple query that I am trying to wrap into a function:
>
> SELECT gs.geo_shape_id AS gid,
>        gs.geocode
> FROM geo_shapes gs
> WHERE gs.geocode = 'xyz'
> AND geo_type = 1
> GROUP BY gs.geography, gs.geo_shape_id, gs.geocode;
>
> This query runs in about 10 milliseconds.
>
> Now my goal is to wrap the query in a function.
>
> I create a return type:
>
> CREATE TYPE geocode_carrier_route_by_geocode_result AS
>   (gid integer,
>    geocode character varying(9));
> ALTER TYPE geocode_carrier_route_by_geocode_result
>   OWNER TO root;
>
> ...and the function:
>
> CREATE OR REPLACE FUNCTION geocode_carrier_route_by_geocode(geo_code character(9))
>   RETURNS SETOF geocode_carrier_route_by_geocode_result AS
> $BODY$
> BEGIN
>
> RETURN QUERY EXECUTE
> 'SELECT gs.geo_shape_id AS gid,
>         gs.geocode
>  FROM geo_shapes gs
>  WHERE gs.geocode = $1
>  AND geo_type = 1
>  GROUP BY gs.geography, gs.geo_shape_id, gs.geocode'
> USING geo_code;
>
> END;
> $BODY$
>   LANGUAGE plpgsql STABLE;
> ALTER FUNCTION geocode_carrier_route_by_geocode(character)
>   OWNER TO root;
>
> Execute the function: select * from geocode_carrier_route_by_geocode('xyz');
>
> This query takes 500 milliseconds to run. My question of course is why?
>
> Related: If I create a function and assign LANGUAGE 'sql', my function runs
> in the expected 10 milliseconds. Is there some overhead to using the plpgsql
> language?
>
> Thanks for any help in clarifying my understanding!

Not overhead; it's how the plans are generated. plpgsql builds the query plan
and caches it. SQL-language functions replan the query on every execution.
Caching the plan can help or hurt depending on how sensitive the plan is to
the supplied parameters -- plpgsql can't (and shouldn't) use the actual
parameter value when generating the plan.
OTOH, especially for very long functions, the amount of time spent in plan
generation can really add up, so plpgsql can often be faster than vanilla SQL.
To force smarter plans out of plpgsql, you can attempt 'SET LOCAL enable_xxx'
planner directives. Pretty hacky, but it might help in your case. Better
statistics might help as well.

merlin

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
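For illustration, a minimal sketch of the poster's function rewritten as a
SQL-language function (reusing the type and table from the original post; the
`_sql` suffix is made up here). Because SQL-language functions are replanned
per call, the planner sees the actual geocode value and can pick the
index-friendly plan:

```sql
-- Sketch: same query as the plpgsql version, but LANGUAGE sql, which
-- replans on each execution using the actual parameter value.
CREATE OR REPLACE FUNCTION geocode_carrier_route_by_geocode_sql(geo_code character(9))
  RETURNS SETOF geocode_carrier_route_by_geocode_result AS
$BODY$
  SELECT gs.geo_shape_id AS gid,
         gs.geocode
  FROM geo_shapes gs
  WHERE gs.geocode = $1
  AND geo_type = 1
  GROUP BY gs.geography, gs.geo_shape_id, gs.geocode;
$BODY$
  LANGUAGE sql STABLE;
```

The trade-off Merlin describes applies both ways: this variant pays planning
cost on every call but never gets stuck with a cached plan that is wrong for
a particular parameter value.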
[PERFORM] Insertions slower than Updates?
Hi all,

In the performance testing we're doing, we are currently running two scenarios:
1. Starting from an empty db, so all operations are INSERTs.
2. Starting from an existing db, thereby UPDATing all records.

I should also mention that the tables we are dealing with are heavily indexed.

I would expect the first scenario to give much better results than the second
one, as:
1. An INSERT should be cheaper than an UPDATE, since it deals with only one
record instead of two.
2. We also run SELECT queries before the operation, and in the first
configuration the SELECTs deal with much less data for most of the run.

To our surprise, the second scenario gives better results, with the average
processing time of an event at around 70% of the time in the first scenario.

Anyone have any ideas on why the empty db is giving worse results??

Many Thanks,
Ofer
Re: [PERFORM] Insertions slower than Updates?
Ofer Israeli wrote:
> INSERT should be cheaper than UPDATE due to only dealing with one
> record instead of two.

... unless the UPDATE is a HOT update, in which case the indexes don't need
to be touched.

> Anyone have any ideas on why the empty db is giving worse
> results??

Besides the HOT updates being fast, there is the issue of having space
already allocated and ready for the database to use, rather than needing to
make calls to the OS to create and extend files as space is needed.

-Kevin
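As a sketch of the HOT-update case Kevin describes (the table and column
names here are invented for illustration): an UPDATE that touches no indexed
column, landing on a page with free space, rewrites only the heap tuple and
leaves every index untouched.

```sql
-- Hypothetical table; fillfactor < 100 leaves free space on each heap
-- page, which makes same-page (HOT) updates more likely.
CREATE TABLE events (
    id      integer PRIMARY KEY,
    status  text,     -- not indexed
    payload text      -- not indexed
) WITH (fillfactor = 80);

-- This UPDATE changes only non-indexed columns, so it can be a HOT
-- update: the new row version goes on the same page and no index
-- entries need to be added or modified.
UPDATE events SET status = 'processed' WHERE id = 42;
```

An otherwise identical UPDATE that set `id` (or any indexed column) would
disqualify HOT and force new entries in every index on the table.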
Re: [PERFORM] Insertions slower than Updates?
Kevin Grittner wrote:
> Ofer Israeli wrote:
>> Anyone have any ideas on why the empty db is giving worse results??
>
> Besides the HOT updates being fast, there is the issue of having
> space already allocated and ready for the database to use, rather
> than needing to make calls to the OS to create and extend files as
> space is needed.

I thought about this direction as well, but some of the UPDATEs will need to
ask the OS for more space anyhow -- at least at the beginning of the run,
additional pages will be needed. Do you expect the OS-level allocations to
be so expensive as to show an ~40% increase in average processing time?

Thanks,
Ofer
Re: [PERFORM] Insertions slower than Updates?
Ofer Israeli wrote:
> Kevin Grittner wrote:
>> Ofer Israeli wrote:
>>> Anyone have any ideas on why the empty db is giving worse
>>> results??
>>
>> Besides the HOT updates being fast, there is the issue of having
>> space already allocated and ready for the database to use, rather
>> than needing to make calls to the OS to create and extend files
>> as space is needed.
>
> I thought about this direction as well, but some of the UPDATEs
> will need to ask the OS for more space anyhow -- at least at the
> beginning of the run, additional pages will be needed. Do you
> expect the OS-level allocations to be so expensive as to show an
> ~40% increase in average processing time?

Gut feel, 40% does seem high for just that; but HOT updates could easily
account for it, especially since you said that the tables are "heavily
indexed". That is, as long as there are enough updates which don't modify
indexed columns.

-Kevin
Re: [PERFORM] Insertions slower than Updates?
If the updates don't hit indexed columns (so the indexes don't need to be
modified), then the update can be very fast. Inserts always affect every
index, causing them to constantly need modifying. If you're doing a lot of
INSERTs in a batch operation, you may want to consider dropping the indexes
and recreating them at the end.

On Mon, Feb 20, 2012 at 2:29 PM, Kevin Grittner wrote:
> Gut feel, 40% does seem high for just that; but HOT updates could
> easily account for that, especially since you said that the tables
> are "heavily indexed". That is, as long as there are enough updates
> which don't modify indexed columns.
>
> -Kevin

--
Steve Horn
http://www.stevehorn.cc
st...@stevehorn.cc
http://twitter.com/stevehorn
740-503-2300
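A sketch of the drop-and-recreate pattern for a bulk load, borrowing the
index definitions from the seen_its thread elsewhere in this digest (the COPY
file path is a placeholder):

```sql
-- Hypothetical bulk-load pattern: drop secondary indexes, load, rebuild.
BEGIN;
DROP INDEX IF EXISTS seen_its_created_idx;
DROP INDEX IF EXISTS seen_its_user_id_idx;

COPY seen_its (user_id, moment_id, created) FROM '/tmp/seen_its.csv' CSV;

-- Rebuilding once at the end is typically far cheaper than maintaining
-- the indexes row by row during the load.
CREATE INDEX seen_its_created_idx ON seen_its USING btree (created);
CREATE INDEX seen_its_user_id_idx ON seen_its USING btree (user_id);
COMMIT;
```

Note that this only works if nothing else needs those indexes (or the
table's uniqueness guarantees) while the load is running.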
Re: [PERFORM] Insertions slower than Updates?
Kevin Grittner wrote:
> Ofer Israeli wrote:
>> Kevin Grittner wrote:
>>> Ofer Israeli wrote:
>>>> Anyone have any ideas on why the empty db is giving worse
>>>> results??
>>>
>>> Besides the HOT updates being fast, there is the issue of having
>>> space already allocated and ready for the database to use, rather
>>> than needing to make calls to the OS to create and extend files
>>> as space is needed.
>>
>> I thought about this direction as well, but some of the UPDATEs
>> will need to ask the OS for more space anyhow -- at least at the
>> beginning of the run, additional pages will be needed. Do you
>> expect the OS-level allocations to be so expensive as to show an
>> ~40% increase in average processing time?
>
> Gut feel, 40% does seem high for just that; but HOT updates could
> easily account for that, especially since you said that the tables
> are "heavily indexed". That is, as long as there are enough updates
> which don't modify indexed columns.

Most, if not all, of our UPDATEs involve updating an indexed column, so HOT
updates are actually not performed at all :(
Re: [PERFORM] Insertions slower than Updates?
Ofer Israeli wrote:
> Hi all,
> In the performance testing we're doing, we are currently running two
> scenarios:
> 1. Starting from an empty db, so all operations are INSERTs.
> 2. Starting from an existing db, thereby UPDATing all records.
> I should also mention that the tables we are dealing with are heavily
> indexed.
>
> To our surprise, the second scenario gives better results, with the
> average processing time of an event at around 70% of the time in the
> first scenario.
> Anyone have any ideas on why the empty db is giving worse results??

A little googling led me to this thought; I'll be happy to hear your input
on it. If the database is initially empty, the planner will probably decide
to query the tables by full table scan as opposed to index search. Now
supposedly, during our test run, the analyzer does not run frequently
enough, and so at some point we are using the wrong method for SELECTs. The
version we are using is 8.3.7.

My questions are:
1. Does the above seem reasonable to you?
2. How often does the analyzer run? To my understanding it will always run
with autovacuum, is that right? Is it triggered at other times as well?
3. Does the autoanalyzer work only partially on the db, like autovacuum
going to sleep after a certain amount of work is done, or does it work until
finished? If it is partial work, maybe it does not reach our relevant
tables. What dictates the order in which it will work?

Many thanks,
Ofer
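One way to investigate the stale-statistics theory is to check when each
table was last analyzed, and to run ANALYZE by hand after the initial burst
of inserts. A sketch (the `pg_stat_user_tables` columns shown exist in the
8.3 era; `your_table` is a placeholder for the actual table name):

```sql
-- When did ANALYZE last run on each table, manually or via autovacuum?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY relname;

-- If statistics are stale during a fast-growing load, a manual ANALYZE
-- refreshes the row estimates so the planner can switch from seq scans
-- to index scans without waiting for autoanalyze to catch up.
ANALYZE your_table;
```

If the empty-db run starts with plans chosen for a tiny table and the
statistics lag behind the growth, forcing an ANALYZE early in the run should
visibly change the SELECT plans.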
Re: [PERFORM] Why so slow?
Ah, that did make a big difference! It went from taking 10x as long to taking
only 1.5x as long (about what I would have expected, if not better). Thank
you!

On Fri, Feb 17, 2012 at 9:29 PM, Ants Aasma wrote:
> On Feb 17, 2012 8:35 PM, "Alessandro Gagliardi" wrote:
>> Here is the EXPLAIN: http://explain.depesz.com/s/ley
>>
>> I'm using PostgreSQL 9.0.6 on i486-pc-linux-gnu, compiled by GCC
>> gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 32-bit
>>
>> My random_page_cost is 2 and yet it still insists on using Seq Scan on
>> blocks.
>
> As could be inferred from the row counts, it's slow because it's joining
> and then aggregating a quarter of the blocks table. The hash join with
> its sequential scan is probably the correct choice for that type of join;
> it's the join itself that should be optimized out. The optimizer doesn't
> figure out that the join can be turned into a semi-join if the output is
> aggregated with DISTINCT and is from only one of the tables (in this
> case, because the output is the join key, it can be from either table).
>
> To make the optimizer's job easier you can rewrite it as a semi-join
> explicitly:
>
> SELECT DISTINCT(user_id) FROM seen_its
> WHERE EXISTS (SELECT 1 FROM blocks
>               WHERE blocks.user_id = seen_its.user_id)
> AND seen_its.created BETWEEN (now()::date - interval '8 days')::timestamp
>                          AND now()::date::timestamp
>
> --
> Ants Aasma
[PERFORM] Indexes and Primary Keys on Rapidly Growing Tables
New question regarding this seen_its table: it gets over 100 inserts per
second -- probably many more if you include every time a unique_violation
occurs. This flood of data is constant. The commits take too long (upwards
of 100 ms, ten times slower than it needs to be!).

What I'm wondering is if it would be better to insert all of these rows into
a separate table with no constraints (call it daily_seen_its) and then batch
insert them into the real table with something like:

INSERT INTO seen_its
SELECT user_id, moment_id, MIN(created)
FROM daily_seen_its
WHERE created BETWEEN 'yesterday' AND 'today'
GROUP BY user_id, moment_id;

the idea being that a table with no constraints would be able to accept
insertions much faster, and then the primary key could be enforced later.
Even better would be if this could happen hourly instead of daily. But first
I just want to know if people think that this might be a viable solution or
if I'm barking up the wrong tree.

Thanks!
-Alessandro

On Fri, Feb 17, 2012 at 10:34 AM, Alessandro Gagliardi wrote:
> CREATE TABLE seen_its (
>   user_id character(24) NOT NULL,
>   moment_id character(24) NOT NULL,
>   created timestamp without time zone,
>   inserted timestamp without time zone DEFAULT now(),
>   CONSTRAINT seen_its_pkey PRIMARY KEY (user_id, moment_id)
> ) WITH ( OIDS=FALSE );
>
> CREATE INDEX seen_its_created_idx ON seen_its USING btree (created);
>
> CREATE INDEX seen_its_user_id_idx ON seen_its USING btree (user_id);
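A sketch of the staging-table approach under the schema quoted above. The
anti-join against seen_its is an addition of mine -- without it, any
user_id/moment_id pair already present in seen_its would abort the whole
batch with a unique violation:

```sql
-- Unconstrained staging table: inserts pay no index or PK maintenance.
CREATE TABLE daily_seen_its (
    user_id   character(24) NOT NULL,
    moment_id character(24) NOT NULL,
    created   timestamp without time zone
);

-- Periodic flush: deduplicate within the batch, skip pairs that have
-- already made it into seen_its, then clear the staging table.
BEGIN;
INSERT INTO seen_its (user_id, moment_id, created)
SELECT d.user_id, d.moment_id, MIN(d.created)
FROM daily_seen_its d
WHERE NOT EXISTS (
    SELECT 1 FROM seen_its s
    WHERE s.user_id = d.user_id AND s.moment_id = d.moment_id
)
GROUP BY d.user_id, d.moment_id;

TRUNCATE daily_seen_its;
COMMIT;
```

One caveat with this sketch: rows inserted into the staging table while the
flush transaction is running would be lost to the TRUNCATE, so in practice
you would rotate between two staging tables or delete only the flushed rows.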
Re: [PERFORM] Indexes and Primary Keys on Rapidly Growing Tables
On 2/20/12 2:06 PM, Alessandro Gagliardi wrote:
> But first I just want to know if people think that this might be a viable
> solution or if I'm barking up the wrong tree.

Batching is usually helpful for inserts, especially if there's a unique key
on a very large table involved. I suggest also making the buffer table
UNLOGGED, if you can afford to.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
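A sketch of Josh's suggestion, reusing the daily_seen_its buffer idea from
the question. Note that UNLOGGED tables require PostgreSQL 9.1 or later
(the thread mentions 9.0.6 elsewhere); they skip WAL writes, so inserts are
cheaper, at the cost of the table being emptied after a crash:

```sql
-- Unlogged buffer table: no WAL, so the constant insert flood is
-- cheaper, but the contents do not survive a crash -- acceptable for a
-- staging buffer whose rows are periodically flushed into the durable,
-- PK-enforced seen_its table.
CREATE UNLOGGED TABLE daily_seen_its (
    user_id   character(24) NOT NULL,
    moment_id character(24) NOT NULL,
    created   timestamp without time zone
);
```

Losing at most one flush interval's worth of "seen it" events on a crash is
the kind of trade-off "if you can afford to" refers to.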