[GENERAL] Alias for function return buffer in pl/pgsql?
Hello all, long time no chit-chat on the PG mailing list. We're upgrading from 8.0.3 to 8.3 and found that some stored procedures utilizing int_agg that we had left over from 7.3 had terrible performance. No problem, using ANY() we're able to regain that performance, more or less, and at the same time greatly simplify our stored procedures. But things can never be fast enough, can they? So I have a question or two. Here's my function for reference: CREATE OR REPLACE FUNCTION bg_nodes2descendants(INT[], INT) RETURNS SETOF connection_generation AS ' DECLARE _row connection_generation%ROWTYPE; _children INT[]; BEGIN -- this is faster than constructing in the loop below --_children = array(SELECT connectee_node_id FROM connections WHERE connection_type_id = 1 AND connector_node_id = ANY($1)); FOR _row IN SELECT connection_id, connection_type_id, connector_node_id, connector_node_type_id, connectee_node_id, connectee_node_type_id, current, timestamp, $2 + 1 FROM connections WHERE connection_type_id = 1 AND connector_node_id = ANY($1) LOOP _children := _children || _row.connectee_node_id; RETURN NEXT _row; END LOOP; IF FOUND THEN RETURN QUERY SELECT * FROM bg_nodes2descendants(_children, $2+1); END IF; RETURN; END ' LANGUAGE 'plpgsql'; So, my concern is alluded to in the comment above. When I use this function in places where it returns large results, building the _children array directly (in the commented out line) is about 25% faster. But I'd like to avoid building the children array altogether and would instead like to generate that array from the already collected output rows. For example, right before the recursive call, I'd like to select a column of the buffered output rows, cast it to an integer[], and pass it into the recursive call. Is there an internal value I can access for this such as: _children := array(SELECT connectee_node_id FROM $output); Bonus question - if I rewrite the first FOR loop as: RETURN QUERY SELECT connection_id, connection_type_id, connector_node_id, connector_node_type_id, connectee_node_id, connectee_node_type_id, current, timestamp, $2 + 1 FROM connections WHERE connection_type_id = 1 AND connector_node_id = ANY($1); I get "ERROR: structure of query does not match function result type", even though the type signatures of the returned columns match the "connection_generation" rowtype. I am pretty sure this could be resolved by casting the resulting columns to that row type, but I am lost as to how the syntax to do such a thing would look. Thanks in advance for the help, and keep up the great work. PG8.3 is an amazing piece of software and it blows me away how much more advanced it gets with every release. Bart Grantham VP of R&D Logicworks Inc. - Complex and Managed Hosting
Re: [GENERAL] Alias for function return buffer in pl/pgsql?
Interesting. After some toying around I've figured out that this produces "ERROR: structure of query does not match function result type": RETURN QUERY SELECT connection_id, connection_type_id, connector_node_id, connector_node_type_id, connectee_node_id, connectee_node_type_id, current, timestamp::timestamp, $2+1 FROM connections WHERE connection_type_id = 1 AND connector_node_id = ANY($1); Whereas this doesn't: RETURN QUERY SELECT connection_id, connection_type_id, connector_node_id, connector_node_type_id, connectee_node_id, connectee_node_type_id, current, timestamp, $2+1 FROM connections WHERE connection_type_id = 1 AND connector_node_id = ANY($1); Is there something about timestamps that prevents plpgsql from seeing where casting is unnecessary? For the record, that field in the function's return type is definitely a timestamp, same for that column in the table. And I know it's bad form for me to have named the column "timestamp" when that's the name of the type. Oops. :) Thanks for the help, I've shaved about 30% off the query time now that I can use RETURN QUERY. BG -Original Message- From: Pavel Stehule [mailto:[EMAIL PROTECTED] Sent: Wednesday, August 13, 2008 7:51 AM To: Bart Grantham Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Alias for function return buffer in pl/pgsql? > > > Bonus question - if I rewrite the first FOR loop as: > > > > RETURN QUERY SELECT connection_id, connection_type_id, connector_node_id, > connector_node_type_id, connectee_node_id, > > connectee_node_type_id, current, timestamp, $2 > + 1 FROM connections > > WHERE connection_type_id = 1 AND connector_node_id = > ANY($1); you have to cast. This code works: postgres=# create type xxtp as (a integer, b varchar); CREATE TYPE Time: 6,458 ms postgres=# create table xx(a integer, b varchar); CREATE TABLE Time: 54,053 ms postgres=# insert into xx select 1, 'hhh'; INSERT 0 1 Time: 5,993 ms postgres=# insert into xx select 1, 'hhh'; INSERT 0 1 Time: 3,393 ms postgres=# insert into xx select 1, 'hhh'; INSERT 0 1 >postgres=# create or replace function x() returns setof xxtp as $$begin return >query select * from xx; return; end$$language plpgsql; CREATE FUNCTION Time: 4,392 ms postgres=# select * from x(); a | b ---+- 1 | hhh 1 | hhh 1 | hhh (3 rows) postgres=# create or replace function x() returns setof xxtp as $$begin return query select 1,'kkk'; return; end$$language plpgsql; CREATE FUNCTION Time: 4,577 ms postgres=# select * from x(); ERROR: structure of query does not match function result type CONTEXT: PL/pgSQL function "x" line 1 at RETURN QUERY postgres=# create or replace function x() returns setof xxtp as $$begin return query select 1,'kkk'::varchar; return; end$$language plpgsql; CREATE FUNCTION Time: 3,395 ms postgres=# select * from x(); a | b ---+- 1 | kkk (1 row) regards Pavel Stehule > > > I get "ERROR: structure of query does not match function result type", even > though the type signatures of the returned columns match the > "connection_generation" rowtype. I am pretty sure this could be resolved by > casting the resulting columns to that row type, but I am lost as to how the > syntax to do such a thing would look. > > > > Thanks in advance for the help, and keep up the great work. PG8.3 is an > amazing piece of software and it blows me away how much more advanced it > gets with every release. > > > > Bart Grantham > > VP of R&D > > Logicworks Inc. – Complex and Managed Hosting -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Alias for function return buffer in pl/pgsql?
Interesting. After some toying around I've figured out that this produces "ERROR: structure of query does not match function result type": RETURN QUERY SELECT connection_id, connection_type_id, connector_node_id, connector_node_type_id, connectee_node_id, connectee_node_type_id, current, timestamp::timestamp, $2+1 FROM connections WHERE connection_type_id = 1 AND connector_node_id = ANY($1); Whereas this doesn't: RETURN QUERY SELECT connection_id, connection_type_id, connector_node_id, connector_node_type_id, connectee_node_id, connectee_node_type_id, current, timestamp, $2+1 FROM connections WHERE connection_type_id = 1 AND connector_node_id = ANY($1); Is there something about timestamps that prevents plpgsql from seeing where casting is unnecessary? For the record, that field in the function's return type is definitely a timestamp, same for that column in the table. And I know it's bad form for me to have named the column "timestamp" when that's the name of the type. Oops. :) BG -Original Message- From: Pavel Stehule [mailto:[EMAIL PROTECTED] Sent: Wednesday, August 13, 2008 7:51 AM To: Bart Grantham Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Alias for function return buffer in pl/pgsql? > > > Bonus question - if I rewrite the first FOR loop as: > > > > RETURN QUERY SELECT connection_id, connection_type_id, connector_node_id, > connector_node_type_id, connectee_node_id, > > connectee_node_type_id, current, timestamp, $2 > + 1 FROM connections > > WHERE connection_type_id = 1 AND connector_node_id = > ANY($1); you have to cast. This code works: postgres=# create type xxtp as (a integer, b varchar); CREATE TYPE Time: 6,458 ms postgres=# create table xx(a integer, b varchar); CREATE TABLE Time: 54,053 ms postgres=# insert into xx select 1, 'hhh'; INSERT 0 1 Time: 5,993 ms postgres=# insert into xx select 1, 'hhh'; INSERT 0 1 Time: 3,393 ms postgres=# insert into xx select 1, 'hhh'; INSERT 0 1 >postgres=# create or replace function x() returns setof xxtp as $$begin return >query select * from xx; return; end$$language plpgsql; CREATE FUNCTION Time: 4,392 ms postgres=# select * from x(); a | b ---+- 1 | hhh 1 | hhh 1 | hhh (3 rows) postgres=# create or replace function x() returns setof xxtp as $$begin return query select 1,'kkk'; return; end$$language plpgsql; CREATE FUNCTION Time: 4,577 ms postgres=# select * from x(); ERROR: structure of query does not match function result type CONTEXT: PL/pgSQL function "x" line 1 at RETURN QUERY postgres=# create or replace function x() returns setof xxtp as $$begin return query select 1,'kkk'::varchar; return; end$$language plpgsql; CREATE FUNCTION Time: 3,395 ms postgres=# select * from x(); a | b ---+- 1 | kkk (1 row) regards Pavel Stehule > > > I get "ERROR: structure of query does not match function result type", even > though the type signatures of the returned columns match the > "connection_generation" rowtype. I am pretty sure this could be resolved by > casting the resulting columns to that row type, but I am lost as to how the > syntax to do such a thing would look. > > > > Thanks in advance for the help, and keep up the great work. PG8.3 is an > amazing piece of software and it blows me away how much more advanced it > gets with every release. > > > > Bart Grantham > > VP of R&D > > Logicworks Inc. – Complex and Managed Hosting -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Opteron vs. Xeon performance differences
Forgive me if this has been beaten into the ground, but my team and I couldn't find much conclusive study or posts on this issue. To make a long story short: we're experiencing Xeons as 50% slower than Opterons, even when the Xeon has twice as much cache and a slight clock speed advantage. The full story: we have an older production server with 2G of RAM, 2.4GHz Opterons w/ 1M of cache. The database is not large, only around 7M or 8M rows altogether, 2.5G on disk. Most queries are reads, probably on a 10:1 proportion with writes. In the process of upgrading this server to a pair of DRBD-mirrored (more on this below) servers we discovered that the new servers were actually slower than the older one. The newer servers have 4G of RAM, 3.0GHz Xeons with 2M of cache. And not just a little slower, but queries (simple, complex, and disgusting recursive stored procedures) routinely run in 50-100% more time than they did on the older server. After many troubleshooting techniques (downgrading the kernel to that of the older machine, verifying version parity, copying the binary from the older server, building a 32bit binary on the new servers, running the entire database out of a ramdisk, and of course much tweaking of postgresql.conf) and seeing virtually no benefit from any of these tests I finally took the final leap: just pull the disks and throw them in a newer Opteron chassis (2.8GHz, 1M cache). And whaddya know? It's got a 20% speed edge on the older Opteron, and blows away the performance of the newer Xeons. One of my guys did some testing and it appears that LWLockAquire and LWLockRelease are the culprits, but we're not entirely confident of our conclusion. Any thoughts on why this might be so different between the two architectures? We're a hosting provider so we've got some spare equipment to work with and I'm going to request that we keep these two boxes up for a week or so. Are there any other tests that you guys can suggest that would help get down to the bottom of this? I figure that not everyone has access to as much gear as we do so it might be a good opportunity to get some A/B testing on a production database on identical OS/server installs on different hardware. I'm content to just say "Well, we use Opterons then!", but I imagine that if we could help bring equal performance to Xeon users that it would be worth the effort of volunteering. To be clear, I have two machines sitting on the network ready for tweaking, one is a Xeon, the other is an Opteron, neither is in production and both can be fully mangled in the interest of figuring this out. Speaking of being a hosting provider, I may as well take a moment to point out that we are working with DRBD for mirroring and have found it works beautifully with PG (MySQL as well). Also, while our "Managed Database Service" product is geared around MySQL, Oracle, and MSSQL, we're pretty familiar with PG and would be happy to talk to anyone about hosting needs they may have. Thanks for listening, and again please let me know if there is further testing we can do to help get to the bottom of this Opteron/Xeon performance discrepancy. Bart Grantham VP of R&D Logicworks, Inc. www.logicworks.net
[GENERAL] Performance difference between ANY and IN, also array syntax
Hello, all. I work for a Manhattan ISP and have developed an internal systems management/housekeeping app on php/postgres 7.4. I am trying to speed up some bits with stored procedures and have had great success, except I've now run into a bit of trouble. It comes down to this: # SELECT * FROM connections WHERE connectee_node_id = ANY ( ARRAY[28543,28542] ); -snip- Time: 564.899 ms ...versus... # SELECT * FROM connections WHERE connectee_node_id IN ( 28543,28542 ); -snip- Time: 1.410 ms Why the difference? I tried explain: # explain SELECT * FROM connections WHERE connectee_node_id = ANY ( ARRAY[28543,28542] ); QUERY PLAN - Seq Scan on connections (cost=0.00..17963.44 rows=207264 width=33) Filter: (connectee_node_id = ANY ('{28543,28542}'::integer[])) ..versus... # explain SELECT * FROM connections WHERE connectee_node_id IN ( 28543,28542 ); QUERY PLAN --- Index Scan using c_connectee_node_id, c_connectee_node_id on connections (cost=0.00..67.28 rows=72 width=33) Index Cond: ((connectee_node_id = 28543) OR (connectee_node_id = 28542)) Why filter for one and index for the other? Is it because an array is mutable, so it's impossible to operate on it the same way? I need to pass in an array to my stored procedure, but having to use ANY in my select is killing the performance. I'd like to know what I can do to make ANY perform like IN, or alternatively, could someone fill me in on the syntax on how to cast an INT[] to a "list of scalar expressions", which the manual states is the right-hand side to IN. Also, I have a few bits of confusion about array syntax that perhaps someone could illuminate for me. Forgive me, I'm not as fluent in SQL as other languages. First, this doesn't work for me: RAISE NOTICE ''DEBUG: %'', _myarray[1]; It seems pretty reasonable to me, but it gives me a 'syntax error at or near "["'. Next, I can't seem to declare an array of a user-defined row: _innerrow my_type%ROWTYPE[]; Is there a syntactical snag I'm tripping over? Or can you not have arrays of other than built-in types? I think it's super-cool that you can extract arbitrary rectangles of data from a multi-dimentional array, but can you take a vertical slice from an array of user-defined type by column? For example: _mytype[1:5].some_column And finally, how do you specifcy an entire array index when doing columns from multi-dim arrays? Is there something like _my_multidim_of_ints[*][4] or maybe _my_multidim_of_ints[:][4] ? Thanks for the help, and thanks for the great database. Bart G Logicworks NOC ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[GENERAL] Array manipulation/syntax question
Hello again. I had a problem a few weeks ago with using IN ( some_array ) having really rough performance. Special thanks to Ron Mayer for the suggestion of using int_array_enum(some_array) to join against. I had to upgrade to PG8 but that technique works really well. Now I have a question about array manipulation. Specifically, how can I build an array from the result of a query without resorting to array_append()? I have a recursive function that passes itself an array each time and I want to build this array by taking a column out of a query (which is a join from the input array and a table... see? recursive!). This becomes a performance issue since the query tha wraps array_append could be returning many thousands of rows. Hmm, my question is sounding more complicated it should. Let me put it this way... how do I do this: -- my_array is an INT[] _my_array := select some_column from some_table; ... in plpgsql? Thanks in advance. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
[GENERAL] Odd performance issue
Summary: depending on the value, the planner will sometimes choose a seq scan, sometimes an index scan. The former produces terrible performace, the latter great performance. The long story: we had a disk failure (NOT the disk the db was on) and the machine's system disk had to be rebuilt from the raid array and re-GRUB'ed. Now that the the system is back up we are seeing terrible performance (or more accurately, wildly varying performance). I've tried re-importing the data from the live system (this is new hardware under testing for the system) and re-initing the db cluster. A specific example is probably best. This 'connections' table has about 922K rows. The difference here is node_id's 28542 vs. 28560. Using 28542 causes an index scan, 28560 causes a seq scan: The details: logicops2=> explain SELECT * from connections AS c LEFT JOIN connection_types AS ct ON ct.connection_type_id = c.connection_type_id WHERE ( connector_node_id = 28542 OR connectee_node_id = 28542 ) AND ( c.connection_type_id < 1000 ) LIMIT 300; QUERY PLAN --- Limit (cost=1.29..563.05 rows=203 width=116) -> Hash Left Join (cost=1.29..563.05 rows=203 width=116) Hash Cond: ("outer".connection_type_id = "inner".connection_type_id) -> Index Scan using c_connector_node_id, c_connectee_node_id on connections c (cost=0.00..558.72 rows=203 width=33) Index Cond: ((connector_node_id = 28542) OR (connectee_node_id = 28542)) Filter: (connection_type_id < 1000) -> Hash (cost=1.23..1.23 rows=23 width=83) -> Seq Scan on connection_types ct (cost=0.00..1.23 rows=23 width=83) (8 rows) Time: 0.935 ms logicops2=> SELECT * from connections AS c LEFT JOIN connection_types AS ct ON ct.connection_type_id = c.connection_type_id WHERE ( connector_node_id = 28542 OR connectee_node_id = 28542 ) AND ( c.connection_type_id < 1000 ) LIMIT 300; ...results... (12 rows) Time: 1.887 ms -vs- logicops2=> explain SELECT * from connections AS c LEFT JOIN connection_types AS ct ON ct.connection_type_id = c.connection_type_id WHERE ( connector_node_id = 28560 OR connectee_node_id = 28560 ) AND ( c.connection_type_id < 1000 ) LIMIT 300; QUERY PLAN -- Limit (cost=1.29..686.09 rows=300 width=116) -> Hash Left Join (cost=1.29..24939.39 rows=10925 width=116) Hash Cond: ("outer".connection_type_id = "inner".connection_type_id) -> Seq Scan on connections c (cost=0.00..24774.23 rows=10925 width=33) Filter: (((connector_node_id = 28560) OR (connectee_node_id = 28560)) AND (connection_type_id < 1000)) -> Hash (cost=1.23..1.23 rows=23 width=83) -> Seq Scan on connection_types ct (cost=0.00..1.23 rows=23 width=83) (7 rows) Time: 0.704 ms logicops2=> SELECT * from connections AS c LEFT JOIN connection_types AS ct ON ct.connection_type_id = c.connection_type_id WHERE ( connector_node_id = 28560 OR connectee_node_id = 28560 ) AND ( c.connection_type_id < 1000 ) LIMIT 300; ...results... (7 rows) Time: 578.597 ms ... it may be relevant that one node_id has 15 times as many connections: logicops2=> select count(*) from connections where connector_node_id = 28542 OR connectee_node_id = 28542; count --- 856 (1 row) Time: 1.424 ms logicops2=> select count(*) from connections where connector_node_id = 28560 OR connectee_node_id = 28560; count --- 13500 (1 row) Time: 559.696 ms ... but that shouldn't make a difference to the planner, should it? Yes, I've vacuum analyzed. Also, I was wondering if someone could correct me on a bit of array syntax. I'd like to have a query pass back an array of ints to a function call. Something like this: logicops2=> select * from nodes2ancestors(array[(select node_id from nodes where node_type_id = 3)]::int[], 0); ERROR: more than one row returned by a subquery used as an expression Thanks for any help/pointers you guys can provide. I really appreciate it as I'm down to the wire on a project and this performance thing has really blindsided us. Bart ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Postgresql Mention-- Logicworks' LogicOps Management
That's me and my project listed there. Glad to give PG the good press it deserves, it's the least I could do. It's been a great db for us, and we're looking forward to 8.1. And thanks for the help that I received on the mailing list a few months back that directly impacted this project. It was a real life-saver. BG Reid Thompson wrote: http://linux.sys-con.com/read/139427_2.htm reid ---(end of broadcast)--- TIP 6: explain analyze is your friend ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org