[GENERAL] Alias for function return buffer in pl/pgsql?

2008-08-13 Thread Bart Grantham
Hello all, long time no chit-chat on the PG mailing list.  We're upgrading from 
8.0.3 to 8.3 and found that some stored procedures utilizing int_agg that we 
had left over from 7.3 had terrible performance.  No problem, using ANY() we're 
able to regain that performance, more or less, and at the same time greatly 
simplify our stored procedures.  But things can never be fast enough, can they? 
 So I have a question or two.  Here's my function for reference:

CREATE OR REPLACE FUNCTION bg_nodes2descendants(INT[], INT) RETURNS SETOF 
connection_generation AS
'

DECLARE
_row connection_generation%ROWTYPE;
_children INT[];

BEGIN

-- this is faster than constructing in the loop below
--_children = array(SELECT connectee_node_id FROM connections WHERE 
connection_type_id = 1 AND connector_node_id = ANY($1));

FOR _row IN
SELECT connection_id, connection_type_id, connector_node_id, 
connector_node_type_id, connectee_node_id,
connectee_node_type_id, current, timestamp, $2 + 1
FROM connections WHERE connection_type_id = 1 AND connector_node_id = 
ANY($1)
LOOP
_children := _children || _row.connectee_node_id;
RETURN NEXT _row;
END LOOP;

IF FOUND THEN
RETURN QUERY SELECT * FROM bg_nodes2descendants(_children, $2+1);
END IF;

RETURN;
END

' LANGUAGE 'plpgsql';

 So, my concern is alluded to in the comment above.  When I use this 
function in places where it returns large results, building the _children array 
directly (in the commented out line) is about 25% faster.  But I'd like to 
avoid building the children array altogether and would instead like to generate 
that array from the already collected output rows.  For example, right before 
the recursive call, I'd like to select a column of the buffered output rows, 
cast it to an integer[], and pass it into the recursive call.  Is there an 
internal value I can access for this such as:

_children := array(SELECT connectee_node_id FROM $output);

Bonus question -  if I rewrite the first FOR loop as:

RETURN QUERY SELECT connection_id, connection_type_id, connector_node_id, 
connector_node_type_id, connectee_node_id,
 connectee_node_type_id, current, timestamp, $2 + 1 
FROM connections
 WHERE connection_type_id = 1 AND connector_node_id = ANY($1);

I get "ERROR:  structure of query does not match function result type", even 
though the type signatures of the returned columns match the 
"connection_generation" rowtype.  I am pretty sure this could be resolved by 
casting the resulting columns to that row type, but I am lost as to how the 
syntax to do such a thing would look.

Thanks in advance for the help, and keep up the great work.  PG8.3 is an 
amazing piece of software and it blows me away how much more advanced it gets 
with every release.

Bart Grantham
VP of R&D
Logicworks Inc. - Complex and Managed Hosting


Re: [GENERAL] Alias for function return buffer in pl/pgsql?

2008-08-13 Thread Bart Grantham
Interesting.  After some toying around I've figured out that this produces 
"ERROR:  structure of query does not match function result type":

RETURN QUERY SELECT connection_id, connection_type_id, connector_node_id,
connector_node_type_id, connectee_node_id, 
connectee_node_type_id,
current, timestamp::timestamp, $2+1
FROM connections WHERE connection_type_id = 1 AND 
connector_node_id = ANY($1);

Whereas this doesn't:

RETURN QUERY SELECT connection_id, connection_type_id, connector_node_id,
connector_node_type_id, connectee_node_id, 
connectee_node_type_id,
current, timestamp, $2+1
FROM connections WHERE connection_type_id = 1 AND 
connector_node_id = ANY($1);

Is there something about timestamps that prevents plpgsql from seeing where 
casting is unnecessary?  For the record, that field in the function's return 
type is definitely a timestamp, same for that column in the table.  And I know 
it's bad form for me to have named the column "timestamp" when that's the name 
of the type.  Oops. :)

Thanks for the help, I've shaved about 30% off the query time now that I can 
use RETURN QUERY.

BG

-Original Message-
From: Pavel Stehule [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 13, 2008 7:51 AM
To: Bart Grantham
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Alias for function return buffer in pl/pgsql?

>
>
> Bonus question -  if I rewrite the first FOR loop as:
>
>
>
> RETURN QUERY SELECT connection_id, connection_type_id, connector_node_id,
> connector_node_type_id, connectee_node_id,
>
>  connectee_node_type_id, current, timestamp, $2
> + 1 FROM connections
>
>  WHERE connection_type_id = 1 AND connector_node_id =
> ANY($1);

you have to cast. This code works:


postgres=# create type xxtp as (a integer, b varchar);
CREATE TYPE
Time: 6,458 ms
postgres=# create table xx(a integer, b varchar);
CREATE TABLE
Time: 54,053 ms
postgres=# insert into xx select 1, 'hhh';
INSERT 0 1
Time: 5,993 ms
postgres=# insert into xx select 1, 'hhh';
INSERT 0 1
Time: 3,393 ms
postgres=# insert into xx select 1, 'hhh';
INSERT 0 1


>postgres=# create or replace function x() returns setof xxtp as $$begin return 
>query select * from xx; return; end$$language plpgsql;
CREATE FUNCTION
Time: 4,392 ms
postgres=# select * from x();
 a |  b
---+-
 1 | hhh
 1 | hhh
 1 | hhh
(3 rows)
postgres=# create or replace function x() returns setof xxtp as
$$begin return query select 1,'kkk'; return; end$$language plpgsql;
CREATE FUNCTION
Time: 4,577 ms
postgres=# select * from x();
ERROR:  structure of query does not match function result type
CONTEXT:  PL/pgSQL function "x" line 1 at RETURN QUERY
postgres=# create or replace function x() returns setof xxtp as
$$begin return query select 1,'kkk'::varchar; return; end$$language
plpgsql;
CREATE FUNCTION
Time: 3,395 ms
postgres=# select * from x();
 a |  b
---+-
 1 | kkk
(1 row)

regards
Pavel Stehule
>
>
> I get "ERROR:  structure of query does not match function result type", even
> though the type signatures of the returned columns match the
> "connection_generation" rowtype.  I am pretty sure this could be resolved by
> casting the resulting columns to that row type, but I am lost as to how the
> syntax to do such a thing would look.
>
>
>
> Thanks in advance for the help, and keep up the great work.  PG8.3 is an
> amazing piece of software and it blows me away how much more advanced it
> gets with every release.
>
>
>
> Bart Grantham
>
> VP of R&D
>
> Logicworks Inc. – Complex and Managed Hosting

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Alias for function return buffer in pl/pgsql?

2008-08-13 Thread Bart Grantham
Interesting.  After some toying around I've figured out that this produces 
"ERROR:  structure of query does not match function result type":

RETURN QUERY SELECT connection_id, connection_type_id, connector_node_id,
connector_node_type_id, connectee_node_id, 
connectee_node_type_id,
current, timestamp::timestamp, $2+1
FROM connections WHERE connection_type_id = 1 AND 
connector_node_id = ANY($1);

Whereas this doesn't:

RETURN QUERY SELECT connection_id, connection_type_id, connector_node_id,
connector_node_type_id, connectee_node_id, 
connectee_node_type_id,
current, timestamp, $2+1
FROM connections WHERE connection_type_id = 1 AND 
connector_node_id = ANY($1);

Is there something about timestamps that prevents plpgsql from seeing where 
casting is unnecessary?  For the record, that field in the function's return 
type is definitely a timestamp, same for that column in the table.  And I know 
it's bad form for me to have named the column "timestamp" when that's the name 
of the type.  Oops. :)

BG


-Original Message-
From: Pavel Stehule [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 13, 2008 7:51 AM
To: Bart Grantham
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Alias for function return buffer in pl/pgsql?

>
>
> Bonus question -  if I rewrite the first FOR loop as:
>
>
>
> RETURN QUERY SELECT connection_id, connection_type_id, connector_node_id,
> connector_node_type_id, connectee_node_id,
>
>  connectee_node_type_id, current, timestamp, $2
> + 1 FROM connections
>
>  WHERE connection_type_id = 1 AND connector_node_id =
> ANY($1);

you have to cast. This code works:


postgres=# create type xxtp as (a integer, b varchar);
CREATE TYPE
Time: 6,458 ms
postgres=# create table xx(a integer, b varchar);
CREATE TABLE
Time: 54,053 ms
postgres=# insert into xx select 1, 'hhh';
INSERT 0 1
Time: 5,993 ms
postgres=# insert into xx select 1, 'hhh';
INSERT 0 1
Time: 3,393 ms
postgres=# insert into xx select 1, 'hhh';
INSERT 0 1


>postgres=# create or replace function x() returns setof xxtp as $$begin return 
>query select * from xx; return; end$$language plpgsql;
CREATE FUNCTION
Time: 4,392 ms
postgres=# select * from x();
 a |  b
---+-
 1 | hhh
 1 | hhh
 1 | hhh
(3 rows)
postgres=# create or replace function x() returns setof xxtp as
$$begin return query select 1,'kkk'; return; end$$language plpgsql;
CREATE FUNCTION
Time: 4,577 ms
postgres=# select * from x();
ERROR:  structure of query does not match function result type
CONTEXT:  PL/pgSQL function "x" line 1 at RETURN QUERY
postgres=# create or replace function x() returns setof xxtp as
$$begin return query select 1,'kkk'::varchar; return; end$$language
plpgsql;
CREATE FUNCTION
Time: 3,395 ms
postgres=# select * from x();
 a |  b
---+-
 1 | kkk
(1 row)

regards
Pavel Stehule
>
>
> I get "ERROR:  structure of query does not match function result type", even
> though the type signatures of the returned columns match the
> "connection_generation" rowtype.  I am pretty sure this could be resolved by
> casting the resulting columns to that row type, but I am lost as to how the
> syntax to do such a thing would look.
>
>
>
> Thanks in advance for the help, and keep up the great work.  PG8.3 is an
> amazing piece of software and it blows me away how much more advanced it
> gets with every release.
>
>
>
> Bart Grantham
>
> VP of R&D
>
> Logicworks Inc. – Complex and Managed Hosting

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Opteron vs. Xeon performance differences

2008-10-10 Thread Bart Grantham
Forgive me if this has been beaten into the ground, but my team and I couldn't 
find much conclusive study or posts on this issue.  To make a long story short: 
we're experiencing Xeons as 50% slower than Opterons, even when the Xeon has 
twice as much cache and a slight clock speed advantage.

The full story: we have an older production server with 2G of RAM, 2.4GHz 
Opterons w/ 1M of cache.  The database is not large, only around 7M or 8M rows 
altogether, 2.5G on disk.  Most queries are reads, probably on a 10:1 
proportion with writes.  In the process of upgrading this server to a pair of 
DRBD-mirrored (more on this below) servers we discovered that the new servers 
were actually slower than the older one.  The newer servers have 4G of RAM, 
3.0GHz Xeons with 2M of cache.  And not just a little slower, but queries 
(simple, complex, and disgusting recursive stored procedures) routinely run in 
50-100% more time than they did on the older server.  After many 
troubleshooting techniques (downgrading the kernel to that of the older 
machine, verifying version parity, copying the binary from the older server, 
building a 32bit binary on the new servers, running the entire database out of 
a ramdisk, and of course much tweaking of postgresql.conf) and seeing virtually 
no benefit from any of these tests I finally took the final leap: just pull the 
disks and throw them in a newer Opteron chassis (2.8GHz, 1M cache).  And 
whaddya know?  It's got a 20% speed edge on the older Opteron, and blows away 
the performance of the newer Xeons.

One of my guys did some testing and it appears that LWLockAquire and 
LWLockRelease are the culprits, but we're not entirely confident of our 
conclusion.  Any thoughts on why this might be so different between the two 
architectures?  We're a hosting provider so we've got some spare equipment to 
work with and I'm going to request that we keep these two boxes up for a week 
or so.  Are there any other tests that you guys can suggest that would help get 
down to the bottom of this?  I figure that not everyone has access to as much 
gear as we do so it might be a good opportunity to get some A/B testing on a 
production database on identical OS/server installs on different hardware.  I'm 
content to just say "Well, we use Opterons then!", but I imagine that if we 
could help bring equal performance to Xeon users that it would be worth the 
effort of volunteering.  To be clear, I have two machines sitting on the 
network ready for tweaking, one is a Xeon, the other is an Opteron, neither is 
in production and both can be fully mangled in the interest of figuring this 
out.

Speaking of being a hosting provider, I may as well take a moment to point out 
that we are working with DRBD for mirroring and have found it works beautifully 
with PG (MySQL as well).  Also, while our "Managed Database Service" product is 
geared around MySQL, Oracle, and MSSQL, we're pretty familiar with PG and would 
be happy to talk to anyone about hosting needs they may have.

Thanks for listening, and again please let me know if there is further testing 
we can do to help get to the bottom of this Opteron/Xeon performance 
discrepancy.

Bart Grantham
VP of R&D
Logicworks, Inc.
www.logicworks.net


[GENERAL] Performance difference between ANY and IN, also array syntax

2005-04-26 Thread Bart Grantham
Hello, all.  I work for a Manhattan ISP and have developed an internal 
systems management/housekeeping app on php/postgres 7.4.  I am trying to 
speed up some bits with stored procedures and have had great success, 
except I've now run into a bit of trouble.  It comes down to this:

# SELECT * FROM connections  WHERE connectee_node_id = ANY ( 
ARRAY[28543,28542] );
-snip-
Time: 564.899 ms

...versus...
# SELECT * FROM connections  WHERE connectee_node_id IN ( 28543,28542 );
-snip-
Time: 1.410 ms
Why the difference?  I tried explain:
# explain SELECT * FROM connections  WHERE connectee_node_id = ANY ( 
ARRAY[28543,28542] );
QUERY PLAN
-
Seq Scan on connections  (cost=0.00..17963.44 rows=207264 width=33)
  Filter: (connectee_node_id = ANY ('{28543,28542}'::integer[]))

..versus...
# explain SELECT * FROM connections  WHERE connectee_node_id IN ( 
28543,28542 );
 QUERY PLAN
---
Index Scan using c_connectee_node_id, c_connectee_node_id on 
connections  (cost=0.00..67.28 rows=72 width=33)
  Index Cond: ((connectee_node_id = 28543) OR (connectee_node_id = 28542))

Why filter for one and index for the other?  Is it because an array is 
mutable, so it's impossible to operate on it the same way?  I need to 
pass in an array to my stored procedure, but having to use ANY in my 
select is killing the performance. I'd like to know what I can do to 
make ANY perform like IN, or alternatively, could someone fill me in on 
the syntax on how to cast an INT[] to a "list of scalar expressions", 
which the manual states is the right-hand side to IN.

Also, I have a few bits of confusion about array syntax that perhaps 
someone could illuminate for me.  Forgive me, I'm not as fluent in SQL 
as other languages.

First, this doesn't work for me: RAISE NOTICE ''DEBUG: %'', _myarray[1];
It seems pretty reasonable to me, but it gives me a 'syntax error at or 
near "["'.

Next, I can't seem to declare an array of a user-defined row: _innerrow 
my_type%ROWTYPE[];
Is there a syntactical snag I'm tripping over?  Or can you not have 
arrays of other than built-in types?

I think it's super-cool that you can extract arbitrary rectangles of 
data from a multi-dimentional array, but can you take a vertical slice 
from an array of user-defined type by column?  For example: 
_mytype[1:5].some_column

And finally, how do you specifcy an entire array index when doing 
columns from multi-dim arrays?  Is there something like 
_my_multidim_of_ints[*][4] or maybe _my_multidim_of_ints[:][4] ?

Thanks for the help, and thanks for the great database.
Bart G
Logicworks NOC
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


[GENERAL] Array manipulation/syntax question

2005-05-09 Thread Bart Grantham
Hello again.  I had a problem a few weeks ago with using IN ( some_array 
) having really rough performance.  Special thanks to Ron Mayer for the 
suggestion of using int_array_enum(some_array) to join against.  I had 
to upgrade to PG8 but that technique works really well.  Now I have a 
question about array manipulation.  Specifically, how can I build an 
array from the result of a query without resorting to array_append()?  I 
have a recursive function that passes itself an array each time and I 
want to build this array by taking a column out of a query (which is a 
join from the input array and a table... see?  recursive!).  This 
becomes a performance issue since the query tha wraps array_append could 
be returning many thousands of rows.  Hmm, my question is sounding more 
complicated it should.  Let me put it this way... how do I do this:

-- my_array is an INT[]
_my_array := select some_column from some_table;
... in plpgsql?
Thanks in advance.
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[GENERAL] Odd performance issue

2005-06-20 Thread Bart Grantham
Summary: depending on the value, the planner will sometimes choose a seq 
scan, sometimes an index scan.  The former produces terrible performace, 
the latter great performance.


The long story: we had a disk failure (NOT the disk the db was on) and 
the machine's system disk had to be rebuilt from the raid array and 
re-GRUB'ed.  Now that the the system is back up we are seeing terrible 
performance (or more accurately, wildly varying performance).  I've 
tried re-importing the data from the live system (this is new hardware 
under testing for the system) and re-initing the db cluster.  A specific 
example is probably best.  This 'connections' table has about 922K 
rows.  The difference here is node_id's 28542 vs. 28560.  Using 28542 
causes an index scan, 28560 causes a seq scan:


The details:
logicops2=> explain SELECT * from connections AS c LEFT JOIN 
connection_types AS ct ON ct.connection_type_id = c.connection_type_id 
WHERE ( connector_node_id = 28542 OR connectee_node_id = 28542 ) AND ( 
c.connection_type_id < 1000 ) LIMIT 300;

 QUERY PLAN
---
Limit  (cost=1.29..563.05 rows=203 width=116)
  ->  Hash Left Join  (cost=1.29..563.05 rows=203 width=116)
Hash Cond: ("outer".connection_type_id = 
"inner".connection_type_id)
->  Index Scan using c_connector_node_id, c_connectee_node_id 
on connections c  (cost=0.00..558.72 rows=203 width=33)
  Index Cond: ((connector_node_id = 28542) OR 
(connectee_node_id = 28542))

  Filter: (connection_type_id < 1000)
->  Hash  (cost=1.23..1.23 rows=23 width=83)
  ->  Seq Scan on connection_types ct  (cost=0.00..1.23 
rows=23 width=83)

(8 rows)

Time: 0.935 ms
logicops2=> SELECT * from connections AS c LEFT JOIN connection_types AS 
ct ON ct.connection_type_id = c.connection_type_id WHERE ( 
connector_node_id = 28542 OR connectee_node_id = 28542 ) AND ( 
c.connection_type_id < 1000 ) LIMIT 300;

...results...
(12 rows)

Time: 1.887 ms

-vs-

logicops2=> explain SELECT * from connections AS c LEFT JOIN 
connection_types AS ct ON ct.connection_type_id = c.connection_type_id 
WHERE ( connector_node_id = 28560 OR connectee_node_id = 28560 ) AND ( 
c.connection_type_id < 1000 ) LIMIT 300;

 QUERY PLAN
--
Limit  (cost=1.29..686.09 rows=300 width=116)
  ->  Hash Left Join  (cost=1.29..24939.39 rows=10925 width=116)
Hash Cond: ("outer".connection_type_id = 
"inner".connection_type_id)
->  Seq Scan on connections c  (cost=0.00..24774.23 rows=10925 
width=33)
  Filter: (((connector_node_id = 28560) OR 
(connectee_node_id = 28560)) AND (connection_type_id < 1000))

->  Hash  (cost=1.23..1.23 rows=23 width=83)
  ->  Seq Scan on connection_types ct  (cost=0.00..1.23 
rows=23 width=83)

(7 rows)

Time: 0.704 ms
logicops2=> SELECT * from connections AS c LEFT JOIN connection_types AS 
ct ON ct.connection_type_id = c.connection_type_id WHERE ( 
connector_node_id = 28560 OR connectee_node_id = 28560 ) AND ( 
c.connection_type_id < 1000 ) LIMIT 300;

...results...
(7 rows)

Time: 578.597 ms


... it may be relevant that one node_id has 15 times as many connections:

logicops2=> select count(*) from connections where connector_node_id = 
28542 OR connectee_node_id = 28542;

count
---
  856
(1 row)

Time: 1.424 ms
logicops2=> select count(*) from connections where connector_node_id = 
28560 OR connectee_node_id = 28560;

count
---
13500
(1 row)

Time: 559.696 ms

... but that shouldn't make a difference to the planner, should it?  
Yes, I've vacuum analyzed.



Also, I was wondering if someone could correct me on a bit of array 
syntax.  I'd like to have a query pass back an array of ints to a 
function call.  Something like this:
logicops2=> select * from nodes2ancestors(array[(select node_id from 
nodes where node_type_id = 3)]::int[], 0);

ERROR:  more than one row returned by a subquery used as an expression

Thanks for any help/pointers you guys can provide.  I really appreciate 
it as I'm down to the wire on a project and this performance thing has 
really blindsided us.


Bart

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [GENERAL] Postgresql Mention-- Logicworks' LogicOps Management

2005-10-19 Thread Bart Grantham
That's me and my project listed there.  Glad to give PG the good press 
it deserves, it's the least I could do.  It's been a great db for us, 
and we're looking forward to 8.1.


And thanks for the help that I received on the mailing list a few months 
back that directly impacted this project.  It was a real life-saver.


BG

Reid Thompson wrote:


http://linux.sys-con.com/read/139427_2.htm


reid


---(end of broadcast)---
TIP 6: explain analyze is your friend

 




---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org