Re: [SQL] aggregate query

2007-05-29 Thread Andrew Kroeger
Raj A wrote:
> I have a table
> 
> CREATE TABLE survey_load
> (
>  meter_id character(5) NOT NULL,
>  number_of_bays integer NOT NULL,
>  bay_1_use integer,
>  bay_2_use integer,
>  bay_3_use integer,
>  bay_4_use integer,
>  bay_5_use integer,
>  date date NOT NULL,
>  inspection_id integer NOT NULL DEFAULT,
> )
> 
> How do i present an aggregate query
> 
> inspection_id  |  meter_id  |  bay_use
> 1 12345(value of bay_1_use)
> 1 12345(value of bay_2_use)
> 1 12345(value of bay_3_use)
> 2 23456(value of bay_1_use)
> 2 23456(value of bay_2_use)
> 2 23456(value of bay_3_use)
> 2 23456(value of bay_4_use)
> 2 23456(value of bay_5_use)

Firstly, the table definition you provided has an invalid specification
for the inspection_id column.  For my examples, I simply used a "NOT
NULL" constraint, and did not implement any DEFAULT clause.

If I understand your issue correctly, it seems like the denormalized
nature of your table is causing you some problems.  With the table
definition and output you provided, I am assuming the following contents
for the survey_load table:

 meter_id | number_of_bays | bay_1_use | bay_2_use | bay_3_use | bay_4_use | bay_5_use |    date    | inspection_id
----------+----------------+-----------+-----------+-----------+-----------+-----------+------------+---------------
 12345    |              3 |        11 |        12 |        13 |           |           | 2007-05-29 |             1
 23456    |              5 |        21 |        22 |        23 |        24 |        25 | 2007-05-29 |             2
(2 rows)

The value of the number_of_bays column is driving the number of results
in your summary query, as opposed to allowing the data relations to
drive that on their own with a normalized representation.  You could
normalize your data representation into 2 tables:

tmp1=> \d+ new_survey_load
           Table "new_survey_load"
    Column     |     Type     | Modifiers | Description
---------------+--------------+-----------+-------------
 survey_id     | integer      | not null  |
 meter_id      | character(5) | not null  |
 date          | date         | not null  |
 inspection_id | integer      | not null  |
Indexes:
    "new_survey_load_pkey" PRIMARY KEY, btree (survey_id)
Has OIDs: no

tmp1=> \d+ new_bay_use
         Table "new_bay_use"
   Column   |  Type   | Modifiers | Description
------------+---------+-----------+-------------
 survey_id  | integer | not null  |
 bay_number | integer | not null  |
 bay_use    | integer | not null  |
Indexes:
    "ind_new_bay_use__survey_id" btree (survey_id)
Foreign-key constraints:
    "new_bay_use_survey_id_fkey" FOREIGN KEY (survey_id) REFERENCES new_survey_load(survey_id)
Has OIDs: no
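
For reference, CREATE TABLE statements matching those two listings would look
roughly like this (a sketch reconstructed from the \d+ output above, using the
constraint and index names shown there):

CREATE TABLE new_survey_load (
    survey_id     integer      NOT NULL,
    meter_id      character(5) NOT NULL,
    date          date         NOT NULL,
    inspection_id integer      NOT NULL,
    CONSTRAINT new_survey_load_pkey PRIMARY KEY (survey_id)
);

CREATE TABLE new_bay_use (
    survey_id  integer NOT NULL,
    bay_number integer NOT NULL,
    bay_use    integer NOT NULL,
    CONSTRAINT new_bay_use_survey_id_fkey FOREIGN KEY (survey_id)
        REFERENCES new_survey_load (survey_id)
);

CREATE INDEX ind_new_bay_use__survey_id ON new_bay_use (survey_id);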

This normalization also makes it easier in the future to track
additional bays without having to alter the table structure.

Following your example from above, those tables would contain the
following values:

tmp1=> select * from new_survey_load ;
 survey_id | meter_id |    date    | inspection_id
-----------+----------+------------+---------------
        13 | 12345    | 2007-05-29 |             1
        14 | 23456    | 2007-05-29 |             2
(2 rows)

tmp1=> select * from new_bay_use ;
 survey_id | bay_number | bay_use
-----------+------------+---------
        13 |          1 |      11
        13 |          2 |      12
        13 |          3 |      13
        14 |          1 |      21
        14 |          2 |      22
        14 |          3 |      23
        14 |          4 |      24
        14 |          5 |      25
(8 rows)

The query to get the summary results you are looking for would then be:

select
    l.inspection_id,
    l.meter_id,
    u.bay_use
from
    new_survey_load l
    join new_bay_use u on
        l.survey_id = u.survey_id
order by
    l.inspection_id, l.meter_id, u.bay_number;
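
Against the sample data above, that query would return:

 inspection_id | meter_id | bay_use
---------------+----------+---------
             1 | 12345    |      11
             1 | 12345    |      12
             1 | 12345    |      13
             2 | 23456    |      21
             2 | 23456    |      22
             2 | 23456    |      23
             2 | 23456    |      24
             2 | 23456    |      25
(8 rows)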

You could also use a view to join the 2 new tables together to match the
format of your original table definition:

select
    l.meter_id,
    count(u.bay_number) as number_of_bays,
    u1.bay_use as bay_1_use,
    u2.bay_use as bay_2_use,
    u3.bay_use as bay_3_use,
    u4.bay_use as bay_4_use,
    u5.bay_use as bay_5_use,
    l.date,
    l.inspection_id
from
    new_survey_load l
    join new_bay_use u on
        l.survey_id = u.survey_id
    left join new_bay_use u1 on
        l.survey_id = u1.survey_id and u1.bay_number = 1
    left join new_bay_use u2 on
        l.survey_id = u2.survey_id and u2.bay_number = 2
    left join new_bay_use u3 on
        l.survey_id = u3.survey_id and u3.bay_number = 3
    left join new_bay_use u4 on
        l.survey_id = u4.survey_id and u4.bay_number = 4
    left join new_bay_use u5 on
        l.survey_id = u5.survey_id and u5.bay_number = 5
group by
    1,3,4,5,6,7,8,9;
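
To keep that available as an actual view, the query can be wrapped in CREATE
VIEW (the view name here is just an illustration):

CREATE VIEW survey_load_compat AS
select
    l.meter_id,
    count(u.bay_number) as number_of_bays,
    u1.bay_use as bay_1_use,
    u2.bay_use as bay_2_use,
    u3.bay_use as bay_3_use,
    u4.bay_use as bay_4_use,
    u5.bay_use as bay_5_use,
    l.date,
    l.inspection_id
from
    new_survey_load l
    join new_bay_use u on l.survey_id = u.survey_id
    left join new_bay_use u1 on l.survey_id = u1.survey_id and u1.bay_number = 1
    left join new_bay_use u2 on l.survey_id = u2.survey_id and u2.bay_number = 2
    left join new_bay_use u3 on l.survey_id = u3.survey_id and u3.bay_number = 3
    left join new_bay_use u4 on l.survey_id = u4.survey_id and u4.bay_number = 4
    left join new_bay_use u5 on l.survey_id = u5.survey_id and u5.bay_number = 5
group by
    1,3,4,5,6,7,8,9;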

If you add more bays for tracking,

Re: [SQL] aggregate query

2007-05-29 Thread Gregory Stark
"Andrew Kroeger" <[EMAIL PROTECTED]> writes:

> Raj A wrote:
>> I have a table
>> 
>> CREATE TABLE survey_load
>> (
>>  meter_id character(5) NOT NULL,
>>  number_of_bays integer NOT NULL,
>>  bay_1_use integer,
>>  bay_2_use integer,
>>  bay_3_use integer,
>>  bay_4_use integer,
>>  bay_5_use integer,
>>  date date NOT NULL,
>>  inspection_id integer NOT NULL DEFAULT,
>> )
>> 
>> How do i present an aggregate query
>> 
>> inspection_id  |  meter_id  |  bay_use
>> 1 12345(value of bay_1_use)
>> 1 12345(value of bay_2_use)
>> 1 12345(value of bay_3_use)
>> 2 23456(value of bay_1_use)
>> 2 23456(value of bay_2_use)
>> 2 23456(value of bay_3_use)
>> 2 23456(value of bay_4_use)
>> 2 23456(value of bay_5_use)


>
> If I understand your issue correctly, it seems like the denormalized
> nature of your table is causing you some problems.  

True. Normalizing the tables would make this query easier, which is a good sign
that that's probably the right direction.

If for some reason you can't or won't change the table definition, there are a
number of possible tricky answers given the current definition. Something like
this, for example:

SELECT inspection_id, meter_id,
       case when bay=1 then bay_1_use
            when bay=2 then bay_2_use
            when bay=3 then bay_3_use
            when bay=4 then bay_4_use
            when bay=5 then bay_5_use
            else null
       end AS bay_use
  FROM (
        SELECT *, generate_series(1,number_of_bays) AS bay
          FROM survey_load
       ) as x

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [SQL] aggregate query

2007-05-29 Thread Raj A

Thank you guys!
I'm currently migrating an Oracle database to postgres and have
created tables using the scripts that were readily available.  Glad I
can now improve this old system.



[SQL] Recursively traversing a partially ordered set

2007-05-29 Thread Jason Grout

I have a performance-related question (and a few code snippets to
contribute to the list archives).

I am trying to write a function that recursively walks a hierarchical
structure to find all ancestors or descendants of a particular item or
list of items.  My relationships are set up with the table:

CREATE TABLE relation (parent_id integer, child_id integer);
CREATE INDEX relation_parent ON relation(parent_id);
CREATE INDEX relation_child ON relation(child_id);

My data is *not* a tree (otherwise I'd probably use Celko's method).
Instead, my structure is a (graded) partially ordered set, which is more
general than a tree. In other words, each item has a level in the
hierarchy and can be connected to a number of nodes in the level above 
it and a number of nodes in the level below it. Items on the same level 
have no relation between them, but they generally have many common 
parents and many common children. In my case, items will generally have 
far more parents than children. Right now my relation table has several 
hundred thousand rows and 9 levels, but will eventually have tens of 
millions of rows and more levels. Note that all of these algorithms work 
just fine for trees too.


Given a list of item ids, I need to determine all items above them in
the hierarchy (or below them).  Since determining ancestors is the
inverse problem of determining descendants, I'll only treat the
ancestors problem in the rest of this email.  Note that the list I am
given may include items on different levels.

After playing with it for a few days, I ended up with four functions
(after trying lots of variations on these, based on things found in the
mailing list, documentation, and thinking about the problem).  The
functions are listed from slowest to fastest. My questions are at the
end of the email.

1. this rather nice, succinct, recursive (and slow) SQL function:

CREATE OR REPLACE FUNCTION get_ancestors_sql(integer[])
    RETURNS SETOF integer AS
$BODY$
SELECT DISTINCT parent_id FROM relation
WHERE child_id = ANY ($1 ||
    CASE
        WHEN EXISTS(SELECT child_id FROM relation
                    WHERE child_id = ANY($1))
        THEN ARRAY(SELECT * FROM
                   get_ancestors_sql(
                       ARRAY(SELECT parent_id FROM relation
                             WHERE child_id = ANY($1))))
        ELSE '{}'::int[]
    END)
$BODY$
    LANGUAGE 'sql' STABLE;

2. this more verbose recursive (and still slow) pl/pgsql function:

CREATE OR REPLACE FUNCTION get_ancestors_pg(integer[])
    RETURNS SETOF integer AS
$BODY$
DECLARE
    r record;
    s record;
BEGIN
    SELECT INTO s child_id FROM relation WHERE child_id = ANY($1) LIMIT 1;
    IF FOUND THEN
        FOR r IN SELECT DISTINCT parent_id FROM relation
                 WHERE child_id = ANY ($1) OR
                       child_id IN (SELECT * FROM get_ancestors_pg(
                           ARRAY(SELECT DISTINCT parent_id FROM relation
                                 WHERE child_id = ANY($1)))) LOOP
            RETURN NEXT r.parent_id;
        END LOOP;
    END IF;
END;
$BODY$
    LANGUAGE 'plpgsql' STABLE;


3. this faster function that uses a table to cache results.  This could
be modified so that consecutive calls use the cache to provide much
quicker responses, but currently I delete the information at the end of
a function call.
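
The temp table itself isn't shown in the post. Judging from how it is used, it
is presumably something along the lines of (an assumption, not part of the
original post):

CREATE TABLE temp (
    parent_id integer,
    session   integer
);
-- an index on session would likely help the lookups below
CREATE INDEX temp_session ON temp (session);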

CREATE OR REPLACE FUNCTION get_ancestors_pg_temp(ids integer[])
    RETURNS SETOF integer AS
$BODY$
DECLARE
    -- This is a "session" variable so that two concurrent calls
    -- to the function don't step on each other and mutilate
    -- each other's data.
    randvar integer := round(random()*200)::int;
    i integer := 0;
    r record;
BEGIN

    INSERT INTO temp
    SELECT DISTINCT g.parent_id, randvar FROM relation AS g
    WHERE g.child_id = ANY(ids);

    LOOP
        INSERT INTO temp
        SELECT DISTINCT g.parent_id, randvar+1 FROM relation AS g
        WHERE g.child_id IN
            (SELECT parent_id FROM temp WHERE session = randvar);
        EXIT WHEN NOT FOUND;
        i := i+1;
        randvar := randvar+1;
    END LOOP;

    FOR r IN SELECT parent_id FROM temp
             WHERE session BETWEEN randvar-i AND randvar LOOP
        RETURN NEXT r.parent_id;
    END LOOP;

    DELETE FROM temp WHERE session BETWEEN randvar-i AND randvar;

END;
$BODY$
    LANGUAGE 'plpgsql' VOLATILE;


4. This much, much faster version that loops and just throws everything
into an array and then spits out the array at the end.

CREATE OR REPLACE FUNCTION get_ancestors_pg_array(ids integer[])
    RETURNS integer[] AS
$BODY$
DECLARE
    result int[] := '{}'::int[];
    nextiter int[] := '{}'::int[];
    temp int[] := ids;
BEGIN

    LOOP
        SELECT INTO nextiter
            ARRAY(SELECT DISTINCT g.parent_id FROM relation AS g
                  WHERE g.child_id = ANY(temp));
        result := nextiter || result;
        EXIT WHEN nextiter[1] IS NULL;
        temp := nextiter;
    END LOOP;

    RETURN result;

END;
$BODY$
    LANGUAGE 'plpgsql' STABLE;
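
For anyone wanting to compare the four, the first three return SETOF integer
while the fourth returns an array, so they are called slightly differently
(the item id 42 below is just a placeholder):

SELECT * FROM get_ancestors_sql(ARRAY[42]);
SELECT * FROM get_ancestors_pg(ARRAY[42]);
SELECT * FROM get_ancestors_pg_temp(ARRAY[42]);
SELECT get_ancestors_pg_array(ARRAY[42]);  -- returns a single int[] value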


The fourth function, get_ancestors_pg_array, seems to execute in about a
tenth of the time the other functions take.  My questions:

1. Is there a huge overhead for recursion?  In general, is it better to
stuff temporary results into a table (incurring the cost of an INSERT)
rather than recurse?

2. Is there a big difference in speed between using an array versus using
a SELECT in a WHERE condition?  In other words, which is generally going
to be faster:

SELECT * from table where field IN (some function returning a SETOF);

or

SELECT * from table where field = ANY(some function returning an array);

Re: [SQL] Recursively traversing a partially ordered set

2007-05-29 Thread Gregory Stark

"Jason Grout" <[EMAIL PROTECTED]> writes:

> 2. Is there a big difference in speed between using an array versus
> using a SELECT in a WHERE condition?  In other words, which is generally
> going to be faster:
>
> SELECT * from table where field IN (some function returning a SETOF);
>
> or
>
> SELECT * from table where field = ANY(some function returning an array);

In theory the two forms are equivalent (well, nearly: to be truly equivalent
the optimizer would need to know whether the array could contain NULLs), so
the optimizer ought to produce the same plan for each. In practice it's not
clear where the optimizer would get the information to decide which plan to
use in these two cases, or whether it would have the same kinds of
information available for both.

In any case, in practice the plans available for the two forms are not the
same, so you'll have to try them and see which one works better for you. I
think there are more plans available for the first form, so it may work out
better if you're returning quite large sets where those plans help. If you're
returning quite small sets where you just need a simple bitmap index scan,
the second form is less likely to pick (or actually, I think, incapable of
picking) some other plan which works poorly.

There was some discussion recently on what to do about exactly this type of
case. I would be interested to hear what plans you got from each and which
one ended up being best.
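
A straightforward way to capture those plans is to EXPLAIN ANALYZE both forms
against the relation table, for example using the functions above (the
ARRAY[42] argument is just a placeholder):

EXPLAIN ANALYZE
SELECT * FROM relation
 WHERE child_id IN (SELECT * FROM get_ancestors_sql(ARRAY[42]));

EXPLAIN ANALYZE
SELECT * FROM relation
 WHERE child_id = ANY(get_ancestors_pg_array(ARRAY[42]));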

> 3. Is there a strong reason I should strip out duplicates in either of
> the two cases in question 2?  Or is the performance about the same when
> doing the queries whether or not the SETOF or arrays contain duplicates?

The plans where it matters will remove the duplicates anyway, but I don't
think the array version does if you're not using a bitmap index scan.

> 4. Can you see any obvious optimizations to the above functions
> (particularly the last one)?
>
> Thanks for your help. Thanks for the absolutely wonderful database and
> solid documentation.  I originally did this project in MySQL and had the
> weirdest errors (the errors turned out to be due to the default
> case-insensitive collation of MySQL!).  That's when I decided to move to
> postgresql when I updated the project.

Well, unfortunately collation support isn't exactly a strong point in
Postgres either. Plenty of people get bitten by their database being initdb'd
in a locale they didn't expect.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




[SQL] problems with SELECT query results

2007-05-29 Thread Joshua

Hello,

I am new to this list and have been working with PostgreSQL since January.

Here is my problem; I hope someone here has some experience or can point
me in the right direction. I am writing the following query for a C#
program:


SELECT 'PV.LINEITEM:' || partnum || ',' || round(onhand) || ',' || 
round(qm5) || ',' || round(lsm4) || ',' || ',' || ',' || round(onorder) 
|| ',' || ',' || ',' || binone || ',' || ',' || round(backorderqty) || 
',' || ',' || round(onhold) || ',' || ',' || ',' || ',' || ',' || ',' || 
',' || round(qtyperjob) || ',' || round(ordermax) AS gmrim FROM slparts 
WHERE vendor LIKE 'CH%'


The query does work and I am getting results from the database. There
are values for all 'partnum' in the database; however, the query results
include blank fields here and there in between the returned records. Why
am I receiving blank fields for 'gmrim'? This absolutely defies logic
and I cannot find any rhyme or reason for this problem. I cannot have
any blank rows in the query, and again the database is completely
populated with values.


Could someone give me an explanation or tell me why I am receiving blank
rows here and there as a query result? If you need further clarification
of this problem, or have any questions that would help arrive at a
conclusion, please feel free to send them to me and I will be more than
happy to open a dialog in order to solve this problem.


Thanks in advance for assistance.

Cordially,
Joshua Neil



Re: [SQL] problems with SELECT query results

2007-05-29 Thread Richard Huxton

Joshua wrote:

Hello,

I am new to this list and have been working with PostgreSQL since January.

Here is my problem, I hope someone here has some experience or can point 
me in the right direction. I am writing the following query for a C# 
program I am writing:


SELECT 'PV.LINEITEM:' || partnum || ',' || round(onhand) || ',' || 
round(qm5) || ',' || round(lsm4) || ',' || ',' || ',' || round(onorder) 
|| ',' || ',' || ',' || binone || ',' || ',' || round(backorderqty) || 
',' || ',' || round(onhold) || ',' || ',' || ',' || ',' || ',' || ',' || 
',' || round(qtyperjob) || ',' || round(ordermax) AS gmrim FROM slparts 
WHERE vendor LIKE 'CH%'


The query does work and I am getting results from the database. There 
are values for all 'partnum' in the database, however, the query results 
include blank fields here and there in between the returned records. Why 
am I receiving blank fields for 'gmrim' This absolutely defies logic 
and I cannot find any rhyme or reason for this problem. I cannot have 
any blank rows in the query, and again the database is completely 
populated with values.


I'm not sure it is - I think you've got a NULL somewhere.

Since NULL means "unknown" ('text' || NULL) = NULL

Wrap all your column-references in COALESCE: e.g. COALESCE(partnum,'') 
and see if that solves it. If so, go back and find rows WHERE partnum IS 
NULL and correct them. Then set the NOT NULL constraint on the relevant 
columns.
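
For example, the start of the SELECT could be rewritten along these lines (a
sketch showing only the first few columns; a '0' default may be more
appropriate than an empty string for the numeric fields):

SELECT 'PV.LINEITEM:' || COALESCE(partnum, '')
       || ',' || COALESCE(round(onhand)::text, '0')
       || ',' || COALESCE(round(qm5)::text, '0')
       || ',' || COALESCE(round(lsm4)::text, '0')
       -- ... and so on for the remaining columns ...
       AS gmrim
  FROM slparts
 WHERE vendor LIKE 'CH%';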


--
  Richard Huxton
  Archonet Ltd



Re: [SQL] problems with SELECT query results

2007-05-29 Thread Joshua

Hello,

Thank you all for your assistance. I did end up finding NULL in the
'onorder' column, which should have been zeros; this was in a test
table that happened to have some NULLs in it for one reason or another,
but I should not find this in the production version of the table.


I also appreciate you pointing out the COALESCE function. I will find 
that helpful in future work.


I look forward to hopefully assisting you guys with some of your 
PostgreSQL dilemmas in the future!


Thanks again!!!

-Joshua



Re: [SQL] problems with SELECT query results

2007-05-29 Thread Richard Broersma Jr

--- Joshua <[EMAIL PROTECTED]> wrote:

> Hello,
> 
> Thank you all for your assistance. I did end up finding NULL in the 
> 'onorder' column which should have been zero's this was in a test 
> table that happened to have some NULL in it for one reason or another 
> but I should not find this in the production version of the table.

If this field should not be allowed to have nulls, you should add a NOT NULL
constraint to your table to prevent this sort of thing from happening.  If the
field should have zeros instead of nulls, you should also add a default value
of zero for the column.

ALTER TABLE Slparts
  ALTER COLUMN onorder SET NOT NULL,
  ALTER COLUMN onorder SET DEFAULT 0;
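
Note that ALTER COLUMN ... SET NOT NULL will fail if any rows still contain
NULL, so the existing data would need to be fixed up first:

UPDATE Slparts SET onorder = 0 WHERE onorder IS NULL;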

Regards,
Richard Broersma Jr.
