Re: [SQL] aggregate query
Raj A wrote:
> I have a table
>
> CREATE TABLE survey_load
> (
>   meter_id character(5) NOT NULL,
>   number_of_bays integer NOT NULL,
>   bay_1_use integer,
>   bay_2_use integer,
>   bay_3_use integer,
>   bay_4_use integer,
>   bay_5_use integer,
>   date date NOT NULL,
>   inspection_id integer NOT NULL DEFAULT,
> )
>
> How do i present an aggregate query
>
> inspection_id | meter_id | bay_use
> 1             | 12345    | (value of bay_1_use)
> 1             | 12345    | (value of bay_2_use)
> 1             | 12345    | (value of bay_3_use)
> 2             | 23456    | (value of bay_1_use)
> 2             | 23456    | (value of bay_2_use)
> 2             | 23456    | (value of bay_3_use)
> 2             | 23456    | (value of bay_4_use)
> 2             | 23456    | (value of bay_5_use)

Firstly, the table definition you provided has an invalid specification
for the inspection_id column. For my examples, I simply used a "NOT
NULL" constraint and did not implement any DEFAULT clause.

If I understand your issue correctly, it seems like the denormalized
nature of your table is causing you some problems.

With the table definition and output you provided, I am assuming the
following contents for the survey_load table:

 meter_id | number_of_bays | bay_1_use | bay_2_use | bay_3_use | bay_4_use | bay_5_use |    date    | inspection_id
----------+----------------+-----------+-----------+-----------+-----------+-----------+------------+---------------
 12345    |              3 |        11 |        12 |        13 |           |           | 2007-05-29 |             1
 23456    |              5 |        21 |        22 |        23 |        24 |        25 | 2007-05-29 |             2
(2 rows)

The value of the number_of_bays column is driving the number of results
in your summary query, as opposed to allowing the data relations to
drive that on their own with a normalized representation. You could
normalize your data representation into 2 tables:

tmp1=> \d+ new_survey_load
                Table "new_survey_load"
    Column     |     Type     | Modifiers | Description
---------------+--------------+-----------+-------------
 survey_id     | integer      | not null  |
 meter_id      | character(5) | not null  |
 date          | date         | not null  |
 inspection_id | integer      | not null  |
Indexes:
    "new_survey_load_pkey" PRIMARY KEY, btree (survey_id)
Has OIDs: no

tmp1=> \d+ new_bay_use
           Table "new_bay_use"
   Column   |  Type   | Modifiers | Description
------------+---------+-----------+-------------
 survey_id  | integer | not null  |
 bay_number | integer | not null  |
 bay_use    | integer | not null  |
Indexes:
    "ind_new_bay_use__survey_id" btree (survey_id)
Foreign-key constraints:
    "new_bay_use_survey_id_fkey" FOREIGN KEY (survey_id) REFERENCES new_survey_load(survey_id)
Has OIDs: no

This normalization also makes it easier in the future to track
additional bays without having to alter the table structure.
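The psql listings above correspond to CREATE TABLE statements roughly
like the following (a reconstruction from the \d+ output; the original
message shows only the descriptions, not the DDL):

CREATE TABLE new_survey_load (
    survey_id     integer      NOT NULL PRIMARY KEY,
    meter_id      character(5) NOT NULL,
    date          date         NOT NULL,
    inspection_id integer      NOT NULL
);

CREATE TABLE new_bay_use (
    survey_id  integer NOT NULL REFERENCES new_survey_load(survey_id),
    bay_number integer NOT NULL,
    bay_use    integer NOT NULL
);

CREATE INDEX ind_new_bay_use__survey_id ON new_bay_use(survey_id);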
Following your example from above, those tables would contain the
following values:

tmp1=> select * from new_survey_load ;
 survey_id | meter_id |    date    | inspection_id
-----------+----------+------------+---------------
        13 | 12345    | 2007-05-29 |             1
        14 | 23456    | 2007-05-29 |             2
(2 rows)

tmp1=> select * from new_bay_use ;
 survey_id | bay_number | bay_use
-----------+------------+---------
        13 |          1 |      11
        13 |          2 |      12
        13 |          3 |      13
        14 |          1 |      21
        14 |          2 |      22
        14 |          3 |      23
        14 |          4 |      24
        14 |          5 |      25
(8 rows)

The query to get the summary results you are looking for would then be:

select l.inspection_id, l.meter_id, u.bay_use
from new_survey_load l
join new_bay_use u on l.survey_id = u.survey_id
order by l.inspection_id, l.meter_id, u.bay_number;

You could also use a view to join the 2 new tables together to match
the format of your original table definition:

select l.meter_id,
       count(u.bay_number) as number_of_bays,
       u1.bay_use as bay_1_use,
       u2.bay_use as bay_2_use,
       u3.bay_use as bay_3_use,
       u4.bay_use as bay_4_use,
       u5.bay_use as bay_5_use,
       l.date,
       l.inspection_id
from new_survey_load l
join new_bay_use u on l.survey_id = u.survey_id
left join new_bay_use u1 on l.survey_id = u1.survey_id and u1.bay_number = 1
left join new_bay_use u2 on l.survey_id = u2.survey_id and u2.bay_number = 2
left join new_bay_use u3 on l.survey_id = u3.survey_id and u3.bay_number = 3
left join new_bay_use u4 on l.survey_id = u4.survey_id and u4.bay_number = 4
left join new_bay_use u5 on l.survey_id = u5.survey_id and u5.bay_number = 5
group by 1,3,4,5,6,7,8,9;

If you add more bays for tracking, this view would need additional
joins and output columns, but the underlying tables would not change.
Re: [SQL] aggregate query
"Andrew Kroeger" <[EMAIL PROTECTED]> writes: > Raj A wrote: >> I have a table >> >> CREATE TABLE survey_load >> ( >> meter_id character(5) NOT NULL, >> number_of_bays integer NOT NULL, >> bay_1_use integer, >> bay_2_use integer, >> bay_3_use integer, >> bay_4_use integer, >> bay_5_use integer, >> date date NOT NULL, >> inspection_id integer NOT NULL DEFAULT, >> ) >> >> How do i present an aggregate query >> >> inspection_id | meter_id | bay_use >> 1 12345(value of bay_1_use) >> 1 12345(value of bay_2_use) >> 1 12345(value of bay_3_use) >> 2 23456(value of bay_1_use) >> 2 23456(value of bay_2_use) >> 2 23456(value of bay_3_use) >> 2 23456(value of bay_4_use) >> 2 23456(value of bay_5_use) > > If I understand your issue correctly, it seems like the denormalized > nature of your table is causing you some problems. True. Normalizing the tables would make this query easier which is a good sign that that's probably the right direction. If for some reason you can't or won't change the table definition there are a number of possible tricky answers given the current definition. Something like this for example: SELECT inspection_id, meter_id, case when bay=1 then bay_1_use when bay=2 then bay_2_use when bay=3 then bay_3_use when bay=4 then bay_4_use when bay=5 then bay_5_use else null end AS bay_use FROM ( SELECT *, generate_series(1,number_of_bays) AS bay FROM survey_load ) as x -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [SQL] aggregate query
Thank you guys! I'm currently migrating an Oracle database to postgres
and have created tables using the scripts that were readily available.
Glad I can now improve this old system.

On 29/05/07, Gregory Stark <[EMAIL PROTECTED]> wrote:
> [snip]
[SQL] Recursively traversing a partially ordered set
I have a performance-related question (and a few code snippets to
contribute to the list archives).
I am trying to write a function that recursively walks a hierarchical
structure to find all ancestors or descendants of a particular item or
list of items. My relationships are set up with the table:
CREATE TABLE relation (parent_id integer, child_id integer);
CREATE INDEX relation_parent ON relation(parent_id);
CREATE INDEX relation_child ON relation(child_id);
My data is *not* a tree (otherwise I'd probably use Celko's method).
Instead, my structure is a (graded) partially ordered set, which is more
general than a tree. In other words, each item has a level in the
hierarchy and can be connected to a number of nodes in the level above
it and a number of nodes in the level below it. Items on the same level
have no relation between them, but they generally have many common
parents and many common children. In my case, items will generally have
far more parents than children. Right now my relation table has several
hundred thousand rows and 9 levels, but will eventually have tens of
millions of rows and more levels. Note that all of these algorithms work
just fine for trees too.
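For concreteness, a tiny fragment of such a structure could look like
this (all ids invented): items 1 and 2 sit on one level and share
parents 10 and 11 on the level above, which in turn share a single
parent 100.

INSERT INTO relation (parent_id, child_id) VALUES (10, 1);
INSERT INTO relation (parent_id, child_id) VALUES (11, 1);
INSERT INTO relation (parent_id, child_id) VALUES (10, 2);
INSERT INTO relation (parent_id, child_id) VALUES (11, 2);
INSERT INTO relation (parent_id, child_id) VALUES (100, 10);
INSERT INTO relation (parent_id, child_id) VALUES (100, 11);

With that data, the ancestors of item 1 are 10, 11, and 100.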
Given a list of item ids, I need to determine all items above them in
the hierarchy (or below them). Since determining ancestors is the
inverse problem of determining descendants, I'll only treat the
ancestors problem in the rest of this email. Note that the list I am
given may include items on different levels.
After playing with it for a few days, I ended up with four functions
(after trying lots of variations on these, based on things found in the
mailing list, documentation, and thinking about the problem). The
functions are listed from slowest to fastest. My questions are at the
end of the email.
1. this rather nice, succinct, recursive (and slow) SQL function:
CREATE OR REPLACE FUNCTION get_ancestors_sql(integer[])
RETURNS SETOF integer AS
$BODY$
SELECT DISTINCT parent_id from relation
WHERE child_id = ANY ($1 ||
CASE
WHEN EXISTS(SELECT child_id FROM relation
WHERE child_id = ANY($1))
THEN ARRAY(SELECT * FROM
get_ancestors_sql(
ARRAY(SELECT parent_id FROM relation WHERE
child_id=ANY($1))))
ELSE '{}'::int[] END)
$BODY$
LANGUAGE 'sql' STABLE;
2. this more verbose recursive (and still slow) pl/pgsql function:
CREATE OR REPLACE FUNCTION get_ancestors_pg(integer[])
RETURNS SETOF integer AS
$BODY$
DECLARE
r record;
s record;
BEGIN
SELECT INTO s child_id FROM relation WHERE child_id=ANY($1) LIMIT 1;
IF FOUND THEN
FOR r IN SELECT DISTINCT parent_id from relation
WHERE child_id = ANY ($1) OR
child_id IN (SELECT * FROM get_ancestors_pg(
ARRAY(SELECT DISTINCT parent_id FROM relation WHERE
child_id=ANY($1)))) LOOP
RETURN NEXT r.parent_id;
END LOOP;
END IF;
END;
$BODY$
LANGUAGE 'plpgsql' STABLE;
3. this faster function that uses a table to cache results. This could
be modified so that consecutive calls use the cache to provide much
quicker responses, but currently I delete the information at the end of
a function call.
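The scratch table "temp" used below is never defined in the original
post; from the columns the function references, it presumably looks
something like the following sketch (the index is an assumption, added
to speed the session lookups):

CREATE TABLE temp (
    parent_id integer,
    session   integer
);

CREATE INDEX temp_session ON temp(session);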
CREATE OR REPLACE FUNCTION get_ancestors_pg_temp(ids integer[])
RETURNS SETOF integer AS
$BODY$
DECLARE
-- This is a "session" variable so that two concurrent calls
-- to the function don't step on each other and mutilate
-- each other's data.
randvar integer:=round(random()*200)::int;
i integer:=0;
r record;
BEGIN
INSERT INTO temp
SELECT DISTINCT g.parent_id, randvar from relation as g
WHERE g.child_id=ANY(ids);
LOOP
INSERT INTO temp
SELECT DISTINCT g.parent_id, randvar+1 from relation as g
WHERE g.child_id IN
(SELECT parent_id FROM temp WHERE session=randvar);
EXIT WHEN NOT FOUND;
i:=i+1;
randvar:=randvar+1;
END LOOP;
FOR r IN SELECT parent_id FROM temp
WHERE session BETWEEN randvar-i AND randvar LOOP
RETURN NEXT r.parent_id;
END LOOP;
DELETE FROM temp WHERE session BETWEEN randvar-i and randvar;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
4. This much, much faster version that loops and just throws everything
into an array and then spits out the array at the end.
CREATE OR REPLACE FUNCTION get_ancestors_pg_array(ids integer[])
RETURNS integer[] AS
$BODY$
DECLARE
result int[]:='{}'::int[];
nextiter int[]:='{}'::int[];
temp int[]:=ids;
BEGIN
LOOP
SELECT INTO nextiter
ARRAY(SELECT DISTINCT g.parent_id from relation as g
WHERE g.child_id=ANY(temp));
result:=nextiter||result;
EXIT WHEN nextiter[1] IS NULL;
temp:=nextiter;
END LOOP;
RETURN result;
END;
$BODY$
LANGUAGE 'plpgsql' STABLE;
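For comparison, here is how the array version would be called (item id
42 is a made-up example); the first three functions are set-returning
and are queried with SELECT * FROM instead:

-- returns all ancestor ids as a single integer[] value
SELECT get_ancestors_pg_array(ARRAY[42]);

-- or used directly as a filter
SELECT * FROM relation
 WHERE child_id = ANY (get_ancestors_pg_array(ARRAY[42]));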
The fourth function, get_ancestors_pg_array, seems to execute in about
a tenth of the time the other functions take. My questions:
1. Is there a huge overhead for recursion? In general, is it better to
stuff temporary results into a table (incurring the cost of an INSERT)
rather than recurse?
2. Is there a big difference in speed between using an array versus
using a SELECT in a WHERE condition? In other words, which is generally
going to be faster:

SELECT * from table where field IN (some function returning a SETOF);

or

SELECT * from table where field = ANY(some function returning an array);

3. Is there a strong reason I should strip out duplicates in either of
the two cases in question 2? Or is the performance about the same when
doing the queries whether or not the SETOF or arrays contain duplicates?

4. Can you see any obvious optimizations to the above functions
(particularly the last one)?

Thanks for your help. Thanks for the absolutely wonderful database and
solid documentation. I originally did this project in MySQL and had the
weirdest errors (the errors turned out to be due to the default
case-insensitive collation of MySQL!). That's when I decided to move to
postgresql when I updated the project.
Re: [SQL] Recursively traversing a partially ordered set
"Jason Grout" <[EMAIL PROTECTED]> writes: > 2. Is there a big difference in speed between using an array versus > using a SELECT in a WHERE condition? In other words, which is generally > going to be faster: > > SELECT * from table where field IN (some function returning a SETOF); > > or > > SELECT * from table where field = ANY(some function returning an array); In theory since these are equivalent (well nearly. To be equivalent the optimizer would need to know whether the array could possibly have NULLs in it) the optimizer ought to produce the same plan for each. In practice it's not clear where the optimizer would get the information to decide what plan to use for these two cases and whether it would have the same kinds of information available. In any case in practice the plans available in each of these cases are not the same so you'll have to try them and see which one works better for you. I think there are more plans available for the first case so it may work out better if you're returning quite large sets where those plans help. If you're returning quite small sets where you just need a simple bitmap index scan then the second will be less likely to pick (or actually I think it's incapable of picking) some other plan which works poorly. There was some discussion recently on what to do about exactly this type of case. I would be interested to hear about what plans you got from each and which plan ended up being best. > 3. Is there a strong reason I should strip out duplicates in either of > the two cases in question 2? Or is the performance about the same when > doing the queries whether or not the SETOF or arrays contain duplicates? The plans where it matters will remove the duplicates anyways, but I don't think the array version does if you're not using an bitmap index scan. > 4. Can you see any obvious optimizations to the above functions > (particularly the last one)? > > Thanks for your help. Thanks for the absolutely wonderful database and > solid documentation. I originally did this project in MySQL and had the > weirdest errors (the errors turned out to be due to the default > case-insensitive collation of MySQL!). That's when I decided to move to > postgresql when I updated the project. Well, unfortunately collocation support isn't exactly a strong point in Postgres either. Plenty of people get bitten by their database being initdb'd in a locale they didn't expect. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[SQL] problems with SELECT query results
Hello,

I am new to this list and have been working with PostgreSQL since
January. Here is my problem; I hope someone here has some experience or
can point me in the right direction. I am writing the following query
for a C# program I am writing:

SELECT 'PV.LINEITEM:' || partnum || ',' || round(onhand) || ',' ||
round(qm5) || ',' || round(lsm4) || ',' || ',' || ',' || round(onorder)
|| ',' || ',' || ',' || binone || ',' || ',' || round(backorderqty) ||
',' || ',' || round(onhold) || ',' || ',' || ',' || ',' || ',' || ','
|| ',' || round(qtyperjob) || ',' || round(ordermax) AS gmrim
FROM slparts
WHERE vendor LIKE 'CH%'

The query does work and I am getting results from the database. There
are values for all 'partnum' in the database; however, the query
results include blank fields here and there in between the returned
records. Why am I receiving blank fields for 'gmrim'? This absolutely
defies logic and I cannot find any rhyme or reason for this problem. I
cannot have any blank rows in the query, and again the database is
completely populated with values.

Could someone give me an explanation or tell me why I am receiving
blank rows here and there as a query result? If you need further
clarification of this problem or have any questions for me to arrive at
a conclusion, please feel free to send them to me and I will be more
than happy to open a dialog in order to solve this problem.

Thanks in advance for assistance.

Cordially,
Joshua Neil
Re: [SQL] problems with SELECT query results
Joshua wrote:
Hello,
I am new to this list and have been working with PostgreSQL since January.
Here is my problem, I hope someone here has some experience or can point
me in the right direction. I am writing the following query for a C#
program I am writing:
SELECT 'PV.LINEITEM:' || partnum || ',' || round(onhand) || ',' ||
round(qm5) || ',' || round(lsm4) || ',' || ',' || ',' || round(onorder)
|| ',' || ',' || ',' || binone || ',' || ',' || round(backorderqty) ||
',' || ',' || round(onhold) || ',' || ',' || ',' || ',' || ',' || ',' ||
',' || round(qtyperjob) || ',' || round(ordermax) AS gmrim FROM slparts
WHERE vendor LIKE 'CH%'
The query does work and I am getting results from the database. There
are values for all 'partnum' in the database, however, the query results
include blank fields here and there in between the returned records. Why
am I receiving blank fields for 'gmrim'? This absolutely defies logic
and I cannot find any rhyme or reason for this problem. I cannot have
any blank rows in the query, and again the database is completely
populated with values.
I'm not sure it is - I think you've got a NULL somewhere.
Since NULL means "unknown", ('text' || NULL) = NULL.
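A one-line test makes this visible (psql displays the NULL result as an
empty field):

SELECT 'text' || NULL AS result;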
Wrap all your column-references in COALESCE: e.g. COALESCE(partnum,'')
and see if that solves it. If so, go back and find rows WHERE partnum IS
NULL and correct them. Then set the NOT NULL constraint on the relevant
columns.
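Applied to the start of the query above, that would look something like
this sketch (only the first few columns shown; the same pattern extends
to the rest, and note the round() results need an explicit text cast
once wrapped, since the COALESCE fallback here is a string):

SELECT 'PV.LINEITEM:'
       || COALESCE(partnum, '') || ','
       || COALESCE(round(onhand)::text, '') || ','
       || COALESCE(round(qm5)::text, '') || ','
       || COALESCE(round(lsm4)::text, '') AS gmrim
FROM slparts
WHERE vendor LIKE 'CH%';
-- ...and so on for the remaining columns.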
--
Richard Huxton
Archonet Ltd
Re: [SQL] problems with SELECT query results
Hello,
Thank you all for your assistance. I did end up finding NULLs in the
'onorder' column, which should have been zeros. This was in a test
table that happened to have some NULLs in it for one reason or another,
but I should not find this in the production version of the table.
I also appreciate you pointing out the COALESCE function. I will find
that helpful in future work.
I look forward to hopefully assisting you guys with some of your
PostgreSQL dilemmas in the future!
Thanks again!!!
-Joshua
Richard Huxton wrote:
> [snip]
Re: [SQL] problems with SELECT query results
--- Joshua <[EMAIL PROTECTED]> wrote:
> Thank you all for your assistance. I did end up finding NULLs in the
> 'onorder' column, which should have been zeros. This was in a test
> table that happened to have some NULLs in it for one reason or
> another, but I should not find this in the production version of the
> table.

If this column should not be allowed to have NULLs, you should add a
NOT NULL constraint to your table to prevent this sort of thing from
happening. If the field should have zeros instead of NULLs, you should
also add a default value of zero for this column:

ALTER TABLE slparts
    ALTER COLUMN onorder SET NOT NULL,
    ALTER COLUMN onorder SET DEFAULT 0;

Regards,
Richard Broersma Jr.
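One caveat worth adding: SET NOT NULL will fail if any NULLs remain in
the column, so the existing rows need to be cleaned up first, along
these lines:

UPDATE slparts SET onorder = 0 WHERE onorder IS NULL;

ALTER TABLE slparts
    ALTER COLUMN onorder SET NOT NULL,
    ALTER COLUMN onorder SET DEFAULT 0;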
