[SQL] reduce many loosely related rows down to one

2013-05-25 Thread Bill MacArthur

Here is a boiled down example of a scenario which I am having a bit of 
difficulty solving.
This is a catchall table where all the rows are related to the "id" but are 
entered by different unrelated processes that do not necessarily have access to the other 
data bits.

CREATE TABLE test (
id INTEGER,
rspid INTEGER,
nspid INTEGER,
cid INTEGER,
iac BOOLEAN,
newp SMALLINT,
oldp SMALLINT,
ppv NUMERIC(7,2),
tppv NUMERIC(7,2)
);

INSERT INTO test (id, rspid, nspid, cid, iac) VALUES (1,2,3,4,TRUE);
INSERT INTO test (id, rspid, nspid, newp) VALUES (1,2,3,100);
INSERT INTO test (id, rspid, nspid, oldp) VALUES (1,2,3,200);
INSERT INTO test (id, rspid, nspid, tppv) VALUES (1,2,3,4100);
INSERT INTO test (id, rspid, nspid, tppv) VALUES (1,2,3,3100);
INSERT INTO test (id, rspid, nspid, ppv) VALUES (1,2,3,-100);
INSERT INTO test (id, rspid, nspid, ppv) VALUES (1,2,3,250);
INSERT INTO test (id, rspid, nspid, cid) VALUES (2,7,8,4);

-- raw data now looks like this:

select * from test;

 id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
+---+---+-+-+--+--+-+-
  1 | 2 | 3 |   4 | t   |  |  | |
  1 | 2 | 3 | | |  100 |  | |
  1 | 2 | 3 | | |  |  200 | |
  1 | 2 | 3 | | |  |  | | 4100.00
  1 | 2 | 3 | | |  |  | | 3100.00
  1 | 2 | 3 | | |  |  | -100.00 |
  1 | 2 | 3 | | |  |  |  250.00 |
  2 | 7 | 8 |   4 | |  |  | |
(8 rows)

-- I want this result (where ppv and tppv are summed and the other distinct 
values are boiled down into one row)
-- I want to avoid writing explicit UNIONs that will break if, say the "cid" was entered 
as a discreet row from the row containing "iac"
-- in this example "rspid" and "nspid" are always the same for a given ID, 
however they could possibly be absent for a given row as well

 id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
+---+---+-+-+--+--+-+-
  1 |2  | 3 |  4  | t   | 100  | 200  |  150.00  | 7200.00
  2 |7  | 8 |  4  | |  |  |0.00  |0.00


I have experimented with doing the aggregates as a CTE and then joining that to 
various incarnations of DISTINCT and DISTINCT ON, but those do not do what I 
want. Trying to find the right combination of terms to get an answer from 
Google has been unfruitful.

Any ideas?

Thank you for your consideration.
Bill MacArthur


--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] reduce many loosely related rows down to one

2013-05-27 Thread Bill MacArthur


On 5/25/2013 7:57 AM, Marc Mamin wrote:




Von: [email protected] [[email protected]]" im 
Auftrag von "Bill MacArthur [[email protected]]
Gesendet: Samstag, 25. Mai 2013 09:19
An: [email protected]
Betreff: [SQL] reduce many loosely related rows down to one

Here is a boiled down example of a scenario which I am having a bit of 
difficulty solving.
This is a catchall table where all the rows are related to the "id" but are 
entered by different unrelated processes that do not necessarily have access to the other 
data bits.





-- raw data now looks like this:

select * from test;

   id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
+---+---+-+-+--+--+-+-
1 | 2 | 3 |   4 | t   |  |  | |
1 | 2 | 3 | | |  100 |  | |
1 | 2 | 3 | | |  |  200 | |
1 | 2 | 3 | | |  |  | | 4100.00
1 | 2 | 3 | | |  |  | | 3100.00
1 | 2 | 3 | | |  |  | -100.00 |
1 | 2 | 3 | | |  |  |  250.00 |
2 | 7 | 8 |   4 | |  |  | |
(8 rows)

-- I want this result (where ppv and tppv are summed and the other distinct 
values are boiled down into one row)
-- I want to avoid writing explicit UNIONs that will break if, say the "cid" was entered 
as a discreet row from the row containing "iac"
-- in this example "rspid" and "nspid" are always the same for a given ID, 
however they could possibly be absent for a given row as well

   id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
+---+---+-+-+--+--+-+-
1 |2  | 3 |  4  | t   | 100  | 200  |  150.00  | 7200.00
2 |7  | 8 |  4  | |  |  |0.00  |0.00


I have experimented with doing the aggregates as a CTE and then joining that to 
various incarnations of DISTINCT and DISTINCT ON, but those do not do what I 
want. Trying to find the right combination of terms to get an answer from 
Google has been unfruitful.



Hello,
If I understand you well, you want to perform a group by whereas null values 
are coalesced to existing not null values.
this seems to be logically not feasible.
What should look the result like if your "raw" data are as following:

   id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
+---+---+-+-+--+--+-+-
1 | 2 | 3 |   4 | t   |  |  | |
1 | 2 | 3 |   5 | t   |  |  | |
1 | 2 | 3 | | |  100 |  | |

(to which cid should newp be summed to?)

regards,

Marc Mmain


Ya, there is more to the picture than I described. Didn't want to bore with excessive 
detail. I was hoping that perhaps somebody would see the example and say "oh ya that 
can be solved with this obscure SQL implementation" :)
I have resigned myself to using a few more CTEs with DISTINCTs and joining it 
all up to get the results I want. Thanks for the look anyway Marc. Your 
description of what I wanted was more accurate and concise than I had words for 
at the time of the night I originally posted this.

Have a good one.

Bill MacArthur


--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] reduce many loosely related rows down to one

2013-05-28 Thread Bill MacArthur

On 5/28/2013 11:04 AM, Torsten Grust wrote:

On 25 May 2013, at 9:19, Bill MacArthur wrote (with possible deletions):

[...]
select * from test;

id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
+---+---+-+-+--+--+-+-
1 | 2 | 3 |   4 | t   |  |  | |
1 | 2 | 3 | | |  100 |  | |
1 | 2 | 3 | | |  |  200 | |
1 | 2 | 3 | | |  |  | | 4100.00
1 | 2 | 3 | | |  |  | | 3100.00
1 | 2 | 3 | | |  |  | -100.00 |
1 | 2 | 3 | | |  |  |  250.00 |
2 | 7 | 8 |   4 | |  |  | |
(8 rows)

-- I want this result (where ppv and tppv are summed and the other distinct 
values are boiled down into one row)
-- I want to avoid writing explicit UNIONs that will break if, say the "cid" was entered 
as a discreet row from the row containing "iac"
-- in this example "rspid" and "nspid" are always the same for a given ID, 
however they could possibly be absent for a given row as well

id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
+---+---+-+-+--+--+-+-
1 |2  | 3 |  4  | t   | 100  | 200  |  150.00  | 7200.00
2 |7  | 8 |  4  | |  |  |0.00  |0.00


One possible option could be

SELECT id,
(array_agg(rspid))[1] AS rspid,-- (1)
(array_agg(nspid))[1] AS nspid,
(array_agg(cid))[1]   AS cid,
bool_or(iac)  AS iac,  -- (2)
max(newp) AS newp, -- (3)
min(oldp) AS oldp, -- (4)
coalesce(sum(ppv), 0) AS ppv,
coalesce(sum(tppv),0) AS tppv
FROM test
GROUP BY id;


This query computes the desired output for your example input.

There's a caveat here: your description of the problem has been
somewhat vague and it remains unclear how the query should
respond if the functional dependency id -> rspid
does not hold.  In this case, the array_agg(rspid)[1] in the line
marked (1) will pick one among many different(!) rspid values.
I don't know your scenario well enough to judge whether this would be
an acceptable behavior.  Other possible behaviors have been
implemented in the lines (2), (3), (4) where different aggregation
functions are used to reduce sets to a single value (e.g., pick the
largest/smallest of many values ...).

Cheers,
   --Torsten


Slick! Interesting usage scenarios for those aggregate functions array_agg and 
bool_or, one new to me and the other rarely used, and even for min and max 
which I never thought of using in this sense.

I tried not be be overbearing with descriptive details hoping that somebody 
would look at the simplistic case and offer what might be considered an obscure 
way of implementing some of Postgres's handy features for an unusual problem. 
With a little tweaking for the exact nature of the environment, I am good to go.

Thank you, Torsten!
Bill MacArthur


--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql