On 12/5/2018 8:30 PM, John W Higgins wrote:


On Wed, Dec 5, 2018 at 4:34 PM Phil Endecott <spam_from_pgsql_li...@chezphil.org <mailto:spam_from_pgsql_li...@chezphil.org>> wrote:

    Dear Experts,

    I have a couple of tables that I want to reconcile, finding rows
    that match and places where rows are missing from one table or the
    other:

...

    So my question is: how can I modify my query to output only two rows,
    like this:?

    +------------+--------+------------+--------+
    |    date    | amount |    date    | amount |
    +------------+--------+------------+--------+
    | 2018-01-01 |  10.00 | 2018-01-01 |  10.00 |
    | 2018-02-01 |   5.00 |            |        |
    |            |        | 2018-03-01 |   8.00 |
    | 2018-04-01 |   5.00 | 2018-04-01 |   5.00 |
    | 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  1
    | 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  2
    +------------+--------+------------+--------+


Evening Phil,

Window functions are your friend here. I prefer views for this stuff - but subqueries would work just fine.

create view a_rows as (select *,
                       row_number() OVER (PARTITION BY date, amount) AS pos from a);
create view b_rows as (select *,
                       row_number() OVER (PARTITION BY date, amount) AS pos from b);

select
  a_rows.date,
  a_rows.amount,
  a_rows.pos,
  b_rows.date,
  b_rows.amount,
  b_rows.pos
from
  a_rows full join b_rows using (date,amount,pos);

Example here - http://sqlfiddle.com/#!17/305d6/3

John


    Any suggestions anyone?


    The best I have found so far is something involving EXCEPT ALL:

    db=> select * from a except all select * from b;
    db=> select * from b except all select * from a;

    That's not ideal, though, as what I ultimately want is something
    that lists everything with its status:

    +------------+--------+--------+
    |    date    | amount | status |
    +------------+--------+--------+
    | 2018-01-01 |  10.00 |   OK   |
    | 2018-02-01 |   5.00 | a_only |
    | 2018-03-01 |   8.00 | b_only |
    | 2018-04-01 |   5.00 |   OK   |
    | 2018-05-01 |  20.00 |   OK   |
    | 2018-05-01 |  20.00 |   OK   |
    +------------+--------+--------+

    That would be easy enough to achieve from the JOIN.


    Thanks, Phil.


This question is always asked time to time.
I have found an old article with so far the best solution for big tables.
https://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:2151582681236#15393095283923

On the same test data
create table a (date date, amount money);
create table b (date date, amount money);

insert into a values ('2018-01-01', 10);
insert into a values ('2018-02-01', 5);
insert into a values ('2018-04-01', 5);
insert into a values ('2018-05-01', 20);
insert into a values ('2018-05-01', 20);
insert into b values ('2018-01-01', 10);
insert into b values ('2018-03-01', 8);
insert into b values ('2018-04-01', 5);
insert into b values ('2018-05-01', 20);
insert into b values ('2018-05-01', 20);

select tt.date,
       tt.amount,
       count(tt.src1) CNT1,
       count(tt.src2) CNT2
 from
(
 select a.date,
        a.amount,
        1 src1,
        null::integer src2
   from a
 union all
 select b.date,
        b.amount,
        null::integer src1,
        2 src2
  from b
  ) tt
group by tt.date, tt.amount;

date          amount    cnt1    cnt2
2018-01-01    $10.00     1       1
2018-02-01    $5.00      1       0
2018-03-01    $8.00      0       1
2018-04-01    $5.00      1       1
2018-05-01    $20.00     2       2

It requires a sort, so you may want to increase work_mem before execution, and then return it back like
SET work_mem = '512MB';
...  run your query
RESET work_mem;

Regards,

Sergei Agalakov

Reply via email to