[HACKERS] proposal - GROUPING SETS

Pavel Stehule Tue, 16 Sep 2008 03:30:03 -0700

Hello,

== Proposal - GROUPING SETS ==


a grouping set feature allows multiple grouping clauses in one
query. Result of grouping sets is union of results each groupby
clause's result.

create table t(a int, b int);
insert into t values(10,20);
insert into t values(30,40);

select a, b from t group by grouping sets(a, b);

is same as:

select a, NULL from t group by a
union all
select NULL, b from t group by b;

Note: all ungrouped vars are transformed to NULL

Groupby clause should contains cube and rollup lists.
These are transformed to grouping sets via transformed rules:

create table t1(a int, b int, c int)

group by rollup(a, b, c)
->
group by grouping sets((a,b,c), (a,b), (a), ())

group by cube(a,b,c)
->
group by grouping sets ((a,b,c),(a,b),(a,c), (a), (b,c),
                        (b), (c), ())

Groupby clause or grouping sets should contains more sets.
Result is multiplication of these sets:

group by grouping sets(a), grouping sets(b, (b,c), ())
->
group by grouping sets((a,b), (a,b,c), (a))

When grouping sets are used, then we should to use grouping
and grouping_id functions. Function grouping returns 1 when
parameter is in current group set, else returns 0. Function
grouping_id returns value as

grouping_id(a,b,c) =
to_dec(to_bin(grouping(a) || grouping(b) || grouping(c)))

postgres=# select * from t;
 a  | b  | c
----+----+----
 10 | 20 | 30
 10 | 20 | 30
(2 rows)

postgres=# select a,b,c from t group by grouping sets(a,b,c);
 a  | b  | c
----+----+----
 10 |    |
    | 20 |
    |    | 30
(3 rows)

postgres=# select a,b,c, grouping(a), grouping(b), grouping(c),
           grouping_id(a,b,c) from t group by grouping sets(a,b,c);
 a  | b  | c  | grouping | grouping | grouping | grouping_id
----+----+----+----------+----------+----------+-------------
 10 |    |    |        1 |        0 |        0 |           4
    | 20 |    |        0 |        1 |        0 |           2
    |    | 30 |        0 |        0 |        1 |           1
(3 rows)

some real sample:

create table report(
  inserted date,
  locality varchar,
  name varchar,
  c int);

postgres=# copy report to stdout;
2008-10-10      Prague  Milk    10
2008-10-11      Prague  Milk    12
2008-10-10      Prague  Rum     2
2008-10-11      Prague  Rum     6
2008-10-10      Berlin  Milk    8
2008-10-11      Berlin  Milk    14
2008-10-10      Berlin  Beer    20
2008-10-11      Berlin  Beer    25

postgres=# select * from report;
  inserted  | locality | name | c
------------+----------+------+----
 2008-10-10 | Prague   | Milk | 10
 2008-10-11 | Prague   | Milk | 12
 2008-10-10 | Prague   | Rum  |  2
 2008-10-11 | Prague   | Rum  |  6
 2008-10-10 | Berlin   | Milk |  8
 2008-10-11 | Berlin   | Milk | 14
 2008-10-10 | Berlin   | Beer | 20
 2008-10-11 | Berlin   | Beer | 25
(8 rows)

postgres=# select inserted, locality, name, sum(c)
              from report
             group by grouping sets(inserted, locality, name);
  inserted  | locality | name | sum
------------+----------+------+-----
 2008-10-10 |          |      |  40
 2008-10-11 |          |      |  57
            | Berlin   |      |  67
            | Prague   |      |  30
            |          | Milk |  44
            |          | Rum  |   8
            |          | Beer |  45
(7 rows)


postgres=# select inserted, locality, name, sum(c)
              from report
             group by grouping sets(inserted, (locality, name));
  inserted  | locality | name | sum
------------+----------+------+-----
 2008-10-10 |          |      |  40
 2008-10-11 |          |      |  57
            | Prague   | Milk |  22
            | Berlin   | Milk |  22
            | Berlin   | Beer |  45
            | Prague   | Rum  |   8
(6 rows)

postgres=# select inserted, locality, name, sum(c)
              from report
             group by name, grouping sets(inserted, locality);
  inserted  | locality | name | sum
------------+----------+------+-----
 2008-10-11 |          | Rum  |   6
 2008-10-10 |          | Rum  |   2
 2008-10-10 |          | Beer |  20
 2008-10-11 |          | Beer |  25
 2008-10-10 |          | Milk |  18
 2008-10-11 |          | Milk |  26
            | Prague   | Rum  |   8
            | Berlin   | Beer |  45
            | Berlin   | Milk |  22
            | Prague   | Milk |  22
(10 rows)

== Implementation ==
Grouping sets introduce a new concept into SQL. One readed
tuple is multiple used. It's similar with WITH clause. It's little
bit dificult implement it for current PostgreSQL's executor.

In my prototype I used aux node Feeder. This node should to hold
only one tuple. I add new method for Agg node, that process only
one input tuple:

        for (;;)
        {
                tuple = feeder->execute(grouping_sets->lefttree);
                foreach(l, grouping_sets->subplans)
                {
                        lfirst(l)->process(tuple);
                }
                if (is_null(tuple))
                                break;
        }

It supports only HASH Agg nodes. For non hash agg should be used
other method (currently it isn't supported).

postgres=# explain verbose select inserted, locality, name, sum(c)
                              from report
                             group by name,grouping sets(inserted, locality);
                           QUERY PLAN
-----------------------------------------------------------------
 Grouping Sets  (cost=12.00..35.00 rows=400 width=68)
   Output: NULL::date, locality, name, sum(c)
   ->  Seq Scan on report  (cost=0.00..18.00 rows=800 width=68)
         Output: inserted, locality, name, c
   ->  HashAggregate  (cost=6.00..8.50 rows=200 width=68)
         Output: inserted, NULL::character varying, name, sum(c)
         ->  Feeder  (cost=0.00..0.00 rows=800 width=68)
               Output: inserted, locality, name, c
   ->  HashAggregate  (cost=6.00..8.50 rows=200 width=68)
         Output: NULL::date, locality, name, sum(c)
         ->  Feeder  (cost=0.00..0.00 rows=800 width=68)
               Output: inserted, locality, name, c
(12 rows)

After work on prototype I don't see any problems in executor or
parser. I expect some dificulties in planner. With grouping sets
grouping_planner procedure will be much more complex:

1. targetlist and groupclause will be list of list,
2. we should to repeat estimation of NumGroups for each group
   set (it's should be shared with CTE feature??).

== Parser problems ==
1. The identifier cube is used in contrib cube.
   Solution: CUBE '(' ... ')' generates funcCall and it is transformed
   to grouping sets only in groupby clause later.

I invite any ideas, notes and help with documentation.

Regards
Pavel Stehule

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] proposal - GROUPING SETS

Reply via email to