On 24/12/22 18:22, Ankit Pandey wrote:
Hi,
This is a PoC patch which implements distinct operation in window
aggregates (without order by and for single column aggregation, final
version may vary wrt these limitations). Purpose of this PoC is to get
feedback on the approach used and corresponding implementation, any
nitpicking as deemed reasonable.
Distinct operation is mirrored from implementation in nodeAgg.
Existing partitioning logic determines if row is in partition and when
distinct is required, all tuples for the aggregate column are stored
in tuplesort. When finalize_windowaggregate gets called, tuples are
sorted and duplicates are removed, followed by calling the transition
function on each tuple.
When distinct is not required, the above process is skipped and the
transition function gets called directly and nothing gets inserted
into tuplesort.
Note: For each partition, in tuplesort_begin and tuplesort_end is
involved to rinse tuplesort, so at any time, max tuples in tuplesort
is equal to tuples in a particular partition.
I have verified it for interger and interval column aggregates (to
rule out obvious issues related to data types).
Sample cases:
create table mytable(id int, name text);
insert into mytable values(1, 'A');
insert into mytable values(1, 'A');
insert into mytable values(5, 'B');
insert into mytable values(3, 'A');
insert into mytable values(1, 'A');
select avg(distinct id) over (partition by name) from mytable;
avg
--------------------
2.0000000000000000
2.0000000000000000
2.0000000000000000
2.0000000000000000
5.0000000000000000
select avg(id) over (partition by name) from mytable;
avg
--------------------
1.5000000000000000
1.5000000000000000
1.5000000000000000
1.5000000000000000
5.0000000000000000
select avg(distinct id) over () from mytable;
avg
--------------------
3.0000000000000000
3.0000000000000000
3.0000000000000000
3.0000000000000000
3.0000000000000000
select avg(distinct id) from mytable;
avg
--------------------
3.0000000000000000
This is my first-time contribution. Please let me know if anything can be
improved as I`m eager to learn.
Regards,
Ankit Kumar Pandey
Hi all,
I know everyone is busy with holidays (well, Happy Holidays!) but I will
be glad if someone can take a quick look at this PoC and share thoughts.
This is my first time contribution so I am pretty sure there will be
some very obvious feedbacks (which will help me to move forward with
this change).
--
Regards,
Ankit Kumar Pandey