Mark1626 opened a new pull request, #20460:
URL: https://github.com/apache/datafusion/pull/20460
## Which issue does this PR close?
Closes #5619
## What changes are included in this PR?
1. Introduce a separate accumulator for multi-column distinct counts,
`MultiColumnDistinctCountAccumulator`.
2. I used parts of #5939 for reference; however, it was out of date, so I had
to reimplement it.
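
For readers unfamiliar with the idea, here is a minimal sketch of how a multi-column distinct count can be tracked. It is not the code in this PR: `Value`, `DistinctTupleCounter`, and `row` are hypothetical names that use only the Rust standard library, and NULL handling is deliberately left out. The assumption is that the state is simply the set of distinct rows seen so far, which can be merged across partitions.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for a column value; not a DataFusion type.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Value {
    Int(i64),
    Text(String),
}

// Hypothetical sketch: the state is the set of distinct rows seen so far.
#[derive(Debug, Default)]
struct DistinctTupleCounter {
    seen: HashSet<Vec<Value>>,
}

impl DistinctTupleCounter {
    // Record a batch of rows; each row holds one value per counted column.
    fn update_batch(&mut self, rows: &[Vec<Value>]) {
        for row in rows {
            // Only clone rows that have not been seen yet.
            if !self.seen.contains(row) {
                self.seen.insert(row.clone());
            }
        }
    }

    // Fold in the partial state of another counter, e.g. from another partition.
    fn merge(&mut self, other: DistinctTupleCounter) {
        self.seen.extend(other.seen);
    }

    // The distinct count is the number of unique rows retained.
    fn evaluate(&self) -> i64 {
        self.seen.len() as i64
    }
}

// Convenience constructor for a two-column (text, integer) row.
fn row(text: &str, number: i64) -> Vec<Value> {
    vec![Value::Text(text.to_string()), Value::Int(number)]
}
```

Keeping whole rows in a set makes merging partial states a plain set union, at the cost of memory proportional to the number of distinct rows.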
## Are these changes tested?
1. Unit tests have been added
2. I've tested this with a couple of queries in the CLI, for example:
```
with data AS (
    select * from (values
        ('a', 1, 'x'),
        ('a', 2, 'x'),
        ('b', 2, 'y'),
        ('b', 2, 'z'),
        ('c', 3, 'z')
    ) AS t(col1, col2, col3)
)
select count(distinct (col1, col2)) FROM data;
```
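
As a cross-check on the query above, the distinct `(col1, col2)` pairs can be enumerated by hand. The sketch below is a standalone illustration in standard-library Rust, not code from this PR; `count_distinct_pairs` and `sample_rows` are hypothetical helpers, and it assumes two pairs are equal when both columns match.

```rust
use std::collections::HashSet;

// Count distinct (col1, col2) pairs, ignoring col3, as the query does.
fn count_distinct_pairs(rows: &[(&str, i64, &str)]) -> usize {
    rows.iter()
        .map(|(col1, col2, _col3)| (*col1, *col2))
        .collect::<HashSet<_>>()
        .len()
}

// The five rows from the VALUES clause in the query above.
fn sample_rows() -> Vec<(&'static str, i64, &'static str)> {
    vec![
        ("a", 1, "x"),
        ("a", 2, "x"),
        ("b", 2, "y"),
        ("b", 2, "z"),
        ("c", 3, "z"),
    ]
}
```

Under that assumption, `count_distinct_pairs(&sample_rows())` evaluates to 4, because `('b', 2)` appears twice and differs only in `col3`.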