> Aren't SELECT DISTINCT and COUNT DISTINCT just condensed variants of a GROUP 
> BY query? Do they need to be exposed as standalone kernels?

I listed SELECT DISTINCT and COUNT DISTINCT in the document only as
examples of SQL statements that take a variable number of arguments,
not to imply that these should be exposed as compute kernels in Arrow.
But I think you are right that they do not really belong in this
list: as you say, they are best thought of as shorthand SQL syntax
for results that could instead be obtained through a GROUP BY query.
I have removed them.
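
For anyone who wants to check the equivalence, here is a minimal sketch
using Python's built-in sqlite3 module. The table t and its columns x
and y are made up for the illustration, and the behavior shown is
SQLite's; other engines may differ in details. Note that COUNT(DISTINCT
x) ignores NULLs, so the GROUP BY form filters them out explicitly.

```python
import sqlite3

# Hypothetical in-memory table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER, y TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "b"), (2, "c"), (1, "a")],
)

# SELECT DISTINCT over (x, y) versus grouping by the same columns.
distinct_rows = sorted(conn.execute("SELECT DISTINCT x, y FROM t").fetchall())
grouped_rows = sorted(conn.execute("SELECT x, y FROM t GROUP BY x, y").fetchall())

# COUNT(DISTINCT x) versus counting the groups produced by GROUP BY x.
# COUNT(DISTINCT ...) skips NULLs, so the grouped form excludes them too.
count_distinct = conn.execute("SELECT COUNT(DISTINCT x) FROM t").fetchone()[0]
count_grouped = conn.execute(
    "SELECT COUNT(*) FROM (SELECT x FROM t WHERE x IS NOT NULL GROUP BY x)"
).fetchone()[0]

print(distinct_rows == grouped_rows)    # both forms return the same rows
print(count_distinct == count_grouped)  # both forms return the same count
```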

Thank you,
Ian

On Fri, Jun 18, 2021 at 2:26 PM Antoine Pitrou <anto...@python.org> wrote:
>
>
> Aren't SELECT DISTINCT and COUNT DISTINCT just condensed variants of a
> GROUP BY query? Do they need to be exposed as standalone kernels?
>
>
> Le 18/06/2021 à 00:58, Ian Cook a écrit :
> > Arrow developers,
> >
> > A couple of recent PRs have added new variadic scalar kernels to the
> > Arrow C++ library (ARROW-12751, ARROW-12709). There were some
> > questions raised in comments on Jira and GitHub about whether these
> > could instead be implemented as unary or binary kernels that take
> > ListArray or StructArray input. Since I believe we plan to add at
> > least a few more variadic kernels, I wrote a document [1] with help
> > from some colleagues at Ursa to describe the rationale behind why we
> > believe it is best to implement these as variadic kernels. Feedback is
> > welcome.
> >
> > Thank you,
> > Ian
> >
> > [1] 
> > https://docs.google.com/document/d/1ExysJ43WpjZ_P6vnfx6dzCRSvM-3qlqpc6gPjy9cNXM/
> >
