Re: [C++] Apache Arrow C++ Variadic Kernels Design

2021-06-18 Thread Wes McKinney
COUNT(DISTINCT varargs...) can be used either as a scalar aggregate function or a group aggregate function. For example SELECT COUNT(DISTINCT expr1, expr2, ...) FROM TABLE; returns a single value. It can be used with GROUP BY to produce a distinct count per group. I think it would be useful to ha
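To make the scalar versus group distinction concrete, here is a minimal plain-Python sketch. It does not use the Arrow API: the function names `count_distinct` and `grouped_count_distinct` and the sample columns are hypothetical, and SQL NULL handling is ignored.

```python
from collections import defaultdict

def count_distinct(*columns):
    """Scalar aggregate: count distinct row tuples across any number of columns."""
    return len(set(zip(*columns)))

def grouped_count_distinct(keys, *columns):
    """Group aggregate: one distinct count per value of the grouping key."""
    groups = defaultdict(set)
    for key, row in zip(keys, zip(*columns)):
        groups[key].add(row)
    return {key: len(rows) for key, rows in groups.items()}

# Hypothetical sample data standing in for expr1, expr2 and a GROUP BY key.
expr1 = [1, 1, 2, 2, 1]
expr2 = ["a", "a", "b", "c", "a"]
region = ["x", "x", "x", "y", "y"]

total = count_distinct(expr1, expr2)                      # single value
per_group = grouped_count_distinct(region, expr1, expr2)  # one value per group
```

Both functions accept a variable number of value columns, which is the property the thread is concerned with; only the presence of a grouping key changes the shape of the result.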

Re: [C++] Apache Arrow C++ Variadic Kernels Design

2021-06-18 Thread Ian Cook
> Aren't SELECT DISTINCT and COUNT DISTINCT just condensed variants of a GROUP BY query? Do they need to be exposed as standalone kernels?
I listed SELECT DISTINCT and COUNT DISTINCT in the document only as examples of SQL statements that take a variable number of arguments, not to imply that t

Re: [C++] Apache Arrow C++ Variadic Kernels Design

2021-06-18 Thread Antoine Pitrou
Aren't SELECT DISTINCT and COUNT DISTINCT just condensed variants of a GROUP BY query? Do they need to be exposed as standalone kernels?
On 18/06/2021 at 00:58, Ian Cook wrote:
> Arrow developers,
> A couple of recent PRs have added new variadic scalar kernels to the Arrow C++ library (ARROW

Re: [C++] Apache Arrow C++ Variadic Kernels Design

2021-06-18 Thread Wes McKinney
Hi Ian, I agree with implementing these functions with varargs/variadic inputs (this was my original intent when drafting compute/kernel.h and related machinery last year). As one nuance with the way that things work right now, the type matching infrastructure isn't necessarily able to determine

[C++] Apache Arrow C++ Variadic Kernels Design

2021-06-17 Thread Ian Cook
Arrow developers,
A couple of recent PRs have added new variadic scalar kernels to the Arrow C++ library (ARROW-12751, ARROW-12709). There were some questions raised in comments on Jira and GitHub about whether these could instead be implemented as unary or binary kernels that take ListArray or St
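The two call shapes being compared can be sketched in plain Python as follows. This is an illustration only, not the Arrow C++ API: `max_element_wise` and `max_element_wise_packed` are hypothetical names, and plain lists stand in for Arrow arrays.

```python
def max_element_wise(*arrays):
    """Variadic form: each input column is passed as its own argument."""
    if not arrays:
        raise ValueError("at least one input array is required")
    if len({len(a) for a in arrays}) != 1:
        raise ValueError("all input arrays must have the same length")
    return [max(values) for values in zip(*arrays)]

def max_element_wise_packed(rows):
    """Unary form: a single list-like input whose elements hold one row each."""
    return [max(row) for row in rows]

# Hypothetical sample columns.
a = [1, 5, 3]
b = [4, 2, 6]
c = [0, 9, 1]

variadic = max_element_wise(a, b, c)

# The unary form needs the caller to pack the columns into rows first.
packed_input = [list(row) for row in zip(a, b, c)]
packed = max_element_wise_packed(packed_input)
```

Both forms compute the same result; the difference is that the unary form moves the work of combining the columns into one nested input onto the caller.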