Hi, I have a question about the aggregate kernel implementation. Any help is appreciated.
An aggregate kernel implements the "consume" and "merge" interfaces. For a chunked array, "consume" is called on each chunk to produce a temporary aggregated result, which is then combined with the previously consumed result via "merge". For associative operations like min/max/sum, this pattern is convenient: the min/max/sum of two arrays is easy to merge, e.g., sum([array_a, array_b]) = sum(array_a) + sum(array_b).

But I wonder what the best approach is for operations like stdev/percentile. The results of these operations cannot be easily merged; we have to walk through all the chunks to get the result. For these operations, it looks like "consume" must copy the input array and do the whole calculation once at "finalize" time. Or do we not expect these operations to support chunked arrays?

Yibo
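
P.S. To make the two cases concrete, here is a minimal Python sketch of the consume/merge/finalize flow described above. The names SumState, MedianState, and aggregate are made up for illustration and are not the actual kernel interface; median stands in for a percentile. It only shows the contrast between a state that merges cheaply and one that has to buffer a copy of every chunk until "finalize".

```python
from typing import List, Sequence


class SumState:
    """Hypothetical mergeable state: partial sums combine by addition."""

    def __init__(self) -> None:
        self.total = 0.0

    def consume(self, chunk: Sequence[float]) -> None:
        # Aggregate one chunk into this state.
        self.total += sum(chunk)

    def merge(self, other: "SumState") -> None:
        # sum([a, b]) = sum(a) + sum(b)
        self.total += other.total

    def finalize(self) -> float:
        return self.total


class MedianState:
    """Hypothetical buffering state: keeps a copy of every value seen."""

    def __init__(self) -> None:
        self.values: List[float] = []

    def consume(self, chunk: Sequence[float]) -> None:
        # No cheap partial result exists, so copy the input.
        self.values.extend(chunk)

    def merge(self, other: "MedianState") -> None:
        # Merging is just concatenating the buffered copies.
        self.values.extend(other.values)

    def finalize(self) -> float:
        # All of the real work happens here, once, over all chunks.
        if not self.values:
            raise ValueError("median of empty input")
        ordered = sorted(self.values)
        n = len(ordered)
        mid = n // 2
        if n % 2 == 1:
            return ordered[mid]
        return (ordered[mid - 1] + ordered[mid]) / 2


def aggregate(state_cls, chunks):
    """Consume each chunk into a temporary state, then merge it into the result."""
    result = state_cls()
    for chunk in chunks:
        partial = state_cls()
        partial.consume(chunk)
        result.merge(partial)
    return result.finalize()
```

With this sketch, aggregate(SumState, chunks) holds a single number per state, while aggregate(MedianState, chunks) holds memory proportional to the total input size, which is the cost I am asking about.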