Hi Julian,

Thanks again for your feedback.

Since ANY_VALUE and SINGLE_VALUE are duplicate-insensitive, they should also
be splittable (SqlSplittableAggFunction), just like MIN, MAX, etc.
What do you think?

I would like to file a JIRA issue accordingly, so that more optimizations can
be applied. Any feedback is appreciated.

Best,
Liya Fan

On Wed, Oct 14, 2020 at 2:59 AM Julian Hyde <[email protected]> wrote:

> I agree. ANY_VALUE and SINGLE_VALUE are duplicate-insensitive.
>
> > On Oct 13, 2020, at 2:17 AM, Fan Liya <[email protected]> wrote:
> >
> > Hi Julian,
> >
> > Thanks a lot for your feedback.
> > I think SqlAggFunction.getDistinctOptionality() is exactly what I
> > am looking for.
> >
> > BTW, I think ANY_VALUE and SINGLE_VALUE also belong to the category of
> > duplicate-insensitive functions.
> > What do you think?
> >
> > Best,
> > Liya Fan
> >
> > On Tue, Oct 13, 2020 at 4:55 PM Julian Hyde <[email protected]> wrote:
> >
> >> We already have this concept. See
> >> SqlAggFunction.getDistinctOptionality(), added in
> >> https://issues.apache.org/jira/browse/CALCITE-3159.
> >>
> >> Julian
> >>
> >>> On Oct 13, 2020, at 12:54 AM, Fan Liya <[email protected]> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I would like to introduce the idea of duplicate-insensitive aggregate
> >>> functions.
> >>>
> >>> For such functions, the aggregation result remains the same even after
> >>> deduplication.
> >>>
> >>> For example, given a sequence of data {1, 1, 2, 2, 3, 5, 5}, the
> >>> result of MIN is the same regardless of whether we deduplicate the
> >>> data first. That is,
> >>>
> >>> MIN({1, 1, 2, 2, 3, 5, 5}) = MIN({1, 2, 3, 5})
> >>>
> >>> So MIN is a *duplicate-insensitive function*.
> >>>
> >>> On the other hand, SUM is not duplicate-insensitive, because
> >>>
> >>> SUM({1, 1, 2, 2, 3, 5, 5}) != SUM({1, 2, 3, 5})
> >>>
> >>> The concept of duplicate insensitivity can help us in many
> >>> optimization scenarios.
> >>>
> >>> For example, the current implementation of AggregateMergeRule rules out
> >>> any aggregate call for which the isDistinct() method returns true.
> >>> However, for duplicate-insensitive functions, the rule should still be
> >>> applicable.
> >>>
> >>> Could you please give your valuable feedback?
> >>>
> >>> Best,
> >>> Liya Fan
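
To make the MIN/SUM example from the thread concrete, here is a minimal Java sketch. It does not use Calcite: the class and method names (DuplicateInsensitivityDemo, dedup, isDuplicateInsensitiveOn) are hypothetical and exist only for illustration. The check compares results on a single sample input, so it illustrates the property rather than proving it for an aggregate function in general.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.Function;

// Hypothetical demo class; not part of Calcite.
public class DuplicateInsensitivityDemo {

    /** Removes duplicates while keeping first-occurrence order. */
    public static List<Integer> dedup(List<Integer> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    /**
     * Returns true if the aggregate gives the same result on the given
     * input before and after deduplication. This checks one input only.
     */
    public static boolean isDuplicateInsensitiveOn(
            Function<List<Integer>, Integer> agg, List<Integer> values) {
        return agg.apply(values).equals(agg.apply(dedup(values)));
    }

    /** MIN over a non-empty list. */
    public static Integer min(List<Integer> values) {
        return Collections.min(values);
    }

    /** SUM over a list. */
    public static Integer sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // The sample data from the original message.
        List<Integer> data = Arrays.asList(1, 1, 2, 2, 3, 5, 5);

        // MIN is 1 with or without duplicates.
        System.out.println("MIN: "
            + isDuplicateInsensitiveOn(DuplicateInsensitivityDemo::min, data));

        // SUM is 19 with duplicates and 11 without.
        System.out.println("SUM: "
            + isDuplicateInsensitiveOn(DuplicateInsensitivityDemo::sum, data));
    }
}
```

Running main prints "MIN: true" and "SUM: false", matching the two (in)equalities in the original message. In Calcite itself, the corresponding information is exposed per function through SqlAggFunction.getDistinctOptionality(), as Julian points out above.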
