We already have this concept. See SqlAggFunction.getDistinctOptionality(), added in CALCITE-3159 (https://issues.apache.org/jira/browse/CALCITE-3159).

Julian

> On Oct 13, 2020, at 12:54 AM, Fan Liya <[email protected]> wrote:
>
> Hi all,
>
> I would like to introduce the idea of duplicate insensitive aggregate
> functions.
>
> For such functions, the aggregation result remains the same even after
> deduplication.
>
> For example, given a sequence of data {1, 1, 2, 2, 3, 5, 5}, the
> result of MIN is the same regardless of whether we deduplicate the
> data first. That is,
>
> MIN({1, 1, 2, 2, 3, 5, 5}) = MIN({1, 2, 3, 5})
>
> So MIN is a *duplicate insensitive function*.
>
> On the other hand, function SUM is not duplicate insensitive, because
>
> SUM({1, 1, 2, 2, 3, 5, 5}) != SUM({1, 2, 3, 5})
>
> The concept of duplicate insensitivity can help us in many optimization
> scenarios.
>
> For example, the current implementation of AggregateMergeRule rules out any
> aggregate calls for which the isDistinct() method returns true. However, for
> duplicate insensitive functions, the rule should be applicable.
>
> Could you please give your valuable feedback?
>
> Best,
> Liya Fan
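
For readers who want to try the MIN/SUM example from the quoted message, here is a minimal standalone Java sketch. It does not use Calcite at all; the class DuplicateInsensitivity and its methods are hypothetical names chosen for illustration. Note that it only compares results on one sample input, so it can show that a function is duplicate sensitive but cannot prove insensitivity in general.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration class; not part of Calcite.
public class DuplicateInsensitivity {

    /**
     * Returns true if the aggregate gives the same result on the input
     * and on the input with duplicates removed.
     */
    public static <R> boolean sameAfterDedup(
            List<Integer> input, Function<List<Integer>, R> agg) {
        // LinkedHashSet drops duplicates while keeping first-seen order.
        List<Integer> distinct = new ArrayList<>(new LinkedHashSet<>(input));
        return agg.apply(input).equals(agg.apply(distinct));
    }

    /** MIN over a non-empty list. */
    public static Integer min(List<Integer> xs) {
        return Collections.min(xs);
    }

    /** SUM over a list. */
    public static Integer sum(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 1, 2, 2, 3, 5, 5);
        // MIN is 1 with or without duplicates.
        System.out.println("MIN: " + sameAfterDedup(data, DuplicateInsensitivity::min));
        // SUM is 19 with duplicates and 11 without.
        System.out.println("SUM: " + sameAfterDedup(data, DuplicateInsensitivity::sum));
    }
}
```

On the sample data this prints "MIN: true" and "SUM: false", matching the two relations in the quoted message.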
