We already have this concept. See SqlAggFunction.getDistinctOptionality(),
added in https://issues.apache.org/jira/browse/CALCITE-3159.
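A rule could consult that property instead of rejecting every DISTINCT call
outright. Below is a minimal, self-contained sketch of that idea. It assumes a
hypothetical local enum mirroring the four values of Calcite's Optionality
(MANDATORY, OPTIONAL, IGNORED, FORBIDDEN). AggCall and MergeCheck are
illustrative names, not Calcite classes. The sketch covers only the DISTINCT
check; a real merge rule has other preconditions as well.

```java
import java.util.List;

// Hypothetical stand-in mirroring the values of Calcite's Optionality enum;
// this is NOT the real org.apache.calcite.util.Optionality class.
enum DistinctOptionality { MANDATORY, OPTIONAL, IGNORED, FORBIDDEN }

// Hypothetical, simplified aggregate call: the function name, whether the
// call was written with DISTINCT, and the function's DISTINCT optionality.
record AggCall(String function, boolean isDistinct,
               DistinctOptionality distinctOptionality) {}

final class MergeCheck {
  private MergeCheck() {}

  // DISTINCT is harmless if it is absent, or if the function declares that
  // DISTINCT is ignored (duplicates cannot change its result).
  static boolean distinctIsHarmless(AggCall call) {
    return !call.isDistinct()
        || call.distinctOptionality() == DistinctOptionality.IGNORED;
  }

  // Relaxed form of "any DISTINCT call blocks the merge": only DISTINCT
  // calls whose result could actually depend on duplicates block it.
  static boolean mergeAllowed(List<AggCall> calls) {
    return calls.stream().allMatch(MergeCheck::distinctIsHarmless);
  }
}
```

Under these assumptions, a list containing MIN(DISTINCT x), with MIN
reporting IGNORED, would still pass the check, while SUM(DISTINCT x), with SUM
reporting OPTIONAL, would block it.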

Julian


> On Oct 13, 2020, at 12:54 AM, Fan Liya wrote:
> 
> Hi all,
> 
> I would like to introduce the idea of duplicate insensitive aggregate
> functions.
> 
> For such functions, the result is the same whether or not duplicate input
> values are removed before aggregating.
> 
> For example, given the sequence {1, 1, 2, 2, 3, 5, 5}, MIN returns the
> same result whether or not we deduplicate the data first. That is,
> 
> MIN({1, 1, 2, 2, 3, 5, 5}) = MIN({1, 2, 3, 5})
> 
> So MIN is a *duplicate insensitive function*.
> 
> On the other hand, the SUM function is not duplicate insensitive, because
> 
> SUM({1, 1, 2, 2, 3, 5, 5}) != SUM({1, 2, 3, 5})
> 
> The concept of duplicate insensitivity can help us in many optimization
> scenarios.
> 
> For example, the current implementation of AggregateMergeRule rules out any
> aggregate call whose isDistinct() method returns true. However, for
> duplicate insensitive functions, the rule should still be applicable.
> 
> Could you please give your valuable feedback?
> 
> Best,
> Liya Fan
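The MIN/SUM example in the quoted message can be checked mechanically by
comparing each aggregate on the raw input and on its deduplicated form. A
minimal sketch in plain Java (no Calcite classes; DuplicateInsensitivity and
its methods are illustrative names). Note that one matching input does not
prove an aggregate is duplicate insensitive, while one mismatch does prove it
is not.

```java
import java.util.List;
import java.util.function.Function;

final class DuplicateInsensitivity {
  private DuplicateInsensitivity() {}

  // True if the aggregate gives the same result on the input and on the
  // input with duplicates removed (first occurrence of each value kept).
  static boolean sameAfterDedup(List<Integer> input,
                                Function<List<Integer>, Integer> agg) {
    List<Integer> deduplicated = input.stream().distinct().toList();
    return agg.apply(input).equals(agg.apply(deduplicated));
  }

  static int min(List<Integer> xs) {
    return xs.stream().mapToInt(Integer::intValue).min().orElseThrow();
  }

  static int sum(List<Integer> xs) {
    return xs.stream().mapToInt(Integer::intValue).sum();
  }

  public static void main(String[] args) {
    List<Integer> data = List.of(1, 1, 2, 2, 3, 5, 5);
    System.out.println(sameAfterDedup(data, DuplicateInsensitivity::min)); // true: 1 == 1
    System.out.println(sameAfterDedup(data, DuplicateInsensitivity::sum)); // false: 19 != 11
  }
}
```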
