I agree. ANY_VALUE and SINGLE_VALUE are duplicate-insensitive.

> On Oct 13, 2020, at 2:17 AM, Fan Liya <[email protected]> wrote:
> 
> Hi Julian,
> 
> Thanks a lot for your feedback.
> I think SqlAggFunction.getDistinctOptionality() is exactly what I
> am looking for.
> 
> BTW, I think ANY_VALUE and SINGLE_VALUE also belong to the category of
> duplicate insensitive functions.
> What do you think?
> 
> Best,
> Liya Fan
> 
> 
> 
> On Tue, Oct 13, 2020 at 4:55 PM Julian Hyde <[email protected]> wrote:
> 
>> We already have this concept. See SqlAggFunction.getDistinctOptionality(),
>> added in https://issues.apache.org/jira/browse/CALCITE-3159.
>> 
>> Julian
>> 
>> 
>>> On Oct 13, 2020, at 12:54 AM, Fan Liya <[email protected]> wrote:
>>> 
>>> Hi all,
>>> 
>>> I would like to introduce the idea of duplicate insensitive aggregate
>>> functions.
>>> 
>>> For such functions, the aggregation results remain the same even after
>>> deduplication.
>>> 
>>> For example, given a sequence of data {1, 1, 2, 2, 3, 5, 5}, the
>>> aggregation results of MIN are the same regardless of whether we perform
>>> data deduplication first. That is,
>>> 
>>> MIN({1, 1, 2, 2, 3, 5, 5}) = MIN({1, 2, 3, 5})
>>> 
>>> So MIN is a *duplicate insensitive function*.
>>> 
>>> On the other hand, function SUM is not duplicate insensitive, because
>>> 
>>> SUM({1, 1, 2, 2, 3, 5, 5}) != SUM({1, 2, 3, 5})
>>> 
>>> The concept of duplicate insensitivity can help us in many optimization
>>> scenarios.
>>> 
>>> For example, the current implementation of AggregateMergeRule rules out any
>>> aggregate calls for which the isDistinct() method returns true. However, for
>>> duplicate insensitive functions, the rule should be applicable.
>>> 
>>> Could you please give your valuable feedback?
>>> 
>>> Best,
>>> Liya Fan
>> 
>> 

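The MIN/SUM comparison quoted above can be checked with a small standalone sketch. It uses only the JDK, not Calcite; the class name DuplicateInsensitivityDemo and the helpers dedup, min and sum are hypothetical names chosen for illustration and are not part of any Calcite API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: plain JDK code, unrelated to Calcite's own classes.
final class DuplicateInsensitivityDemo {

    /** Removes duplicate values, keeping the first occurrence of each. */
    static List<Integer> dedup(List<Integer> values) {
        return values.stream().distinct().collect(Collectors.toList());
    }

    /** Minimum of a non-empty list (throws NoSuchElementException if empty). */
    static int min(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).min().getAsInt();
    }

    /** Sum of all values, duplicates included. */
    static int sum(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        // The sample data from the original message.
        List<Integer> data = Arrays.asList(1, 1, 2, 2, 3, 5, 5);
        List<Integer> distinct = dedup(data);

        // MIN gives the same answer with or without deduplication.
        System.out.println("MIN: " + min(data) + " vs " + min(distinct));
        // SUM does not, so DISTINCT changes its result.
        System.out.println("SUM: " + sum(data) + " vs " + sum(distinct));
    }
}
```

On the sample data, MIN is 1 in both cases, while SUM is 19 before deduplication and 11 after it.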