I agree. ANY_VALUE and SINGLE_VALUE are duplicate-insensitive.
> On Oct 13, 2020, at 2:17 AM, Fan Liya <[email protected]> wrote:
>
> Hi Julian,
>
> Thanks a lot for your feedback.
> I think SqlAggFunction.getDistinctOptionality() is exactly what I
> am looking for.
>
> BTW, I think ANY_VALUE and SINGLE_VALUE also belong to the category of
> duplicate insensitive functions.
> What do you think?
>
> Best,
> Liya Fan
>
>
>
> On Tue, Oct 13, 2020 at 4:55 PM Julian Hyde <[email protected]> wrote:
>
>> We already have this concept. See SqlAggFunction.getDistinctOptionality(),
>> added in https://issues.apache.org/jira/browse/CALCITE-3159.
>>
>> Julian
>>
>>
>>> On Oct 13, 2020, at 12:54 AM, Fan Liya <[email protected]> wrote:
>>>
>>> Hi all,
>>>
>>> I would like to introduce the idea of duplicate insensitive aggregate
>>> functions.
>>>
>>> For such functions, the aggregation results remain the same even after
>>> deduplication.
>>>
>>> For example, given a sequence of data {1, 1, 2, 2, 3, 5, 5}, the
>>> aggregation results of MIN are the same regardless of whether we perform
>>> data deduplication first. That is,
>>>
>>> MIN({1, 1, 2, 2, 3, 5, 5}) = MIN({1, 2, 3, 5})
>>>
>>> So MIN is a *duplicate insensitive function*.
>>>
>>> On the other hand, function SUM is not duplicate insensitive, because
>>>
>>> SUM({1, 1, 2, 2, 3, 5, 5}) != SUM({1, 2, 3, 5})
>>>
>>> The concept of duplicate insensitivity can help us in many optimization
>>> scenarios.
>>>
>>> For example, the current implementation of AggregateMergeRule rules out any
>>> aggregate calls for which the isDistinct() method returns true. However, for
>>> duplicate insensitive functions, the rule should be applicable.
>>>
>>> Could you please give your valuable feedback?
>>>
>>> Best,
>>> Liya Fan
>>
>>
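
For anyone who wants to check the MIN/SUM example from the original
message, here is a minimal, self-contained sketch in plain Java. It does not
use any Calcite API; the class name DuplicateInsensitivity and the helpers
min, sum, and dedup are made up purely for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names come from Calcite.
public class DuplicateInsensitivity {

    // MIN over a non-empty list of integers.
    static int min(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).min().getAsInt();
    }

    // SUM over a list of integers.
    static int sum(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();
    }

    // Removes duplicates, keeping the first occurrence of each value.
    static List<Integer> dedup(List<Integer> values) {
        return values.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(1, 1, 2, 2, 3, 5, 5);
        List<Integer> distinct = dedup(data); // [1, 2, 3, 5]

        // MIN gives the same answer with or without deduplication.
        System.out.println("MIN: " + min(data) + " vs " + min(distinct));

        // SUM changes once duplicates are removed.
        System.out.println("SUM: " + sum(data) + " vs " + sum(distinct));
    }
}
```

With the data from the thread, MIN is 1 in both cases, while SUM is 19 on
the raw input and 11 after deduplication.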