Hi,

I agree that for the windowing function, in the absence of any value the
sum shall be NULL, not 0.

Cheers,

Pau.

Missatge de Stamatis Zampetakis <zabe...@gmail.com> del dia dt., 22 de nov.
2022 a les 11:26:

> Hello,
>
> Regarding the inconsistency you describe in the window function, indeed it
> seems to be a bug. However, I would double-check with the SQL standard
> to be sure there is no intentional deviation and/or test the query in
> different DBMS.
>
> As far as it concerns the behavior of the aggregate function SUM on
> string/varchar types the SQL standard forbids this operation (small extract
> below).
>
> 10.9 <aggregate function>
>
> Syntax Rules
> 5g) If SUM or AVG is specified, then:
> i) DT shall be a numeric type or an interval type.
>
> General Rules
> 6)d)v) If SUM is specified, then the result is the sum of the values in
> TXA. If the sum is not within the
> range of the declared type of the result, then an exception condition is
> raised: data exception — numeric value out of range.
>
> As you observed, Postgres is inline with the standard and forbids this
> operation but this is not the case for every DBMS. Note that Hive is closer
> to MySQL than it is to Postgres so in many cases it makes sense to use it
> as a reference.
> Below, I outline the results on 8.0.27 MySQL Community Server.
>
> select sum('a') from tblstrcol;
> +----------+
> | sum('a') |
> +----------+
> |        0 |
> +----------+
>
> select sum('a') from tblstrcol where false;
> +----------+
> | sum('a') |
> +----------+
> |     NULL |
> +----------+
>
> When there are rows the result of SUM is zero, and NULL when the result
> set is empty thus I am a bit skeptical about changing the existing behavior.
>
> Best,
> Stamatis
>
>
> On Mon, Nov 21, 2022 at 3:53 PM Stephen Carlin <scar...@cloudera.com>
> wrote:
>
>> Wanted to throw this one out for discussion for a bug I found  and how to
>> fix it...
>>
>> So we are inconsistent with how we handle sum() on windowing functions.
>> If all the rows are null and the rows are all on "preceding" rows, we
>> return NULL.  On "following" rows, however, if all the rows are null, we
>> return 0.  This is inconsistent and I have a fix for that so that we always
>> return null.  The fix I have is here (not yet reviewed):
>> https://github.com/apache/hive/pull/3789
>>
>> My discussion though lies in a different problem which you can see in the
>> patch I uploaded.  My current fix changes behavior of the following
>> statement:  "select sum('a') from my_table".  If my_table has rows, right
>> now we are return 0.0.
>>
>> I've looked on postgres and it doesn't even allow a sum on a string
>> column so I can't really compare to that database.  My current fix doesn't
>> disable this, but it does change the behavior to return NULL on this select.
>>
>>
>> I kinda feel that returning NULL is more correct than return 0, but I
>> wanted to throw this out there to see what y'all think.  This would be a
>> change in behavior and that makes me nervous.
>>
>> Thanks!
>>
>

-- 
----------------------------------
Pau Tallada Crespí
Departament de Serveis
Port d'Informació Científica (PIC)
Tel: +34 93 170 2729
----------------------------------

Reply via email to