Hi, I agree that for the windowing function, in the absence of any value the sum shall be NULL, not 0.
Cheers, Pau. Missatge de Stamatis Zampetakis <zabe...@gmail.com> del dia dt., 22 de nov. 2022 a les 11:26: > Hello, > > Regarding the inconsistency you describe in the window function, indeed it > seems to be a bug. However, I would double-check with the SQL standard > to be sure there is no intentional deviation and/or test the query in > different DBMS. > > As far as it concerns the behavior of the aggregate function SUM on > string/varchar types the SQL standard forbids this operation (small extract > below). > > 10.9 <aggregate function> > > Syntax Rules > 5g) If SUM or AVG is specified, then: > i) DT shall be a numeric type or an interval type. > > General Rules > 6)d)v) If SUM is specified, then the result is the sum of the values in > TXA. If the sum is not within the > range of the declared type of the result, then an exception condition is > raised: data exception — numeric value out of range. > > As you observed, Postgres is inline with the standard and forbids this > operation but this is not the case for every DBMS. Note that Hive is closer > to MySQL than it is to Postgres so in many cases it makes sense to use it > as a reference. > Below, I outline the results on 8.0.27 MySQL Community Server. > > select sum('a') from tblstrcol; > +----------+ > | sum('a') | > +----------+ > | 0 | > +----------+ > > select sum('a') from tblstrcol where false; > +----------+ > | sum('a') | > +----------+ > | NULL | > +----------+ > > When there are rows the result of SUM is zero, and NULL when the result > set is empty thus I am a bit skeptical about changing the existing behavior. > > Best, > Stamatis > > > On Mon, Nov 21, 2022 at 3:53 PM Stephen Carlin <scar...@cloudera.com> > wrote: > >> Wanted to throw this one out for discussion for a bug I found and how to >> fix it... >> >> So we are inconsistent with how we handle sum() on windowing functions. >> If all the rows are null and the rows are all on "preceding" rows, we >> return NULL. On "following" rows, however, if all the rows are null, we >> return 0. This is inconsistent and I have a fix for that so that we always >> return null. The fix I have is here (not yet reviewed): >> https://github.com/apache/hive/pull/3789 >> >> My discussion though lies in a different problem which you can see in the >> patch I uploaded. My current fix changes behavior of the following >> statement: "select sum('a') from my_table". If my_table has rows, right >> now we are return 0.0. >> >> I've looked on postgres and it doesn't even allow a sum on a string >> column so I can't really compare to that database. My current fix doesn't >> disable this, but it does change the behavior to return NULL on this select. >> >> >> I kinda feel that returning NULL is more correct than return 0, but I >> wanted to throw this out there to see what y'all think. This would be a >> change in behavior and that makes me nervous. >> >> Thanks! >> > -- ---------------------------------- Pau Tallada Crespí Departament de Serveis Port d'Informació Científica (PIC) Tel: +34 93 170 2729 ----------------------------------