>               cast(NULL as bigint) as malone_id,
>               cast(NULL as bigint) as zpid,

I ran this on master (with text vectorization off) and got:

20170626        123     NULL    NULL    10

However, I think the column backtracking is broken somewhere: both NULL columns 
end up being represented by a single column, and I think that is what breaks 
text vectorization.

> Output:["_col0","_col1","_col2","_col3","_col4"],aggregations:["sum(VALUE._col0)"],keys:20170626, 123, KEY._col2, KEY._col2

Note that KEY._col2 is repeated, even though the output has a _col3 (and _col4 
is the aggregate result).
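A toy model of the suspected failure, written in Python. This is not Hive code, and `Expr`, `backtrack_by_expression` and `backtrack_by_position` are hypothetical names. Suppose, as an assumption, that output key columns are resolved by looking up their expressions by equality. Then the two `null (type: bigint)` keys are indistinguishable, and both resolve to `_col2`, which matches the repeated KEY._col2 above. Resolving by position keeps them apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Hypothetical, simplified key expression: printed text plus type."""
    text: str
    type_name: str


def backtrack_by_expression(keys):
    """Suspected-buggy resolution: map columns by expression equality.

    Equal expressions share one dict entry, so the first column seen
    for an expression is reused for every later duplicate.
    """
    first_col = {}
    for i, key in enumerate(keys):
        first_col.setdefault(key, f"_col{i}")
    return [f"KEY.{first_col[key]}" for key in keys]


def backtrack_by_position(keys):
    """Position-based resolution: key i always maps to _col{i}."""
    return [f"KEY._col{i}" for i in range(len(keys))]


keys = [
    Expr("20170626", "int"),
    Expr("123", "int"),
    Expr("null", "bigint"),  # cast(NULL as bigint) as malone_id
    Expr("null", "bigint"),  # cast(NULL as bigint) as zpid
]

print(backtrack_by_expression(keys))
# ['KEY._col0', 'KEY._col1', 'KEY._col2', 'KEY._col2']
print(backtrack_by_position(keys))
# ['KEY._col0', 'KEY._col1', 'KEY._col2', 'KEY._col3']
```

The frozen dataclass makes two `Expr("null", "bigint")` values equal and hashable, which is exactly what lets them collide in the dict.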

Hive 1.2 has similar issues (and I assume 2.1.0 does too).

                    Group By Operator
                      aggregations: sum(COALESCE(10,0))
                      keys: 20170626 (type: int), 123 (type: int), null (type: bigint), null (type: bigint)
                      mode: hash
                      outputColumnNames: _col0, _col1, _col2, _col3, _col4
                      Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: 20170626 (type: int), 123 (type: int), _col3 (type: bigint)
                        sort order: +++
                        Map-reduce partition columns: 20170626 (type: int), 123 (type: int), _col3 (type: bigint)
                        Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col3 (type: bigint)

_col4 should have been the value expression, not _col3, and _col2 should have 
been among the key expressions and partition columns (because you're grouping 
by 3 columns).

> what do you think? is it me? or is it hive?

Definitely Hive.

If you file a JIRA, please reproduce it against a 1-row ORC table and report 
the vectorization issue too.

A performant fix would be to handle this the same way I'm trying to fix views 
with PTF + filters (i.e. where the filter injects a constant into a window 
function).

https://issues.apache.org/jira/browse/HIVE-16541

Doing the same with the GroupBy would prevent constants from showing up in a 
group-by like this.

These can also arise from good engineering: you don't end up writing a 
group-by with a "cast(null as bigint)" directly; you write a view with a 
GROUP BY and then query it with "where zpid is null and malone_id is null".
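The idea of keeping constants out of the group-by keys can be sketched as a toy model in Python. This is not the HIVE-16541 patch or Hive's optimizer. The function names and the `dt`, `id` and `x` columns are hypothetical, and `None` stands in for SQL NULL (GROUP BY puts NULLs in one group, which Python equality mimics here). The premise is that a key pinned to a single value by the filter has the same value in every row, so dropping it from the grouping and re-adding it afterwards produces the same groups.

```python
from collections import defaultdict


def group_sum(rows, keys, value):
    """Plain GROUP BY keys with SUM(value); returns {key tuple: sum}."""
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[k] for k in keys)] += row[value]
    return dict(out)


def group_sum_constants_pruned(rows, keys, constants, value):
    """GROUP BY only the non-constant keys, then re-add the constants.

    `constants` maps key name -> value known from a filter such as
    "where zpid is null and malone_id is null" (None = SQL NULL).
    """
    live = [k for k in keys if k not in constants]
    partial = group_sum(rows, live, value)
    result = {}
    for live_key, total in partial.items():
        values = dict(zip(live, live_key))
        values.update(constants)  # constants are never shuffled
        result[tuple(values[k] for k in keys)] = total
    return result


# Rows as they look after the view's WHERE clause has been applied.
rows = [
    {"dt": 20170626, "id": 123, "malone_id": None, "zpid": None, "x": 10},
    {"dt": 20170626, "id": 123, "malone_id": None, "zpid": None, "x": 5},
    {"dt": 20170626, "id": 456, "malone_id": None, "zpid": None, "x": 1},
]
keys = ["dt", "id", "malone_id", "zpid"]
constants = {"malone_id": None, "zpid": None}

print(group_sum(rows, keys, "x"))
# {(20170626, 123, None, None): 15, (20170626, 456, None, None): 1}
print(group_sum_constants_pruned(rows, keys, constants, "x"))
# {(20170626, 123, None, None): 15, (20170626, 456, None, None): 1}
```

One caveat for a real rewrite: if every key turns out to be constant, the result still has to follow GROUP BY semantics (empty input gives no rows), not those of a key-less aggregate, which returns one row even on empty input. The dict-based model above behaves correctly here, because an empty `rows` list yields an empty result.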

Cheers,
Gopal