> cast(NULL as bigint) as malone_id,
> cast(NULL as bigint) as zpid,

I ran this on master (with text vectorization off) and I get

    20170626 123 NULL NULL 10

However, I think the backtracking for the columns is broken somewhere, where both the nulls end up being represented by 1 column, and that I think breaks text vectorization somewhere.

> Output:["_col0","_col1","_col2","_col3","_col4"],aggregations:["sum(VALUE._col0)"],keys:20170626,
> 123, KEY._col2, KEY._col2

See the repetition of _col2, while the output has a _col3 (and _col4 is the aggregate result).

Hive-1.2 has similar issues (which I assume 2.1.0 has too).

    Group By Operator
      aggregations: sum(COALESCE(10,0))
      keys: 20170626 (type: int), 123 (type: int), null (type: bigint), null (type: bigint)
      mode: hash
      outputColumnNames: _col0, _col1, _col2, _col3, _col4
      Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
      Reduce Output Operator
        key expressions: 20170626 (type: int), 123 (type: int), _col3 (type: bigint)
        sort order: +++
        Map-reduce partition columns: 20170626 (type: int), 123 (type: int), _col3 (type: bigint)
        Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
        value expressions: _col3 (type: bigint)

_col4 should've been the value expression, not _col3, and _col2 should've been in the key expression + partition columns (because you're grouping by 3 columns).

> what do you think? is it me? or is it hive?

Definitely Hive. If you file a JIRA, please run against a 1-row ORC table and report the vectorization issue too.

A performant fix to the problem would be to fix this similarly to how I'm trying to fix views with PTF + filters (i.e. the filter injects a constant into a window function).

https://issues.apache.org/jira/browse/HIVE-16541

Doing the same with the GroupBy would prevent constants from showing up in a group-by like this.

These can happen because of good engineering too: you don't end up writing a group-by with a "cast(null as bigint)"; you write a view with a groupby and then call it with a "where zpid is null and malone_id is null".

Cheers,
Gopal
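
P.S. A minimal, untested sketch of the view pattern described above, in HiveQL. The table name, view name, and every column other than zpid and malone_id are hypothetical and only there for illustration; whether it reproduces the exact plan shown earlier would depend on the Hive version and optimizer settings.

```sql
-- Hypothetical 1-row ORC table; only zpid and malone_id come from the thread.
CREATE TABLE listing_events (
  event_date INT,
  source_id  INT,
  malone_id  BIGINT,
  zpid       BIGINT,
  cnt        INT
) STORED AS ORC;

-- Load a single row with NULLs in both id columns.
INSERT INTO TABLE listing_events
SELECT 20170626, 123, cast(NULL as bigint), cast(NULL as bigint), 10;

-- The view itself has an ordinary group-by with no constants in it.
CREATE VIEW listing_totals AS
SELECT event_date, source_id, malone_id, zpid,
       sum(coalesce(cnt, 0)) AS total
FROM listing_events
GROUP BY event_date, source_id, malone_id, zpid;

-- The caller's filter is what can turn the group-by keys into constant NULLs
-- once it is pushed into the view; inspect the plan for repeated KEY columns.
EXPLAIN
SELECT *
FROM listing_totals
WHERE zpid IS NULL AND malone_id IS NULL;
```

Comparing the EXPLAIN output with vectorization on and off would be one way to gather the details for the JIRA.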