Thanks for confirming that I wasn't imagining this. Turns out this was a known bug (HIVE-2955 https://issues.apache.org/jira/browse/HIVE-2955 ) and is already fixed in version 0.10.0
Robin On 4/22/13 11:25 AM, "Mike Liddell" <mike.lidd...@microsoft.com> wrote: >Robin, >I was interested so tried a quick repro. Yes I see the behavior too. > >Environment: hive-0.9.0 on windows. >Hive> CREATE TABLE ptable(val int) PARTITIONED BY(year int, month int, >day int); >Hive> LOAD DATA LOCAL INPATH d:/data/2013-01-01.data.txt INTO TABLE >ptable PARTITION(2013,1,1); >// the txt file has separate lines with values 1,2,3,4. >Hive> select max(day) from ptable; >.. >OK >NULL >Hive> select avg(day) from ptable; >OK >2.5 > >Other normal queries work fine. I didn't try other aggregations. > >-mike. >-----Original Message----- >From: Robin Morris [mailto:r...@baynote.com] >Sent: Thursday, April 18, 2013 1:15 PM >To: user@hive.apache.org >Subject: max() returns NULL on partition column > >partitioned_table is partitioned on year, month, day. > >A query of the form >> select max(day) from partitioned_table where year=2013 and month=4; >spins up zero mappers, one reducer, and returns NULL > >Hadoop job information for Stage-1: number of mappers: 0; number of >reducers: 1 >2013-04-18 12:35:57,842 Stage-1 map = 0%, reduce = 0% >2013-04-18 12:36:00,855 Stage-1 map = 0%, reduce = 100%, Cumulative CPU >1.36 sec >2013-04-18 12:36:01,860 Stage-1 map = 0%, reduce = 100%, Cumulative CPU >1.36 sec >2013-04-18 12:36:02,865 Stage-1 map = 100%, reduce = 100%, Cumulative CPU >1.36 sec >MapReduce Total cumulative CPU time: 1 seconds 360 msec Ended Job = >job_201302011159_205031 MapReduce Jobs Launched: >Job 0: Reduce: 1 Accumulative CPU: 1.36 sec HDFS Read: 0 HDFS Write: 3 >SUCESS >Total MapReduce CPU Time Spent: 1 seconds 360 msec OK NULL Time taken: >8.351 seconds > > > >Trying to work round it by doing something like select max(day) from >(select day from partitioned_table where year=2013 and month=4)foo; >doesn't work either, I'm guessing because the optimizer pushes down the >max() into the subquery, resulting in the same query as above. > > >Using an explicit intermediate table does work >> create table foo_max as select day from partitioned_table where >>year=2013 and month=4; select max(day) from foo_max; drop table >>foo_max; >Several map-reduce jobs later, the correct answer (of 18) is given. > > >distinct() has similar behavior, except that rather than returning NULL, >it returns nothing. >min() returns NULL >avg() works! > > >If someone else can reproduce this, I'll submit a bug. > >Robin > > > >