> We are bucketing by date so we wil have max 32 buckets

If you do want to lookup specifically by date, you could just create day
partitions and never partition by month.

FYI, in a modern version of Hive

select count(1) from table where YEAR(dt) = 2016 and MONTH(dt) = 12

does prune it on the client side.

On a different note, 31 buckets is a bad idea (32 is ok), because for
String hashes (32-1) is the magic number which hurts "yyyymmdd" and 50% of
your buckets have 0 data.

http://www.slideshare.net/t3rmin4t0r/data-organization-hive-meetup/6


Use that as a number and you'll get the same number back as the hashcode,
so it won't be stable as months change (20160816 % 32 == 16 and 20160716 %
32 == 12).

The only way to have buckets correspond to a day_of_month as an int and
bucket on it with 32 - then bucket0 == 31, bucket1=1, bucket2=2 etc.

Cheers,
Gopal


Reply via email to